Age | Commit message (Collapse) | Author |
|
The panel-tpo-td043mtea1 driver incorrectly includes the OF vendor
prefix in its SPI alias. Fix it, and move the manual alias to an SPI
module device table.
Fixes: dc2e1e5b2799 ("drm/panel: Add driver for the Toppoly TD043MTEA1 panel")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007170801.27647-6-laurent.pinchart@ideasonboard.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
|
|
The panel-tpo-td028ttec1 driver incorrectly includes the OF vendor
prefix in its SPI alias. Fix it.
Fixes: 415b8dd08711 ("drm/panel: Add driver for the Toppoly TD028TTEC1 panel")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007170801.27647-5-laurent.pinchart@ideasonboard.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-by: Andreas Kemnade <andreas@kemnade.info>
|
|
The panel-sony-acx565akm driver incorrectly includes the OF vendor
prefix in its SPI alias. Fix it, and move the manual alias to an SPI
module device table.
Fixes: 1c8fc3f0c5d2 ("drm/panel: Add driver for the Sony ACX565AKM panel")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007170801.27647-4-laurent.pinchart@ideasonboard.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
The panel-nec-nl8048hl11 driver incorrectly includes the OF vendor
prefix in its SPI alias. Fix it, and move the manual alias to an SPI
module device table.
Fixes: df439abe6501 ("drm/panel: Add driver for the NEC NL8048HL11 panel")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007170801.27647-3-laurent.pinchart@ideasonboard.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
The panel-lg-lb035q02 driver incorrectly includes the OF vendor prefix
in its SPI alias. Fix it, and move the manual alias to an SPI module
device table.
Fixes: f5b0c6542476 ("drm/panel: Add driver for the LG Philips LB035Q02 panel")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007170801.27647-2-laurent.pinchart@ideasonboard.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Any changes interesting to tasks waiting in io_cqring_wait() are
commited with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
also tries to do that but with no reason, that means spurious wakeups
every io_free_req() and io_uring_enter().
Just use percpu_ref_put() instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-next
UAPI Changes:
- Never allow userptr into the mappable GGTT (Chris)
No existing users. Avoid anyone from even trying to
spare a deadlock scenario.
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- Eliminate struct_mutex use as BKL! (Chris)
Only used for execbuf serialisation.
- Initialize DDI TC and TBT ports (D-I) on Tigerlake (Lucas)
- Fix DKL link training for 2.7GHz and 1.62GHz (Jose)
- Add Tigerlake DKL PHY programming sequences (Clinton)
- Add Tigerlake Thunderbolt PLL divider values (Imre)
- drm/i915: Use helpers for drm_mm_node booleans (Chris)
- Restrict L3 remapping sysfs interface to dwords (Chris)
- Fix audio power up sequence for gen10+ display (Kai)
- Skip redundant execlist resubmission (Chris)
- Only unwedge if we can reset GPU first (Chris)
- Initialise breadcrumb lists on the virtual engine (Chris)
- Don't rely on kernel context existing during early errors (Matt A)
- Update Icelake+ MG_DP_MODE programming table (Clinton)
- Update DMC firmware for Icelake (Anusha)
- Downgrade DP MST error after unplugging TypeC cable (Srinivasan)
- Limit MST modes based on plane size too (Ville)
- Polish intel_tv_mode_valid() (Ville)
- Fix g4x sprite scaling stride check with GTT remapping (Ville)
- Don't advertize non-exisiting crtcs (Ville)
- Clean up encoder->crtc_mask setup (Ville)
- Use tc_port instead of port parameter to MG registers (Jose)
- Remove static variable for aux last status (Jani)
- Implement a better i945gm vblank irq vs. C-states workaround (Ville)
- Make the object creation interface consistent (CQ)
- Rename intel_vga_msr_write() to intel_vga_reset_io_mem() (Jani, Ville)
- Eliminate previous drm_dbg/drm_err usage (Jani)
- Move gmbus setup down to intel_modeset_init() (Jani)
- Abstract all vgaarb access to intel_vga.[ch] (Jani)
- Split out i915_switcheroo.[ch] from i915_drv.c (Jani)
- Use intel_gt in has_reset* (Chris)
- Eliminate return value for i915_gem_init_early (Matt A)
- Selftest improvements (Chris)
- Update HuC firmware header version number format (Daniele)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007134801.GA24313@jlahtine-desk.ger.corp.intel.com
|
|
Merge misc fixes from Andrew Morton:
"The usual shower of hotfixes.
Chris's memcg patches aren't actually fixes - they're mature but a few
niggling review issues were late to arrive.
The ocfs2 fixes are quite old - those took some time to get reviewer
attention.
Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg,
mm/slab-generic"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
mm, sl[ou]b: improve memory accounting
mm, memcg: make scan aggression always exclude protection
mm, memcg: make memory.emin the baseline for utilisation determination
mm, memcg: proportional memory.{low,min} reclaim
mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
mm/page_alloc.c: fix a crash in free_pages_prepare()
mm/z3fold.c: claim page in the beginning of free
kernel/sysctl.c: do not override max_threads provided by userspace
memcg: only record foreign writebacks with dirty pages when memcg is not disabled
mm: fix -Wmissing-prototypes warnings
writeback: fix use-after-free in finish_writeback_work()
mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
panic: ensure preemption is disabled during panic()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
ocfs2: clear zero in unaligned direct IO
|
|
In most configurations, kmalloc() happens to return naturally aligned
(i.e. aligned to the block size itself) blocks for power of two sizes.
That means some kmalloc() users might unknowingly rely on that
alignment, until stuff breaks when the kernel is built with e.g.
CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then
developers have to devise workaround such as own kmem caches with
specified alignment [1], which is not always practical, as recently
evidenced in [2].
The topic has been discussed at LSF/MM 2019 [3]. Adding a
'kmalloc_aligned()' variant would not help with code unknowingly relying
on the implicit alignment. For slab implementations it would either
require creating more kmalloc caches, or allocate a larger size and only
give back part of it. That would be wasteful, especially with a generic
alignment parameter (in contrast with a fixed alignment to size).
Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations. What this means for the three available allocators?
* SLAB object layout happens to be mostly unchanged by the patch. The
implicitly provided alignment could be compromised with
CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
caches with alignment larger than unsigned long long. Practically on at
least x86 this includes kmalloc caches as they use cache line alignment,
which is larger than that. Still, this patch ensures alignment on all
arches and cache sizes.
* SLUB layout is also unchanged unless redzoning is enabled through
CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
With this patch, explicit alignment is guaranteed with redzoning as
well. This will result in more memory being wasted, but that should be
acceptable in a debugging scenario.
* SLOB has no implicit alignment so this patch adds it explicitly for
kmalloc(). The potential downside is increased fragmentation. While
pathological allocation scenarios are certainly possible, in my testing,
after booting a x86_64 kernel+userspace with virtme, around 16MB memory
was consumed by slab pages both before and after the patch, with
difference in the noise.
[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/
[akpm@linux-foundation.org: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "guarantee natural alignment for kmalloc()", v2.
This patch (of 2):
SLOB currently doesn't account its pages at all, so in /proc/meminfo the
Slab field shows zero. Modifying a counter on page allocation and
freeing should be acceptable even for the small system scenarios SLOB is
intended for. Since reclaimable caches are not separated in SLOB,
account everything as unreclaimable.
SLUB currently doesn't account kmalloc() and kmalloc_node() allocations
larger than order-1 page, that are passed directly to the page
allocator. As they also don't appear in /proc/slabinfo, it might look
like a memory leak. For consistency, account them as well. (SLAB
doesn't actually use page allocator directly, so no change there).
Ideally SLOB and SLUB would be handled in separate patches, but due to
the shared kmalloc_order() function and different kfree()
implementations, it's easier to patch both at once to prevent
inconsistencies.
Link: http://lkml.kernel.org/r/20190826111627.7505-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch is an incremental improvement on the existing
memory.{low,min} relative reclaim work to base its scan pressure
calculations on how much protection is available compared to the current
usage, rather than how much the current usage is over some protection
threshold.
This change doesn't change the experience for the user in the normal
case too much. One benefit is that it replaces the (somewhat arbitrary)
100% cutoff with an indefinite slope, which makes it easier to ballpark
a memory.low value.
As well as this, the old methodology doesn't quite apply generically to
machines with varying amounts of physical memory. Let's say we have a
top level cgroup, workload.slice, and another top level cgroup,
system-management.slice. We want to roughly give 12G to
system-management.slice, so on a 32GB machine we set memory.low to 20GB
in workload.slice, and on a 64GB machine we set memory.low to 52GB.
However, because these are relative amounts to the total machine size,
while the amount of memory we want to generally be willing to yield to
system.slice is absolute (12G), we end up putting more pressure on
system.slice just because we have a larger machine and a larger workload
to fill it, which seems fairly unintuitive. With this new behaviour, we
don't end up with this unintended side effect.
Previously the way that memory.low protection works is that if you are
50% over a certain baseline, you get 50% of your normal scan pressure.
This is certainly better than the previous cliff-edge behaviour, but it
can be improved even further by always considering memory under the
currently enforced protection threshold to be out of bounds. This means
that we can set relatively low memory.low thresholds for variable or
bursty workloads while still getting a reasonable level of protection,
whereas with the previous version we may still trivially hit the 100%
clamp. The previous 100% clamp is also somewhat arbitrary, whereas this
one is more concretely based on the currently enforced protection
threshold, which is likely easier to reason about.
There is also a subtle issue with the way that proportional reclaim
worked previously -- it promotes having no memory.low, since it makes
pressure higher during low reclaim. This happens because we base our
scan pressure modulation on how far memory.current is between memory.min
and memory.low, but if memory.low is unset, we only use the overage
method. In most cromulent configurations, this then means that we end
up with *more* pressure than with no memory.low at all when we're in low
reclaim, which is not really very usable or expected.
With this patch, memory.low and memory.min affect reclaim pressure in a
more understandable and composable way. For example, from a user
standpoint, "protected" memory now remains untouchable from a reclaim
aggression standpoint, and users can also have more confidence that
bursty workloads will still receive some amount of guaranteed
protection.
Link: http://lkml.kernel.org/r/20190322160307.GA3316@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Roman points out that when when we do the low reclaim pass, we scale the
reclaim pressure relative to position between 0 and the maximum
protection threshold.
However, if the maximum protection is based on memory.elow, and
memory.emin is above zero, this means we still may get binary behaviour
on second-pass low reclaim. This is because we scale starting at 0, not
starting at memory.emin, and since we don't scan at all below emin, we
end up with cliff behaviour.
This should be a fairly uncommon case since usually we don't go into the
second pass, but it makes sense to scale our low reclaim pressure
starting at emin.
You can test this by catting two large sparse files, one in a cgroup
with emin set to some moderate size compared to physical RAM, and
another cgroup without any emin. In both cgroups, set an elow larger
than 50% of physical RAM. The one with emin will have less page
scanning, as reclaim pressure is lower.
Rebase on top of and apply the same idea as what was applied to handle
cgroup_memory=disable properly for the original proportional patch
http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name ("mm,
memcg: Handle cgroup_disable=memory when getting memcg protection").
Link: http://lkml.kernel.org/r/20190201051810.GA18895@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Suggested-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cgroup v2 introduces two memory protection thresholds: memory.low
(best-effort) and memory.min (hard protection). While they generally do
what they say on the tin, there is a limitation in their implementation
that makes them difficult to use effectively: that cliff behaviour often
manifests when they become eligible for reclaim. This patch implements
more intuitive and usable behaviour, where we gradually mount more
reclaim pressure as cgroups further and further exceed their protection
thresholds.
This cliff edge behaviour happens because we only choose whether or not
to reclaim based on whether the memcg is within its protection limits
(see the use of mem_cgroup_protected in shrink_node), but we don't vary
our reclaim behaviour based on this information. Imagine the following
timeline, with the numbers the lruvec size in this zone:
1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
scanned. (?!)
* Of course, we won't usually scan all available pages in the zone even
without this patch because of scan control priority, over-reclaim
protection, etc. However, as shown by the tests at the end, these
techniques don't sufficiently throttle such an extreme change in input,
so cliff-like behaviour isn't really averted by their existence alone.
Here's an example of how this plays out in practice. At Facebook, we are
trying to protect various workloads from "system" software, like
configuration management tools, metric collectors, etc (see this[0] case
study). In order to find a suitable memory.low value, we start by
determining the expected memory range within which the workload will be
comfortable operating. This isn't an exact science -- memory usage deemed
"comfortable" will vary over time due to user behaviour, differences in
composition of work, etc, etc. As such we need to ballpark memory.low,
but doing this is currently problematic:
1. If we end up setting it too low for the workload, it won't have
*any* effect (see discussion above). The group will receive the full
weight of reclaim and won't have any priority while competing with the
less important system software, as if we had no memory.low configured
at all.
2. Because of this behaviour, we end up erring on the side of setting
it too high, such that the comfort range is reliably covered. However,
protected memory is completely unavailable to the rest of the system,
so we might cause undue memory and IO pressure there when we *know* we
have some elasticity in the workload.
3. Even if we get the value totally right, smack in the middle of the
comfort zone, we get extreme jumps between no pressure and full
pressure that cause unpredictable pressure spikes in the workload due
to the current binary reclaim behaviour.
With this patch, we can set it to our ballpark estimation without too much
worry. Any undesirable behaviour, such as too much or too little reclaim
pressure on the workload or system will be proportional to how far our
estimation is off. This means we can set memory.low much more
conservatively and thus waste less resources *without* the risk of the
workload falling off a cliff if we overshoot.
As a more abstract technical description, this unintuitive behaviour
results in having to give high-priority workloads a large protection
buffer on top of their expected usage to function reliably, as otherwise
we have abrupt periods of dramatically increased memory pressure which
hamper performance. Having to set these thresholds so high wastes
resources and generally works against the principle of work conservation.
In addition, having proportional memory reclaim behaviour has other
benefits. Most notably, before this patch it's basically mandatory to set
memory.low to a higher than desirable value because otherwise as soon as
you exceed memory.low, all protection is lost, and all pages are eligible
to scan again. By contrast, having a gradual ramp in reclaim pressure
means that you now still get some protection when thresholds are exceeded,
which means that one can now be more comfortable setting memory.low to
lower values without worrying that all protection will be lost. This is
important because workingset size is really hard to know exactly,
especially with variable workloads, so at least getting *some* protection
if your workingset size grows larger than you expect increases user
confidence in setting memory.low without a huge buffer on top being
needed.
Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
assistance in thinking about how to make this work better.
In testing these changes, I intended to verify that:
1. Changes in page scanning become gradual and proportional instead of
binary.
To test this, I experimented stepping further and further down
memory.low protection on a workload that floats around 19G workingset
when under memory.low protection, watching page scan rates for the
workload cgroup:
+------------+-----------------+--------------------+--------------+
| memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
+------------+-----------------+--------------------+--------------+
| 21G | 0 | 0 | N/A |
| 17G | 867 | 3799 | 23% |
| 12G | 1203 | 3543 | 34% |
| 8G | 2534 | 3979 | 64% |
| 4G | 3980 | 4147 | 96% |
| 0 | 3799 | 3980 | 95% |
+------------+-----------------+--------------------+--------------+
As you can see, the test kernel (with a kernel containing this
patch) ramps up page scanning significantly more gradually than the
control kernel (without this patch).
2. More gradual ramp up in reclaim aggression doesn't result in
premature OOMs.
To test this, I wrote a script that slowly increments the number of
pages held by stress(1)'s --vm-keep mode until a production system
entered severe overall memory contention. This script runs in a highly
protected slice taking up the majority of available system memory.
Watching vmstat revealed that page scanning continued essentially
nominally between test and control, without causing forward reclaim
progress to become arrested.
[0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project
[akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
[chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The "mode" and "level" variables are enums and in this context GCC will
treat them as unsigned ints so the error handling is never triggered.
I also removed the bogus initializer because it isn't required any more
and it's sort of confusing.
[akpm@linux-foundation.org: reduce implicit and explicit typecasting]
[akpm@linux-foundation.org: fix return value, add comment, per Matthew]
Link: http://lkml.kernel.org/r/20190925110449.GO3264@mwanda
Fixes: 3cadfa2b9497 ("mm/vmpressure.c: convert to use match_string() helper")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Enrico Weigelt <info@metux.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On architectures like s390, arch_free_page() could mark the page unused
(set_page_unused()) and any access later would trigger a kernel panic.
Fix it by moving arch_free_page() after all possible accessing calls.
Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
Krnl PSW : 0404e00180000000 0000000026c2b96e (__free_pages_ok+0x34e/0x5d8)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000088d43af7 0000000000484000 000000000000007c 000000000000000f
000003d080012100 000003d080013fc0 0000000000000000 0000000000100000
00000000275cca48 0000000000000100 0000000000000008 000003d080010000
00000000000001d0 000003d000000000 0000000026c2b78a 000000002717fdb0
Krnl Code: 0000000026c2b95c: ec1100b30659 risbgn %r1,%r1,0,179,6
0000000026c2b962: e32014000036 pfd 2,1024(%r1)
#0000000026c2b968: d7ff10001000 xc 0(256,%r1),0(%r1)
>0000000026c2b96e: 41101100 la %r1,256(%r1)
0000000026c2b972: a737fff8 brctg %r3,26c2b962
0000000026c2b976: d7ff10001000 xc 0(256,%r1),0(%r1)
0000000026c2b97c: e31003400004 lg %r1,832
0000000026c2b982: ebff1430016a asi 5168(%r1),-1
Call Trace:
__free_pages_ok+0x16a/0x5d8)
memblock_free_all+0x206/0x290
mem_init+0x58/0x120
start_kernel+0x2b0/0x570
startup_continue+0x6a/0xc0
INFO: lockdep is turned off.
Last Breaking-Event-Address:
__free_pages_ok+0x372/0x5d8
Kernel panic - not syncing: Fatal exception: panic_on_oops
00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 26A2379C
In the past, only kernel_poison_pages() would trigger this but it needs
"page_poison=on" kernel cmdline, and I suspect nobody tested that on
s390. Recently, kernel_init_free_pages() (commit 6471384af2a6 ("mm:
security: introduce init_on_alloc=1 and init_on_free=1 boot options"))
was added and could trigger this as well.
[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/1569613623-16820-1-git-send-email-cai@lca.pw
Fixes: 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate option")
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: <stable@vger.kernel.org> [5.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There's a really hard to reproduce race in z3fold between z3fold_free()
and z3fold_reclaim_page(). z3fold_reclaim_page() can claim the page
after z3fold_free() has checked if the page was claimed and
z3fold_free() will then schedule this page for compaction which may in
turn lead to random page faults (since that page would have been
reclaimed by then).
Fix that by claiming page in the beginning of z3fold_free() and not
forgetting to clear the claim in the end.
[vitalywool@gmail.com: v2]
Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell
Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Markus Linnala <markus.linnala@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.
set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory. This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change. It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.
Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction. While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.
This became more severe when we switched x86 from 4k to 8k kernel
stacks. Starting since 6538b8ea886e ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.
In the particular case
3.12
kernel.threads-max = 515561
4.4
kernel.threads-max = 200000
Neither of the two values is really insane on 32GB machine.
I am not sure we want/need to tune the max_thread value further. If
anything the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.
Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
disabled
In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory'
for saving memory. Now kdump kernel will always panic when dump vmcore
to local disk:
BUG: kernel NULL pointer dereference, address: 0000000000000ab8
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140
Call Trace:
__set_page_dirty+0x52/0xc0
iomap_set_page_dirty+0x50/0x90
iomap_write_end+0x6e/0x270
iomap_write_actor+0xce/0x170
iomap_apply+0xba/0x11e
iomap_file_buffered_write+0x62/0x90
xfs_file_buffered_aio_write+0xca/0x320 [xfs]
new_sync_write+0x12d/0x1d0
vfs_write+0xa5/0x1a0
ksys_write+0x59/0xd0
do_syscall_64+0x59/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
And this will corrupt the 1st kernel too with 'cgroup_disable=memory'.
Via the trace and with debugging, it is pointing to commit 97b27821b485
("writeback, memcg: Implement foreign dirty flushing") which introduced
this regression. Disabling memcg causes the null pointer dereference at
uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath().
Fix it by returning directly if memcg is disabled, but not trying to
record the foreign writebacks with dirty pages.
Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv
Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We get two warnings when build kernel W=1:
mm/shuffle.c:36:12: warning: no previous prototype for `shuffle_show' [-Wmissing-prototypes]
mm/sparse.c:220:6: warning: no previous prototype for `subsection_mask_set' [-Wmissing-prototypes]
Make the functions static to fix this.
Link: http://lkml.kernel.org/r/1566978161-7293-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
finish_writeback_work() reads @done->waitq after decrementing
@done->cnt. However, once @done->cnt reaches zero, @done may be freed
(from stack) at any moment and @done->waitq can contain something
unrelated by the time finish_writeback_work() tries to read it. This
led to the following crash.
"BUG: kernel NULL pointer dereference, address: 0000000000000002"
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
CPU: 40 PID: 555153 Comm: kworker/u98:50 Kdump: loaded Not tainted
...
Workqueue: writeback wb_workfn (flush-btrfs-1)
RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
Code: 48 89 d8 5b c3 e8 50 db 6b ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 fe ca 6b ff eb f2 66 90
RSP: 0018:ffffc90049b27d98 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000002
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffff889fff407600 R11: ffff88ba9395d740 R12: 000000000000e300
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88bfdfa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 0000000002409005 CR4: 00000000001606e0
Call Trace:
__wake_up_common_lock+0x63/0xc0
wb_workfn+0xd2/0x3e0
process_one_work+0x1f5/0x3f0
worker_thread+0x2d/0x3d0
kthread+0x111/0x130
ret_from_fork+0x1f/0x30
Fix it by reading and caching @done->waitq before decrementing
@done->cnt.
Link: http://lkml.kernel.org/r/20190924010631.GH2233839@devbig004.ftw2.facebook.com
Fixes: 5b9cce4c7eb069 ("writeback: Generalize and expose wb_completion")
Signed-off-by: Tejun Heo <tj@kernel.org>
Debugged-by: Chris Mason <clm@fb.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
SECTION_SIZE and SECTION_MASK macros are not getting used anymore. But
they do conflict with existing definitions on arm64 platform causing
following warning during build. Lets drop these unused macros.
mm/memremap.c:16: warning: "SECTION_MASK" redefined
#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
arch/arm64/include/asm/pgtable-hwdef.h:79: note: this is the location of the previous definition
#define SECTION_MASK (~(SECTION_SIZE-1))
mm/memremap.c:17: warning: "SECTION_SIZE" redefined
#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
arch/arm64/include/asm/pgtable-hwdef.h:78: note: this is the location of the previous definition
#define SECTION_SIZE (_AC(1, UL) << SECTION_SHIFT)
Link: http://lkml.kernel.org/r/1569312010-31313-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.
This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:
| sysrq: Trigger a crash
| Kernel panic - not syncing: sysrq triggered crash
| CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xa0/0xc4
| panic+0x140/0x32c
| sysrq_handle_reboot+0x0/0x20
| __handle_sysrq+0x124/0x190
| write_sysrq_trigger+0x64/0x88
| proc_reg_write+0x60/0xa8
| __vfs_write+0x18/0x40
| vfs_write+0xa4/0x1b8
| ksys_write+0x64/0xf0
| __arm64_sys_write+0x14/0x20
| el0_svc_common.constprop.0+0xb0/0x168
| el0_svc_handler+0x28/0x78
| el0_svc+0x8/0xc
| Kernel Offset: disabled
| CPU features: 0x0002,24002004
| Memory Limit: none
| ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
| Ping 2!
| Ping 1!
| Ping 1!
| Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <will@kernel.org>
Reported-by: Xogium <contact@xogium.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ocfs2_info_scan_inode_alloc()
In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
to check whether inode_alloc is NULL:
if (inode_alloc)
When inode_alloc is NULL, it is used on line 287:
ocfs2_inode_lock(inode_alloc, &bh, 0);
ocfs2_inode_lock_full_nested(inode, ...)
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
Thus, a possible null-pointer dereference may occur.
To fix this bug, inode_alloc is checked on line 286.
This bug is found by a static analysis tool STCheck written by us.
Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In ocfs2_write_end_nolock(), there are an if statement on lines 1976,
2047 and 2058, to check whether handle is NULL:
if (handle)
When handle is NULL, it is used on line 2045:
ocfs2_update_inode_fsync_trans(handle, inode, 1);
oi->i_sync_tid = handle->h_transaction->t_tid;
Thus, a possible null-pointer dereference may occur.
To fix this bug, handle is checked before calling
ocfs2_update_inode_fsync_trans().
This bug is found by a static analysis tool STCheck written by us.
Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:
if (loc->xl_entry)
When loc->xl_entry is NULL, it is used on line 2158:
ocfs2_xa_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
and line 2164:
ocfs2_xa_add_namevalue(loc, xi);
loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
loc->xl_entry->xe_name_len = xi->xi_name_len;
Thus, possible null-pointer dereferences may occur.
To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.
These bugs are found by a static analysis tool STCheck written by us.
[akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Unused portion of a part-written fs-block-sized block is not set to zero
in unaligned append direct write.This can lead to serious data
inconsistencies.
Ocfs2 manage disk with cluster size(for example, 1M), part-written in
one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
VFS(function dio_zero_block) doesn't do the cleaning because bh's state
is not set to NEW in function ocfs2_dio_wr_get_block when we write a
WRITTEN cluster. For example, the cluster size is 1M, file size is 8k
and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
contain dirty data.
We have to deal with two cases:
1.The starting position of direct write is outside the file.
2.The starting position of direct write is located in the file.
We need set bh's state to NEW in the first case. In the second case, we
need mapped twice because bh's state of area out file should be set to
NEW while area in file not.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently execution of panic() continues until Xen's panic notifier
(xen_panic_event()) is called at which point we make a hypercall that
never returns.
This means that any notifier that is supposed to be called later as
well as significant part of panic() code (such as pstore writes from
kmsg_dump()) is never executed.
There is no reason for xen_panic_event() to be this last point in
execution since panic()'s emergency_restart() will call into
xen_emergency_restart() from where we can perform our hypercall.
Nevertheless, we will provide xen_legacy_crash boot option that will
preserve original behavior during crash. This option could be used,
for example, if running kernel dumper (which happens after panic
notifiers) is undesirable.
Reported-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
|
|
If we cannot claim the timeline->mutex while preparing for a wait on it,
we have to skip the timeline. In doing so, treat it as active so that
under a intel_gt_wait_for_idle() loop, we repeat the wait after
scheduling away.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191006165002.30312-4-chris@chris-wilson.co.uk
|
|
Disable irqs around updating the context image to keep lockdep happy:
<4>[ 673.483340] WARNING: possible irq lock inversion dependency detected
<4>[ 673.483342] 5.4.0-rc1-CI-Trybot_5118+ #1 Tainted: G U
<4>[ 673.483342] --------------------------------------------------------
<4>[ 673.483343] swapper/2/0 just changed the state of lock:
<4>[ 673.483344] ffff88845db885a0 (&i915_request_get(rq)->submit/1){-...}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.483387] but this lock took another, HARDIRQ-unsafe lock in the past:
<4>[ 673.483388] (&ce->pin_mutex/2){+...}
<4>[ 673.483389]
and interrupts could create inverse lock ordering between them.
<4>[ 673.483390]
other info that might help us debug this:
<4>[ 673.483390] Chain exists of:
&i915_request_get(rq)->submit/1 --> &engine->active.lock --> &ce->pin_mutex/2
<4>[ 673.483392] Possible interrupt unsafe locking scenario:
<4>[ 673.483392] CPU0 CPU1
<4>[ 673.483393] ---- ----
<4>[ 673.483393] lock(&ce->pin_mutex/2);
<4>[ 673.483394] local_irq_disable();
<4>[ 673.483395] lock(&i915_request_get(rq)->submit/1);
<4>[ 673.483396] lock(&engine->active.lock);
<4>[ 673.483396] <Interrupt>
<4>[ 673.483397] lock(&i915_request_get(rq)->submit/1);
<4>[ 673.483398]
*** DEADLOCK ***
<4>[ 673.483398] 2 locks held by swapper/2/0:
<4>[ 673.483399] #0: ffff8883f61ac9b0 (&(>->irq_lock)->rlock){-.-.}, at: gen11_gt_irq_handler+0x42/0x280 [i915]
<4>[ 673.483433] #1: ffff88845db8c418 (&(&rq->lock)->rlock){-.-.}, at: intel_engine_breadcrumbs_irq+0x34a/0x5a0 [i915]
<4>[ 673.483463]
the shortest dependencies between 2nd lock and 1st lock:
<4>[ 673.483466] -> (&ce->pin_mutex/2){+...} ops: 614520 {
<4>[ 673.483468] HARDIRQ-ON-W at:
<4>[ 673.483471] lock_acquire+0xa7/0x1c0
<4>[ 673.483501] live_unlite_restore+0x1d8/0x6c0 [i915]
<4>[ 673.483543] __i915_subtests+0xb8/0x210 [i915]
<4>[ 673.483581] __run_selftests+0x112/0x170 [i915]
<4>[ 673.483615] i915_live_selftests+0x2c/0x60 [i915]
<4>[ 673.483644] i915_pci_probe+0x93/0x1b0 [i915]
<4>[ 673.483646] pci_device_probe+0x9e/0x120
<4>[ 673.483648] really_probe+0xea/0x420
<4>[ 673.483649] driver_probe_device+0x10b/0x120
<4>[ 673.483651] device_driver_attach+0x4a/0x50
<4>[ 673.483652] __driver_attach+0x97/0x130
<4>[ 673.483653] bus_for_each_dev+0x74/0xc0
<4>[ 673.483654] bus_add_driver+0x142/0x220
<4>[ 673.483655] driver_register+0x56/0xf0
<4>[ 673.483657] do_one_initcall+0x58/0x2ff
<4>[ 673.483659] do_init_module+0x56/0x1f8
<4>[ 673.483660] load_module+0x243e/0x29f0
<4>[ 673.483661] __do_sys_finit_module+0xe9/0x110
<4>[ 673.483662] do_syscall_64+0x4f/0x210
<4>[ 673.483665] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 673.483665] INITIAL USE at:
<4>[ 673.483667] lock_acquire+0xa7/0x1c0
<4>[ 673.483698] live_unlite_restore+0x1d8/0x6c0 [i915]
<4>[ 673.483733] __i915_subtests+0xb8/0x210 [i915]
<4>[ 673.483764] __run_selftests+0x112/0x170 [i915]
<4>[ 673.483793] i915_live_selftests+0x2c/0x60 [i915]
<4>[ 673.483821] i915_pci_probe+0x93/0x1b0 [i915]
<4>[ 673.483822] pci_device_probe+0x9e/0x120
<4>[ 673.483824] really_probe+0xea/0x420
<4>[ 673.483825] driver_probe_device+0x10b/0x120
<4>[ 673.483826] device_driver_attach+0x4a/0x50
<4>[ 673.483827] __driver_attach+0x97/0x130
<4>[ 673.483828] bus_for_each_dev+0x74/0xc0
<4>[ 673.483829] bus_add_driver+0x142/0x220
<4>[ 673.483830] driver_register+0x56/0xf0
<4>[ 673.483831] do_one_initcall+0x58/0x2ff
<4>[ 673.483833] do_init_module+0x56/0x1f8
<4>[ 673.483834] load_module+0x243e/0x29f0
<4>[ 673.483835] __do_sys_finit_module+0xe9/0x110
<4>[ 673.483836] do_syscall_64+0x4f/0x210
<4>[ 673.483837] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 673.483838] }
<4>[ 673.483868] ... key at: [<ffffffffa0a8f132>] __key.70113+0x2/0xffffffffffef2ed0 [i915]
<4>[ 673.483869] ... acquired at:
<4>[ 673.483935] __execlists_reset+0xfb/0xc20 [i915]
<4>[ 673.483965] execlists_reset+0x3d/0x50 [i915]
<4>[ 673.483995] intel_engine_reset+0xdf/0x230 [i915]
<4>[ 673.484022] live_preempt_hang+0x1d7/0x2e0 [i915]
<4>[ 673.484064] __i915_subtests+0xb8/0x210 [i915]
<4>[ 673.484130] __run_selftests+0x112/0x170 [i915]
<4>[ 673.484163] i915_live_selftests+0x2c/0x60 [i915]
<4>[ 673.484193] i915_pci_probe+0x93/0x1b0 [i915]
<4>[ 673.484194] pci_device_probe+0x9e/0x120
<4>[ 673.484195] really_probe+0xea/0x420
<4>[ 673.484196] driver_probe_device+0x10b/0x120
<4>[ 673.484197] device_driver_attach+0x4a/0x50
<4>[ 673.484198] __driver_attach+0x97/0x130
<4>[ 673.484199] bus_for_each_dev+0x74/0xc0
<4>[ 673.484200] bus_add_driver+0x142/0x220
<4>[ 673.484202] driver_register+0x56/0xf0
<4>[ 673.484203] do_one_initcall+0x58/0x2ff
<4>[ 673.484204] do_init_module+0x56/0x1f8
<4>[ 673.484205] load_module+0x243e/0x29f0
<4>[ 673.484206] __do_sys_finit_module+0xe9/0x110
<4>[ 673.484207] do_syscall_64+0x4f/0x210
<4>[ 673.484208] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 673.484209] -> (&engine->active.lock){..-.} ops: 972791 {
<4>[ 673.484211] IN-SOFTIRQ-W at:
<4>[ 673.484213] lock_acquire+0xa7/0x1c0
<4>[ 673.484214] _raw_spin_lock_irqsave+0x33/0x50
<4>[ 673.484244] execlists_submission_tasklet+0xaf/0x100 [i915]
<4>[ 673.484246] tasklet_action_common.isra.18+0x6c/0x1c0
<4>[ 673.484247] __do_softirq+0xdf/0x47f
<4>[ 673.484248] irq_exit+0xba/0xc0
<4>[ 673.484249] do_IRQ+0x83/0x160
<4>[ 673.484250] ret_from_intr+0x0/0x1d
<4>[ 673.484252] cpuidle_enter_state+0xb2/0x450
<4>[ 673.484253] cpuidle_enter+0x24/0x40
<4>[ 673.484254] do_idle+0x1e7/0x250
<4>[ 673.484256] cpu_startup_entry+0x14/0x20
<4>[ 673.484257] start_secondary+0x15f/0x1b0
<4>[ 673.484258] secondary_startup_64+0xa4/0xb0
<4>[ 673.484259] INITIAL USE at:
<4>[ 673.484261] lock_acquire+0xa7/0x1c0
<4>[ 673.484290] intel_engine_init_active+0x7e/0xb0 [i915]
<4>[ 673.484305] intel_engines_setup+0x1cd/0x3b0 [i915]
<4>[ 673.484305] i915_gem_init+0x12d/0x900 [i915]
<4>[ 673.484305] i915_driver_probe+0xb70/0x15d0 [i915]
<4>[ 673.484305] i915_pci_probe+0x43/0x1b0 [i915]
<4>[ 673.484305] pci_device_probe+0x9e/0x120
<4>[ 673.484305] really_probe+0xea/0x420
<4>[ 673.484305] driver_probe_device+0x10b/0x120
<4>[ 673.484305] device_driver_attach+0x4a/0x50
<4>[ 673.484305] __driver_attach+0x97/0x130
<4>[ 673.484305] bus_for_each_dev+0x74/0xc0
<4>[ 673.484305] bus_add_driver+0x142/0x220
<4>[ 673.484305] driver_register+0x56/0xf0
<4>[ 673.484305] do_one_initcall+0x58/0x2ff
<4>[ 673.484305] do_init_module+0x56/0x1f8
<4>[ 673.484305] load_module+0x243e/0x29f0
<4>[ 673.484305] __do_sys_finit_module+0xe9/0x110
<4>[ 673.484305] do_syscall_64+0x4f/0x210
<4>[ 673.484305] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 673.484305] }
<4>[ 673.484305] ... key at: [<ffffffffa0a8f160>] __key.70307+0x0/0xffffffffffef2ea0 [i915]
<4>[ 673.484305] ... acquired at:
<4>[ 673.484305] _raw_spin_lock_irqsave+0x33/0x50
<4>[ 673.484305] execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 673.484305] submit_notify+0xa8/0x13c [i915]
<4>[ 673.484305] __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[ 673.484305] i915_sw_fence_wake+0x51/0x70 [i915]
<4>[ 673.484305] __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[ 673.484305] dma_i915_sw_fence_wake+0x1b/0x30 [i915]
<4>[ 673.484305] dma_fence_signal_locked+0x9e/0x1b0
<4>[ 673.484305] dma_fence_signal+0x1f/0x40
<4>[ 673.484305] fence_work+0x28/0x80 [i915]
<4>[ 673.484305] process_one_work+0x26a/0x620
<4>[ 673.484305] worker_thread+0x37/0x380
<4>[ 673.484305] kthread+0x119/0x130
<4>[ 673.484305] ret_from_fork+0x24/0x50
<4>[ 673.484305] -> (&i915_request_get(rq)->submit/1){-...} ops: 857694 {
<4>[ 673.484305] IN-HARDIRQ-W at:
<4>[ 673.484305] lock_acquire+0xa7/0x1c0
<4>[ 673.484305] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 673.484305] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.484305] intel_engine_breadcrumbs_irq+0x3d0/0x5a0 [i915]
<4>[ 673.484305] cs_irq_handler+0x39/0x50 [i915]
<4>[ 673.484305] gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[ 673.484305] gen11_irq_handler+0x54/0xf0 [i915]
<4>[ 673.484305] __handle_irq_event_percpu+0x41/0x2c0
<4>[ 673.484305] handle_irq_event_percpu+0x2b/0x70
<4>[ 673.484305] handle_irq_event+0x2f/0x50
<4>[ 673.484305] handle_edge_irq+0x99/0x1b0
<4>[ 673.484305] do_IRQ+0x7e/0x160
<4>[ 673.484305] ret_from_intr+0x0/0x1d
<4>[ 673.484305] cpuidle_enter_state+0xb2/0x450
<4>[ 673.484305] cpuidle_enter+0x24/0x40
<4>[ 673.484305] do_idle+0x1e7/0x250
<4>[ 673.484305] cpu_startup_entry+0x14/0x20
<4>[ 673.484305] start_secondary+0x15f/0x1b0
<4>[ 673.484305] secondary_startup_64+0xa4/0xb0
<4>[ 673.484305] INITIAL USE at:
<4>[ 673.484305] lock_acquire+0xa7/0x1c0
<4>[ 673.484305] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 673.484305] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.484305] __engine_park+0x233/0x420 [i915]
<4>[ 673.484305] ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4>[ 673.484305] intel_gt_resume+0x202/0x2c0 [i915]
<4>[ 673.484305] i915_gem_init+0x36e/0x900 [i915]
<4>[ 673.484305] i915_driver_probe+0xb70/0x15d0 [i915]
<4>[ 673.484305] i915_pci_probe+0x43/0x1b0 [i915]
<4>[ 673.484305] pci_device_probe+0x9e/0x120
<4>[ 673.484305] really_probe+0xea/0x420
<4>[ 673.484305] driver_probe_device+0x10b/0x120
<4>[ 673.484305] device_driver_attach+0x4a/0x50
<4>[ 673.484305] __driver_attach+0x97/0x130
<4>[ 673.484305] bus_for_each_dev+0x74/0xc0
<4>[ 673.484305] bus_add_driver+0x142/0x220
<4>[ 673.484305] driver_register+0x56/0xf0
<4>[ 673.484305] do_one_initcall+0x58/0x2ff
<4>[ 673.484305] do_init_module+0x56/0x1f8
<4>[ 673.484305] load_module+0x243e/0x29f0
<4>[ 673.484305] __do_sys_finit_module+0xe9/0x110
<4>[ 673.484305] do_syscall_64+0x4f/0x210
<4>[ 673.484305] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 673.484305] }
<4>[ 673.484305] ... key at: [<ffffffffa0a8f6a1>] __key.80173+0x1/0xffffffffffef2960 [i915]
<4>[ 673.484305] ... acquired at:
<4>[ 673.484305] mark_lock+0x382/0x500
<4>[ 673.484305] __lock_acquire+0x7e1/0x15d0
<4>[ 673.484305] lock_acquire+0xa7/0x1c0
<4>[ 673.484305] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 673.484305] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.484305] intel_engine_breadcrumbs_irq+0x3d0/0x5a0 [i915]
<4>[ 673.484305] cs_irq_handler+0x39/0x50 [i915]
<4>[ 673.484305] gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[ 673.484305] gen11_irq_handler+0x54/0xf0 [i915]
<4>[ 673.484305] __handle_irq_event_percpu+0x41/0x2c0
<4>[ 673.484305] handle_irq_event_percpu+0x2b/0x70
<4>[ 673.484305] handle_irq_event+0x2f/0x50
<4>[ 673.484305] handle_edge_irq+0x99/0x1b0
<4>[ 673.484305] do_IRQ+0x7e/0x160
<4>[ 673.484305] ret_from_intr+0x0/0x1d
<4>[ 673.484305] cpuidle_enter_state+0xb2/0x450
<4>[ 673.484305] cpuidle_enter+0x24/0x40
<4>[ 673.484305] do_idle+0x1e7/0x250
<4>[ 673.484305] cpu_startup_entry+0x14/0x20
<4>[ 673.484305] start_secondary+0x15f/0x1b0
<4>[ 673.484305] secondary_startup_64+0xa4/0xb0
<4>[ 673.484305]
stack backtrace:
<4>[ 673.484305] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G U 5.4.0-rc1-CI-Trybot_5118+ #1
<4>[ 673.484305] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4>[ 673.484305] Call Trace:
<4>[ 673.484305] <IRQ>
<4>[ 673.484305] dump_stack+0x67/0x9b
<4>[ 673.484305] check_usage_forwards+0x13c/0x150
<4>[ 673.484305] ? mark_lock+0x382/0x500
<4>[ 673.484305] mark_lock+0x382/0x500
<4>[ 673.484305] ? check_usage_backwards+0x140/0x140
<4>[ 673.484305] __lock_acquire+0x7e1/0x15d0
<4>[ 673.484305] ? debug_object_deactivate+0x17e/0x190
<4>[ 673.484305] lock_acquire+0xa7/0x1c0
<4>[ 673.484305] ? __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.484305] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 673.484305] ? __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.484305] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 673.484305] intel_engine_breadcrumbs_irq+0x3d0/0x5a0 [i915]
<4>[ 673.484305] cs_irq_handler+0x39/0x50 [i915]
<4>[ 673.484305] gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[ 673.484305] gen11_irq_handler+0x54/0xf0 [i915]
<4>[ 673.484305] __handle_irq_event_percpu+0x41/0x2c0
<4>[ 673.484305] handle_irq_event_percpu+0x2b/0x70
<4>[ 673.484305] handle_irq_event+0x2f/0x50
<4>[ 673.484305] handle_edge_irq+0x99/0x1b0
<4>[ 673.484305] do_IRQ+0x7e/0x160
<4>[ 673.484305] common_interrupt+0xf/0xf
<4>[ 673.484305] </IRQ>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004203121.31138-1-chris@chris-wilson.co.uk
|
|
As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.
<4>[ 723.763281] WARNING: possible circular locking dependency detected
<4>[ 723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U
<4>[ 723.763288] ------------------------------------------------------
<4>[ 723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[ 723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 723.763378]
but task is already holding lock:
<4>[ 723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 723.763420]
which lock already depends on the new lock.
<4>[ 723.763423]
the existing dependency chain (in reverse order) is:
<4>[ 723.763427]
-> #2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[ 723.763434] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 723.763478] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 723.763513] intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[ 723.763600] cs_irq_handler+0x49/0x50 [i915]
<4>[ 723.763659] gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[ 723.763690] gen11_irq_handler+0x54/0xf0 [i915]
<4>[ 723.763695] __handle_irq_event_percpu+0x41/0x2d0
<4>[ 723.763699] handle_irq_event_percpu+0x2b/0x70
<4>[ 723.763702] handle_irq_event+0x2f/0x50
<4>[ 723.763706] handle_edge_irq+0xee/0x1a0
<4>[ 723.763709] do_IRQ+0x7e/0x160
<4>[ 723.763712] ret_from_intr+0x0/0x1d
<4>[ 723.763717] __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[ 723.763720] kmem_cache_alloc+0x28d/0x2f0
<4>[ 723.763724] vm_area_dup+0x15/0x40
<4>[ 723.763727] dup_mm+0x2dd/0x550
<4>[ 723.763730] copy_process+0xf21/0x1ef0
<4>[ 723.763734] _do_fork+0x71/0x670
<4>[ 723.763737] __se_sys_clone+0x6e/0xa0
<4>[ 723.763741] do_syscall_64+0x4f/0x210
<4>[ 723.763744] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 723.763747]
-> #1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[ 723.763752] _raw_spin_lock+0x2a/0x40
<4>[ 723.763789] __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[ 723.763825] __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[ 723.763864] execlists_submission_tasklet+0x34/0x50 [i915]
<4>[ 723.763874] tasklet_action_common.isra.5+0x47/0xb0
<4>[ 723.763878] __do_softirq+0xd8/0x4ae
<4>[ 723.763881] irq_exit+0xa9/0xc0
<4>[ 723.763883] smp_apic_timer_interrupt+0xb7/0x280
<4>[ 723.763887] apic_timer_interrupt+0xf/0x20
<4>[ 723.763892] cpuidle_enter_state+0xae/0x450
<4>[ 723.763895] cpuidle_enter+0x24/0x40
<4>[ 723.763899] do_idle+0x1e7/0x250
<4>[ 723.763902] cpu_startup_entry+0x14/0x20
<4>[ 723.763905] start_secondary+0x15f/0x1b0
<4>[ 723.763908] secondary_startup_64+0xa4/0xb0
<4>[ 723.763911]
-> #0 (&engine->active.lock){..-.}:
<4>[ 723.763916] __lock_acquire+0x15d8/0x1ea0
<4>[ 723.763919] lock_acquire+0xa6/0x1c0
<4>[ 723.763922] _raw_spin_lock_irqsave+0x33/0x50
<4>[ 723.763956] execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 723.764002] submit_notify+0xa8/0x13c [i915]
<4>[ 723.764035] __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[ 723.764054] i915_sw_fence_wake+0x51/0x64 [i915]
<4>[ 723.764054] __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[ 723.764054] dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[ 723.764054] dma_fence_signal_locked+0x9e/0x1c0
<4>[ 723.764054] dma_fence_signal+0x1f/0x40
<4>[ 723.764054] vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[ 723.764054] drm_ioctl_kernel+0x83/0xf0
<4>[ 723.764054] drm_ioctl+0x2f3/0x3b0
<4>[ 723.764054] do_vfs_ioctl+0xa0/0x6f0
<4>[ 723.764054] ksys_ioctl+0x35/0x60
<4>[ 723.764054] __x64_sys_ioctl+0x11/0x20
<4>[ 723.764054] do_syscall_64+0x4f/0x210
<4>[ 723.764054] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 723.764054]
other info that might help us debug this:
<4>[ 723.764054] Chain exists of:
&engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1
<4>[ 723.764054] Possible unsafe locking scenario:
<4>[ 723.764054] CPU0 CPU1
<4>[ 723.764054] ---- ----
<4>[ 723.764054] lock(&i915_request_get(rq)->submit/1);
<4>[ 723.764054] lock(&(&rq->lock)->rlock#2);
<4>[ 723.764054] lock(&i915_request_get(rq)->submit/1);
<4>[ 723.764054] lock(&engine->active.lock);
<4>[ 723.764054]
*** DEADLOCK ***
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004194758.19679-1-chris@chris-wilson.co.uk
|
|
Avoid going to the base i915 device when we already have a path from gt
to the runtime powermanagement interface. The benefit is that it looks a
bit more self-consistent to always be acquiring the gt->uncore->rpm for
use with the gt->uncore.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007154531.1750-1-chris@chris-wilson.co.uk
|
|
Don't populate the array hw_engine_mask on the stack but instead make it
static. Makes the object code smaller by 316 bytes.
Before:
text data bss dec hex filename
34004 4388 320 38712 9738 gpu/drm/i915/gt/intel_reset.o
After:
text data bss dec hex filename
33528 4548 320 38396 95fc gpu/drm/i915/gt/intel_reset.o
(gcc version 9.2.1, amd64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007154151.23245-1-colin.king@canonical.com
|
|
Participate in device cgroup. All kfd devices are exposed via /dev/kfd.
So use /dev/dri/renderN node.
Before exposing the device to a task check if it has permission to
access it. If the task (based on its cgroup) can access /dev/dri/renderN
then expose the device via kfd node.
If the task cannot access /dev/dri/renderN then process device data
(pdd) is not created. This will ensure that task cannot use the device.
In sysfs topology, all device nodes are visible irrespective of the task
cgroup. The sysfs node directories are created at driver load time and
cannot be changed dynamically. However, access to information inside
nodes is controlled based on the task's cgroup permissions.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For AMD compute (amdkfd) driver.
All AMD compute devices are exported via single device node /dev/kfd. As
a result devices cannot be controlled individually using device cgroup.
AMD compute devices will rely on its graphics counterpart that exposes
/dev/dri/renderN node for each device. For each task (based on its
cgroup), KFD driver will check if /dev/dri/renderN node is accessible
before exposing it.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add proper ifdefs around CIK code in kfd setup.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The vram vendor can be found as a separate sysfs file at:
/sys/class/drm/card[X]/device/mem_info_vram_vendor
The vram vendor is displayed as a string value.
v2: Use correct bit masking, and cache vram_vendor in gmc
v3: Drop unused functions for vram width, type, and vendor
Signed-off-by: Ori Messinger <ori.messinger@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In the current code if "device_info" is ever NULL then the kernel will
Oops so probably || was intended instead of &&.
Fixes: e392c887df97 ("drm/amdkfd: Use array to probe kfd2kgd_calls")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The error handling is off by one. We should not free the first
"tables[i].bo" without decrementing "i" because that might result in a
double free. The second problem is that when an error occurs, then the
zeroth element "tables[0].bo" isn't freed.
I had make "i" signed int for the error handling to work, so I just
updated "ret" as well as a clean up.
Fixes: f96357a991b9 ("drm/amd/powerplay: implement smu_init(fini)_fb_allocations function")
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This function needs to drop the mutex before returning.
Fixes: f7e3a5776fa6 ("drm/amd/powerplay: check SMU engine readiness before proceeding on S3 resume")
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In non-ETSI regulatory domains scan is blocked when operating channel
is a DFS channel. For ETSI, however, once DFS channel is marked as
available after the CAC, this channel will remain available (for some
time) even after leaving this channel.
Therefore a scan can be done without any impact on the availability
of the DFS channel as no new CAC is required after the scan.
Enable scan in mac80211 in these cases.
Signed-off-by: Aaron Komisar <aaron.komisar@tandemg.com>
Link: https://lore.kernel.org/r/1570024728-17284-1-git-send-email-aaron.komisar@tandemg.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c: In function dce110_enable_audio_stream:
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c:949:23: warning: variable pp_smu set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c: In function dce110_disable_audio_stream:
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c:983:23: warning: variable pp_smu set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c: In function dce110_program_front_end_for_pipe:
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c:2429:19: warning: variable old_pipe set but not used [-Wunused-but-set-variable]
'pp_smu' is not used since commit 170a2398d2d8 ("drm/amd/display:
make clk_mgr call enable_pme_wa")
'old_pipe' is not used since commit 65d38262b3e8 ("drm/amd/display:
fbc state could not reach while enable fbc")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c: In function dpp2_get_optimal_number_of_taps:
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c:359:11: warning: variable pixel_width set but not used [-Wunused-but-set-variable]
It is not used since commit f7de96ee8b5f ("drm/amd/display:
Add DCN2 DPP")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
'v_ratio_chroma'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dwb_scl.c: In function dwb_program_horz_scalar:
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dwb_scl.c:725:11: warning: variable h_ratio_chroma set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dwb_scl.c: In function dwb_program_vert_scalar:
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dwb_scl.c:806:11: warning: variable v_ratio_chroma set but not used [-Wunused-but-set-variable]
They are not used since commit 345429a67c48 ("drm/amd/display:
Add DCN2 DWB")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c: In function calc_rc_params:
drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c:180:6: warning: variable source_bpp set but not used [-Wunused-but-set-variable]
It is not used since commit 97bda0322b8a ("drm/amd/display:
Add DSC support for Navi (v2)")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix sparse warnings:
drivers/gpu/drm/amd/display/dc/core/dc_link.c:687:6: warning: symbol 'wait_for_alt_mode' was not declared. Should it be static?
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Function kgd2kfd_init is missing a void argument, add it
to clean up the non-ANSI function declaration.
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's only used in amdgpu_device.c and the naming also
reflects that. Move it there.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
rivers/gpu/drm/amd/amdgpu/../display/modules/freesync/freesync.c:
In function mod_freesync_get_settings:
drivers/gpu/drm/amd/amdgpu/../display/modules/freesync/freesync.c:984:24:
warning: variable core_freesync set but not used [-Wunused-but-set-variable]
It is not used since commit 98e6436d3af5 ("drm/amd/display: Refactor FreeSync module")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|