Age | Commit message (Collapse) | Author |
|
Both Spectrum-1 and Spectrum-2 chips are currently configured such that
pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
for MC traffic) are feeding into the same subgroup. Strict
prioritization is configured between the two TCs, and by enabling
MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
over the higher-numbered (MC) TCs.
On Spectrum-2 however, there is an issue in configuration of the
MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
To work around the issue, configure the MC TCs with DWRR mode (while
keeping the UC TCs in strict mode).
With this patch, the multicast-unicast arbitration results in the same
behavior on both Spectrum-1 and Spectrum-2 chips.
Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Avoid compiler uninitialised warning introduced by recent arm64 futex
fix"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: futex: Restore oldval initialization to work around buggy compilers
|
|
Commit 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with
non-zero result value") removed oldval's zero initialization in
arch_futex_atomic_op_inuser because it is not necessary. Unfortunately,
Android's arm64 GCC 4.9.4 [1] does not agree:
../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized
in this function [-Wmaybe-uninitialized]
return oldval == cmparg;
^
In file included from ../kernel/futex.c:73:0:
../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here
int oldval, ret, tmp;
^
GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser
returns right away, avoiding the uninitialized use that it claims.
Restoring the zero initialization works around this issue.
[1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/
Cc: stable@vger.kernel.org
Fixes: 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. In
the event that the adjustments are too aggressive, i.e. the timer fires
earlier than the guest expects, KVM busy waits immediately prior to
entering the guest.
Currently, KVM manually converts the delay from nanoseconds to clock
cycles. But, the conversion is done in the guest's time domain, while
the delay occurs in the host's time domain. This is perfectly ok when
the guest and host are using the same TSC ratio, but if the guest is
using a different ratio then the delay may not be accurate and could
wait too little or too long.
When the guest is not using the host's ratio, convert the delay from
guest clock cycles to host nanoseconds and use ndelay() instead of
__delay() to provide more accurate timing. Because converting to
nanoseconds is relatively expensive, e.g. requires division and more
multiplication ops, continue using __delay() directly when guest and
host TSCs are running at the same ratio.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The introduction of adaptive tuning of lapic timer advancement did not
allow for the scenario where userspace would want to disable adaptive
tuning but still employ timer advancement, e.g. for testing purposes or
to handle a use case where adaptive tuning is unable to settle on a
suitable time. This is epecially pertinent now that KVM places a hard
threshold on the maximum advancment time.
Rework the timer semantics to accept signed values, with a value of '-1'
being interpreted as "use adaptive tuning with KVM's internal default",
and any other value being used as an explicit advancement time, e.g. a
time of '0' effectively disables advancement.
Note, this does not completely restore the original behavior of
lapic_timer_advance_ns. Prior to tracking the advancement per vCPU,
which is necessary to support autotuning, userspace could adjust
lapic_timer_advance_ns for *running* vCPU. With per-vCPU tracking, the
module params are snapshotted at vCPU creation, i.e. applying a new
advancement effectively requires restarting a VM.
Dynamically updating a running vCPU is possible, e.g. a helper could be
added to retrieve the desired delay, choosing between the global module
param and the per-VCPU value depending on whether or not auto-tuning is
(globally) enabled, but introduces a great deal of complexity. The
wrapper itself is not complex, but understanding and documenting the
effects of dynamically toggling auto-tuning and/or adjusting the timer
advancement is nigh impossible since the behavior would be dependent on
KVM's implementation as well as compiler optimizations. In other words,
providing stable behavior would require extremely careful consideration
now and in the future.
Given that the expected use of a manually-tuned timer advancement is to
"tune once, run many", use the vastly simpler approach of recognizing
changes to the module params only when creating a new vCPU.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Automatically adjusting the globally-shared timer advancement could
corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting
the advancement value. That could be partially fixed by using a local
variable for the arithmetic, but it would still be susceptible to a
race when setting timer_advance_adjust_done.
And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the
correct calibration for a given vCPU may not apply to all vCPUs.
Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is
effectively violated when finding a stable advancement takes an extended
amount of timer.
Opportunistically change the definition of lapic_timer_advance_ns to
a u32 so that it matches the style of struct kvm_timer. Explicitly
pass the param to kvm_create_lapic() so that it doesn't have to be
exposed to lapic.c, thus reducing the probability of unintentionally
using the global value instead of the per-vCPU value.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. Now
that the timer advancement is automatically tuned during runtime, it's
effectively unbounded by default, e.g. if KVM is running as L1 the
advancement can measure in hundreds of milliseconds.
Disable timer advancement if adaptive tuning yields an advancement of
more than 5000ns, as large advancements can break reasonable assumptions
of the guest, e.g. that a timer configured to fire after 1ms won't
arrive on the next instruction. Although KVM busy waits to mitigate the
case of a timer event arriving too early, complications can arise when
shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test
will fail when its "host" exits on interrupts as KVM may inject the INTR
before the guest executes STI+HLT. Arguably the unit test is "broken"
in the sense that delaying a timer interrupt by 1ms doesn't technically
guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
assumption that KVM should support.
Furthermore, an unbounded advancement also effectively unbounds the time
spent busy waiting, e.g. if the guest programs a timer with a very large
delay.
5000ns is a somewhat arbitrary threshold. When running on bare metal,
which is the intended use case, timer advancement is expected to be in
the general vicinity of 1000ns. 5000ns is high enough that false
positives are unlikely, while not being so high as to negatively affect
the host's performance/stability.
Note, a future patch will enable userspace to disable KVM's adaptive
tuning, which will allow priveleged userspace will to specifying an
advancement value in excess of this arbitrary threshold in order to
satisfy an abnormal use case.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It was reported that with some special Multi Processor Group configuration,
e.g:
bcdedit.exe /set groupsize 1
bcdedit.exe /set maxgroup on
bcdedit.exe /set groupaware on
for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
is in use.
Tracing kvm_hv_flush_tlb immediately reveals the issue:
kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
We don't flush anything and apparently it's not what Windows expects.
TLFS doesn't say anything about such requests and newer Windows versions
seem to be unaffected. This all feels like a WS2012 bug, which is, however,
easy to workaround in KVM: let's flush everything when we see an empty
flush request, over-flushing doesn't hurt.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If guest sets MSR_IA32_TSCDEADLINE to value such that in host
time-domain it's shorter than lapic_timer_advance_ns, we can
reach a case that we call hrtimer_start() with expiration time set at
the past.
Because lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it
is not allowed to run in softirq and therefore will never expire.
To avoid such a scenario, verify that deadline expiration time is set on
host time-domain further than (now + lapic_timer_advance_ns).
A future patch can also consider adding a min_timer_deadline_ns module parameter,
similar to min_timer_period_us to avoid races that amount of ns it takes
to run logic could still call hrtimer_start() with expiration timer set
at the past.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
KVM/PPC fixes for 5.1
- Fix host hang in the HTM assist code for POWER9
- Take srcu read lock around memslot lookup
|
|
As stated in the original commit for pidfd_send_signal() we don't allow
to signal processes through O_PATH file descriptors since it is
semantically equivalent to a write on the pidfd.
We already correctly error out right now and return EBADF if an O_PATH
fd is passed. This is because we use file->f_op to detect whether a
pidfd is passed and O_PATH fds have their file->f_op set to empty_fops
in do_dentry_open() and thus fail the test.
Thus, there is no regression. It's just semantically correct to use
fdget() and return an error right from there instead of taking a
reference and returning an error later.
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jann@thejh.net>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 bug fixes from Martin Schwidefsky:
- Fix overwrite of the initial ramdisk due to misuse of IS_ENABLED
- Fix integer overflow in the dasd driver resulting in incorrect number
of blocks for large devices
- Fix a lockdep false positive in the 3270 driver
- Fix a deadlock in the zcrypt driver
- Fix incorrect debug feature entries in the pkey api
- Fix inline assembly constraints fallout with CONFIG_KASAN=y
* tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: correct some inline assembly constraints
s390/pkey: add one more argument space for debug feature entry
s390/zcrypt: fix possible deadlock situation on ap queue remove
s390/3270: fix lockdep false positive on view->lock
s390/dasd: Fix capacity calculation for large volumes
s390/mem_detect: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Stop using the deprecated get_seconds().
- Don't make tracepoint strings const as the section they go in isn't
read-only.
- Differentiate failure due to unmarshalling from other failure cases.
We shouldn't abort with RXGEN_CC/SS_UNMARSHAL if it's not due to
unmarshalling.
- Add a missing unlock_page().
- Fix the interaction between receiving a notification from a server
that it has invalidated all outstanding callback promises and a
client call that we're in the middle of making that will get a new
promise.
* tag 'afs-fixes-20190413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix in-progess ops to ignore server-level callback invalidation
afs: Unlock pages for __pagevec_release()
afs: Differentiate abort due to unmarshalling from other errors
afs: Avoid section confusion in CM_NAME
afs: avoid deprecated get_seconds()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a bug in the implementation of the x86 accelerated version of
poly1305"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/poly1305 - fix overflow during partial reduction
|
|
Pull drm fixes from Dave Airlie:
"Since Easter is looming for me, I'm just pushing whatever is in my
tree, I'll see what else turns up and maybe I'll send another pull
early next week if there is anything.
tegra:
- stream id programming fix
- avoid divide by 0 for bad hdmi audio setup code
ttm:
- Hugepages fix
- refcount imbalance in error path fix
amdgpu:
- GPU VM fixes for Vega/RV
- DC AUX fix for active DP-DVI dongles
- DC fix for multihead regression"
* tag 'drm-fixes-2019-04-18' of git://anongit.freedesktop.org/drm/drm:
drm/tegra: hdmi: Setup audio only if configured
drm/amd/display: If one stream full updates, full update all planes
drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming
drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR
gpu: host1x: Program stream ID to bypass without SMMU
drm/amd/display: extending AUX SW Timeout
drm/ttm: fix dma_fence refcount imbalance on error path
drm/ttm: fix incrementing the page pointer for huge pages
drm/ttm: fix start page for huge page check in ttm_put_pages()
drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
|
|
For our component-backed driver to be properly removed, we need to
delete the component master in sun4i_drv_remove and make sure to call
component_unbind_all in the master's unbind so that all components are
unbound when the master is.
Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support")
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418132727.5128-4-paul.kocialkowski@bootlin.com
|
|
Our sun4i_drv_unbind gets the drm device using dev_get_drvdata.
However, that driver data is never set in sun4i_drv_bind.
Set it there to avoid getting a NULL pointer at unbind time.
Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support")
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418132727.5128-3-paul.kocialkowski@bootlin.com
|
|
A call to drm_atomic_helper_shutdown is required to properly release
the internal references taken by the core and avoid warnings about
leaking objects. Add it since it was missing.
Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support")
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418132727.5128-2-paul.kocialkowski@bootlin.com
|
|
When we perform a walk in the completion function, we need to ensure
that it is atomic.
Fixes: ac3c8f36c31d ("crypto: lrw - Do not use auxiliary buffer")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When we perform a walk in the completion function, we need to ensure
that it is atomic.
Reported-by: syzbot+6f72c20560060c98b566@syzkaller.appspotmail.com
Fixes: 78105c7e769b ("crypto: xts - Drop use of auxiliary buffer")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
tick_freeze() introduced by suspend-to-idle in commit 124cf9117c5f ("PM /
sleep: Make it possible to quiesce timers during suspend-to-idle") uses
timekeeping_suspend() instead of syscore_suspend() during
suspend-to-idle. As a consequence generic sched_clock will keep going
because sched_clock_suspend() and sched_clock_resume() are not invoked
during suspend-to-idle which can result in a generic sched_clock wrap.
On a ARM system with suspend-to-idle enabled, sched_clock is registered
as "56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns", which
means the real wrapping duration is 8796093022202ns.
[ 134.551779] suspend-to-idle suspend (timekeeping_suspend())
[ 1204.912239] suspend-to-idle resume (timekeeping_resume())
......
[ 1206.912239] suspend-to-idle suspend (timekeeping_suspend())
[ 5880.502807] suspend-to-idle resume (timekeeping_resume())
......
[ 6000.403724] suspend-to-idle suspend (timekeeping_suspend())
[ 8035.753167] suspend-to-idle resume (timekeeping_resume())
......
[ 8795.786684] (2)[321:charger_thread]......
[ 8795.788387] (2)[321:charger_thread]......
[ 0.057226] (0)[0:swapper/0]......
[ 0.061447] (2)[0:swapper/2]......
sched_clock was not stopped during suspend-to-idle, and sched_clock_poll
hrtimer was not expired because timekeeping_suspend() was invoked during
suspend-to-idle. It makes sched_clock wrap at kernel time 8796s.
To prevent this, invoke sched_clock_suspend() and sched_clock_resume() in
tick_freeze() together with timekeeping_suspend() and timekeeping_resume().
Fixes: 124cf9117c5f (PM / sleep: Make it possible to quiesce timers during suspend-to-idle)
Signed-off-by: Chang-An Chen <chang-an.chen@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Corey Minyard <cminyard@mvista.com>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: <kuohong.wang@mediatek.com>
Cc: <freddy.hsin@mediatek.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1553828349-8914-1-git-send-email-chang-an.chen@mediatek.com
|
|
Family 17h differs from prior families by:
- Does not support an L2 cache miss event
- It has re-enumerated PMC counters for:
- L2 cache references
- front & back end stalled cycles
So we add a new amd_f17h_perfmon_event_map[] so that the generic
perf event names will resolve to the correct h/w events on
family 17h and above processors.
Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2):
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
[ Improved the formatting a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Specifically, max_tfd_queue_size should be 0x10000 like in
22560 family and not 0x100 like in 22000 family.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
debugfs can now report an error code if something went wrong instead of
just NULL. So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.
This is now happening because of ff9fb72bc077 ("debugfs: return error
values, not NULL"). If multiple iwlwifi devices are in the system, this
can cause problems when the driver attempts to create the main debugfs
directory again. Later on in the code we fail horribly by trying to
dereference a pointer that is an error value.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Gabriel Ramirez <gabriello.ramirez@gmail.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Intel Linux Wireless <linuxwifi@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: stable <stable@vger.kernel.org> # 5.0
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
In ini debug TLVs bit 24 is set. The driver relies on it in the memory
allocation for the debug configuration. This implementation is
problematic in case of a new debug TLV that is not supported yet is added
and uses bit 24. In such a scenario the driver allocate space without
using it which causes errors in the apply point enabling flow.
Solve it by explicitly checking if a given TLV is part of the list of
the supported ini debug TLVs.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Fixes: f14cda6f3b31 ("iwlwifi: trans: parse and store debug ini TLVs")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
If we fail to initialize because rfkill is enabled, then trying
to do debug collection currently just fails. Prevent that in the
high-level code, although we should probably also fix the lower
level code to do things more carefully.
It's not 100% clear that it fixes this commit, as the original
dump code at the time might've been more careful. In any case,
we don't really need to dump anything in this expected scenario.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 7125648074e8 ("iwlwifi: add fw dump upon RT ucode start failure")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The driver uses msix causes-register to handle both msix and non msix
interrupts when performing sync nmi. On devices that do not support
msix this register is unmapped and accessing it causes a kernel panic.
Solve this by differentiating the two cases and accessing the proper
causes-register in each case.
Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller one of the below two
values:
(actual system RAM size + padding size) vs (max system RAM size supported)
This calculation is wrong since commit
b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.
Thus, the size of the direct mapping section will always
be the sum of the actual system RAM size plus the padding size.
Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously KALSR will be weakened significantly.
|____|_______actual RAM_______|_padding_|______the rest_______|
0 64TB ~120TB
Instead, it should be like this:
|____|_______actual RAM_______|_________the rest______________|
0 64TB ~120TB
The size of padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
[ bp: Massage commit message. ]
Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frank.ramsay@hpe.com
Cc: herbert@gondor.apana.org.au
Cc: kirill@shutemov.name
Cc: mike.travis@hpe.com
Cc: thgarnie@google.com
Cc: x86-ml <x86@kernel.org>
Cc: yamada.masahiro@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
|
|
clang points out that the return code from this function is
undefined for one of the error paths:
../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here
return result;
^~~~~~
../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning
int result;
^
Make it return -ENODEV here, as in the related failure cases.
gcc has a known bug in underreporting some of these warnings
when it has already eliminated the assignment of the return code
based on some earlier optimization step.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are a couple of spelling mistakes in NL_SET_ERR_MSG_MOD error
messages. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc warn this:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c: In function ndesc_init_rx_desc:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:138:6: warning: variable 'bfsize1' set but not used [-Wunused-but-set-variable]
Like enh_desc_init_rx_desc, we should use bfsize1
in ndesc_init_rx_desc to calculate 'p->des1'
Fixes: 583e63614149 ("net: stmmac: use correct DMA buffer size in the RX descriptor")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.
The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.
A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.
Fixes: 5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
<IRQ>
dump_stack+0x8c/0xba
ubsan_epilogue+0x11/0x60
handle_overflow+0x12d/0x170
? ttwu_do_wakeup+0x21/0x320
__ubsan_handle_mul_overflow+0x12/0x20
tcp_ack_update_rtt+0x76c/0x780
tcp_clean_rtx_queue+0x499/0x14d0
tcp_ack+0x69e/0x1240
? __wake_up_sync_key+0x2c/0x50
? update_group_capacity+0x50/0x680
tcp_rcv_established+0x4e2/0xe10
tcp_v4_do_rcv+0x22b/0x420
tcp_v4_rcv+0xfe8/0x1190
ip_protocol_deliver_rcu+0x36/0x180
ip_local_deliver+0x15b/0x1a0
ip_rcv+0xac/0xd0
__netif_receive_skb_one_core+0x7f/0xb0
__netif_receive_skb+0x33/0xc0
netif_receive_skb_internal+0x84/0x1c0
napi_gro_receive+0x2a0/0x300
receive_buf+0x3d4/0x2350
? detach_buf_split+0x159/0x390
virtnet_poll+0x198/0x840
? reweight_entity+0x243/0x4b0
net_rx_action+0x25c/0x770
__do_softirq+0x19b/0x66d
irq_exit+0x1eb/0x230
do_IRQ+0x7a/0x150
common_interrupt+0xf/0xf
</IRQ>
It can be reproduced by:
echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
into drm-fixes
- GPUVM fixes for vega/RV and shadow buffers
- TTM fixes for hugepages
- TTM fix for refcount imbalance in error path
- DC AUX fix for some active DP-DVI dongles
- DC fix for multihead VT switch regression
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190415051703.3377-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v5.1-rc6
This contains a follow-up fix for the stream ID programming and a fix
for a regression on older Tegra devices (Tegra20 and Tegra30) that are
running into a division by zero trying to enable audio over HDMI.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190417073525.21680-1-thierry.reding@gmail.com
|
|
Pull smb3 fixes from Steve French:
"Five small SMB3 fixes, all also for stable - an important fix for an
oplock (lease) bug, a handle leak, and three bugs spotted by KASAN"
* tag '5.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: keep FileInfo handle live during oplock break
cifs: fix handle leak in smb2_query_symlink()
cifs: Fix lease buffer length error
cifs: Fix use-after-free in SMB2_read
cifs: Fix use-after-free in SMB2_write
|
|
If a request transmission fails due to write space or slot unavailability
errors, but the queued task then gets transmitted before it has time to
process the error in call_transmit_status() or call_bc_transmit_status(),
we need to suppress the transmission error code to prevent it from leaking
out of the RPC layer.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This is a leftover from when the rings initially were not free flowing,
and hence a test for tail + 1 == head would indicate full. Since we now
let them wrap instead of mask them with the size, we need to check if
they drift more than the ring size from each other.
This fixes a case where we'd overwrite CQ ring entries, if the user
failed to reap completions. Both cases would ultimately result in lost
completions as the application violated the depth it asked for. The only
difference is that before this fix we'd return invalid entries for the
overflowed completions, instead of properly flagging it in the
cq_ring->overflow variable.
Reported-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We don't return NULL when we don't find the bpf_prog_info_node, fix
that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Song Liu <songliubraving@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3792cb2ff43b ("perf bpf: Save BTF in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190417145539.11669-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
By calling maps__insert() we assume to get 2 references on the map,
which we relese within maps__remove call.
However if there's already same map name, we currently don't bump the
reference and can crash, like:
Program received signal SIGABRT, Aborted.
0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6
#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
#6 map__put (map=0x1224df0) at util/map.c:299
#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
#9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
...
Add the map to the list and getting the reference event if we find the
map with same name.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 1e6285699b30 ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Current perf_evlist__poll_thread() code could finish without draining
the data. Adding the logic that makes sure we won't finish before the
drain.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 657ee5531903 ("perf evlist: Introduce side band thread")
Link: http://lkml.kernel.org/r/20190416160127.30203-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As reported by Jiri Olsa in:
"[BUG] perf: intel_pt won't display kernel function"
https://lore.kernel.org/lkml/20190403143738.GB32001@krava
Recent changes to support PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT
broke --kallsyms option. This is because it broke test __map__is_kmodule.
This patch fixes this by adding check for bpf program, so that these maps
are not mistaken as kernel modules.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/20190416160127.30203-8-jolsa@kernel.org
Fixes: 76193a94522f ("perf, bpf: Introduce PERF_RECORD_KSYMBOL")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_env__find_bpf_prog_info()
We currently don't return NULL in case we don't find the
bpf_prog_info_node, fixing that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e4378f0cb90b ("perf bpf: Save bpf_prog_info in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190416134151.15282-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pull IPMI fixes from Corey Minyard:
"Fixes for some bugs cause by recent changes. One crash if you feed bad
data to the module parameters, one BUG that sometimes occurs when a
user closes the connection, and one bug that cause the driver to not
work if the configuration information only comes in from SMBIOS"
* tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi:
ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
ipmi: Fix failure on SMBIOS specified devices
|
|
Pull networking fixes from David Miller:
1) Handle init flow failures properly in iwlwifi driver, from Shahar S
Matityahu.
2) mac80211 TXQs need to be unscheduled on powersave start, from Felix
Fietkau.
3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau.
4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed.
5) Avoid checksum complete with XDP in mlx5, also from Saeed.
6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon.
7) Partial sent TLS record leak fix from Jakub Kicinski.
8) Reject zero size iova range in vhost, from Jason Wang.
9) Allow pending work to complete before clcsock release from Karsten
Graul.
10) Fix XDP handling max MTU in thunderx, from Matteo Croce.
11) A lot of protocols look at the sa_family field of a sockaddr before
validating it's length is large enough, from Tetsuo Handa.
12) Don't write to free'd pointer in qede ptp error path, from Colin Ian
King.
13) Have to recompile IP options in ipv4_link_failure because it can be
invoked from ARP, from Stephen Suryaputra.
14) Doorbell handling fixes in qed from Denis Bolotin.
15) Revert net-sysfs kobject register leak fix, it causes new problems.
From Wang Hai.
16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva.
17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay
Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits)
socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW
tcp: tcp_grow_window() needs to respect tcp_space()
ocelot: Clean up stats update deferred work
ocelot: Don't sleep in atomic context (irqs_disabled())
net: bridge: fix netlink export of vlan_stats_per_port option
qed: fix spelling mistake "faspath" -> "fastpath"
tipc: set sysctl_tipc_rmem and named_timeout right range
tipc: fix link established but not in session
net: Fix missing meta data in skb with vlan packet
net: atm: Fix potential Spectre v1 vulnerabilities
net/core: work around section mismatch warning for ptp_classifier
net: bridge: fix per-port af_packet sockets
bnx2x: fix spelling mistake "dicline" -> "decline"
route: Avoid crash from dereferencing NULL rt->from
MAINTAINERS: normalize Woojung Huh's email address
bonding: fix event handling for stacked bonds
Revert "net-sysfs: Fix memory leak in netdev_register_kobject"
rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check
qed: Fix the DORQ's attentions handling
qed: Fix missing DORQ attentions
...
|
|
free_user() could be called in atomic context.
This patch pushed the free operation off into a workqueue.
Example:
BUG: sleeping function called from invalid context at kernel/workqueue.c:2856
in_atomic(): 1, irqs_disabled(): 0, pid: 177, name: ksoftirqd/27
CPU: 27 PID: 177 Comm: ksoftirqd/27 Not tainted 4.19.25-3 #1
Hardware name: AIC 1S-HV26-08/MB-DPSB04-06, BIOS IVYBV060 10/21/2015
Call Trace:
dump_stack+0x5c/0x7b
___might_sleep+0xec/0x110
__flush_work+0x48/0x1f0
? try_to_del_timer_sync+0x4d/0x80
_cleanup_srcu_struct+0x104/0x140
free_user+0x18/0x30 [ipmi_msghandler]
ipmi_free_recv_msg+0x3a/0x50 [ipmi_msghandler]
deliver_response+0xbd/0xd0 [ipmi_msghandler]
deliver_local_response+0xe/0x30 [ipmi_msghandler]
handle_one_recv_msg+0x163/0xc80 [ipmi_msghandler]
? dequeue_entity+0xa0/0x960
handle_new_recv_msgs+0x15c/0x1f0 [ipmi_msghandler]
tasklet_action_common.isra.22+0x103/0x120
__do_softirq+0xf8/0x2d7
run_ksoftirqd+0x26/0x50
smpboot_thread_fn+0x11d/0x1e0
kthread+0x103/0x140
? sort_range+0x20/0x20
? kthread_destroy_worker+0x40/0x40
ret_from_fork+0x1f/0x40
Fixes: 77f8269606bf ("ipmi: fix use-after-free of user->release_barrier.rda")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 5.0
Cc: Yang Yingliang <yangyingliang@huawei.com>
|
|
The patch a6dbe4427559 ("vt: perform safe console erase in the right
order") introduced a bug. The conditional do_update_region() was
replaced by a call to update_region() that does contain the conditional
already, but with unwanted extra side effects such as restoring the cursor
drawing.
In order to reproduce the bug:
- use framebuffer console with the AMDGPU driver
- type "links" to start the console www browser
- press 'q' and space to exit links
Now the cursor will be permanently visible in the center of the
screen. It will stay there until something overwrites it.
The bug goes away if we change update_region() back to the conditional
do_update_region().
[ nico: reworded changelog ]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Cc: stable@vger.kernel.org
Fixes: a6dbe4427559 ("vt: perform safe console erase in the right order")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When called with vmas_arg==NULL, get_user_pages_longterm() allocates
an array of nr_pages*8 which can easily get greater that the max order,
for example, registering memory for a 256GB guest does this and fails
in __alloc_pages_nodemask().
This adds a loop over chunks of entries to fit the max order limit.
Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently mm_iommu_do_alloc() is called in 2 cases:
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
this locks &mem_list_mutex and then locks mm::mmap_sem
several times when adjusting locked_vm or pinning pages;
- vfio_pci_nvgpu_regops::mmap() for GPU memory:
this is called with mm::mmap_sem held already and it locks
&mem_list_mutex.
So one can craft a userspace program to do special ioctl and mmap in
2 threads concurrently and cause a deadlock which lockdep warns about
(below).
We did not hit this yet because QEMU constructs the machine in a single
thread.
This moves the overlap check next to where the new entry is added and
reduces the amount of time spent with &mem_list_mutex held.
This moves locked_vm adjustment from under &mem_list_mutex.
This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.
This is one of the lockdep warnings:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
------------------------------------------------------
qemu-system-ppc/8038 is trying to acquire lock:
000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490
but task is already holding lock:
00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_sem){++++}:
lock_acquire+0xf8/0x260
down_write+0x44/0xa0
mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
mm_iommu_do_alloc+0x310/0x490
tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
vfio_fops_unl_ioctl+0x94/0x430 [vfio]
do_vfs_ioctl+0xe4/0x930
ksys_ioctl+0xc4/0x110
sys_ioctl+0x28/0x80
system_call+0x5c/0x70
-> #0 (mem_list_mutex){+.+.}:
__lock_acquire+0x1484/0x1900
lock_acquire+0xf8/0x260
__mutex_lock+0x88/0xa70
mm_iommu_do_alloc+0x70/0x490
vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
vfio_device_fops_mmap+0x44/0x70 [vfio]
mmap_region+0x5d4/0x770
do_mmap+0x42c/0x650
vm_mmap_pgoff+0x124/0x160
ksys_mmap_pgoff+0xdc/0x2f0
sys_mmap+0x40/0x80
system_call+0x5c/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(mem_list_mutex);
lock(&mm->mmap_sem);
lock(mem_list_mutex);
*** DEADLOCK ***
1 lock held by qemu-system-ppc/8038:
#0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
`ni6501_alloc_usb_buffers()` is called from `ni6501_auto_attach()` to
allocate RX and TX buffers for USB transfers. It allocates
`devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the
allocation of `devpriv->usb_tx_buf` fails, it frees
`devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an
error. Later, `ni6501_detach()` will be called from the core comedi
module code to clean up. `ni6501_detach()` also frees both
`devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but
`devpriv->usb_rx_buf` may have already beed freed, leading to a
double-free error. Fix it bu removing the call to
`kfree(devpriv->usb_rx_buf)` from `ni6501_alloc_usb_buffers()`, relying
on `ni6501_detach()` to free the memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|