Age | Commit message (Collapse) | Author |
|
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller one of the below two
values:
(actual system RAM size + padding size) vs (max system RAM size supported)
This calculation is wrong since commit
b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.
Thus, the size of the direct mapping section will always
be the sum of the actual system RAM size plus the padding size.
Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously KALSR will be weakened significantly.
|____|_______actual RAM_______|_padding_|______the rest_______|
0 64TB ~120TB
Instead, it should be like this:
|____|_______actual RAM_______|_________the rest______________|
0 64TB ~120TB
The size of padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
[ bp: Massage commit message. ]
Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frank.ramsay@hpe.com
Cc: herbert@gondor.apana.org.au
Cc: kirill@shutemov.name
Cc: mike.travis@hpe.com
Cc: thgarnie@google.com
Cc: x86-ml <x86@kernel.org>
Cc: yamada.masahiro@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
|
|
qdio.ko offers a small number of high-level functions to drive the
scanning of a QDIO queue for ready-to-process SBALs:
qdio_get_next_buffers(), __[ti]qdio_inbound_processing() and
__qdio_outbound_processing().
Let each of those functions maintain the 'start' index for their current
scan, and pass it to lower-level helpers as needed. This improves the
code's overall layering, and allows us to eliminate the additional
first_to_kick cursor with a follow-on patch.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Refactor all the low-level helpers to take the first_to_check cursor as
parameter, rather than accessing it directly.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
$(call if_changed,...) must have FORCE as a prerequisite.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1554280212-10578-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
clang points out that the return code from this function is
undefined for one of the error paths:
../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here
return result;
^~~~~~
../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning
int result;
^
Make it return -ENODEV here, as in the related failure cases.
gcc has a known bug in underreporting some of these warnings
when it has already eliminated the assignment of the return code
based on some earlier optimization step.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are a couple of spelling mistakes in NL_SET_ERR_MSG_MOD error
messages. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc warn this:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c: In function ndesc_init_rx_desc:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:138:6: warning: variable 'bfsize1' set but not used [-Wunused-but-set-variable]
Like enh_desc_init_rx_desc, we should use bfsize1
in ndesc_init_rx_desc to calculate 'p->des1'
Fixes: 583e63614149 ("net: stmmac: use correct DMA buffer size in the RX descriptor")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.
The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.
A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.
Fixes: 5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Currently, when a new resource group is created, the allocation values
of the MBA resource are not initialized and remain meaningless data.
For example:
mkdir /sys/fs/resctrl/p1
cat /sys/fs/resctrl/p1/schemata
MB:0=100;1=100
echo "MB:0=10;1=20" > /sys/fs/resctrl/p1/schemata
cat /sys/fs/resctrl/p1/schemata
MB:0= 10;1= 20
rmdir /sys/fs/resctrl/p1
mkdir /sys/fs/resctrl/p2
cat /sys/fs/resctrl/p2/schemata
MB:0= 10;1= 20
Therefore, when the new group is created, it is reasonable to initialize
MBA resource with default values.
Initialize the MBA resource and cache resources in separate functions.
[ bp: Add newlines between code blocks for better readability. ]
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: pei.p.jia@intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1555499329-1170-3-git-send-email-xiaochen.shen@intel.com
|
|
Carve out per rdt_domain initialization code from rdtgroup_init_alloc()
into a separate function.
No functional change, make the code more readable and save us at least
two indentation levels.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: pei.p.jia@intel.com
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1555499329-1170-2-git-send-email-xiaochen.shen@intel.com
|
|
Merge it to pick up dependent urgent changes before applying more
resctrl stuff.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
<IRQ>
dump_stack+0x8c/0xba
ubsan_epilogue+0x11/0x60
handle_overflow+0x12d/0x170
? ttwu_do_wakeup+0x21/0x320
__ubsan_handle_mul_overflow+0x12/0x20
tcp_ack_update_rtt+0x76c/0x780
tcp_clean_rtx_queue+0x499/0x14d0
tcp_ack+0x69e/0x1240
? __wake_up_sync_key+0x2c/0x50
? update_group_capacity+0x50/0x680
tcp_rcv_established+0x4e2/0xe10
tcp_v4_do_rcv+0x22b/0x420
tcp_v4_rcv+0xfe8/0x1190
ip_protocol_deliver_rcu+0x36/0x180
ip_local_deliver+0x15b/0x1a0
ip_rcv+0xac/0xd0
__netif_receive_skb_one_core+0x7f/0xb0
__netif_receive_skb+0x33/0xc0
netif_receive_skb_internal+0x84/0x1c0
napi_gro_receive+0x2a0/0x300
receive_buf+0x3d4/0x2350
? detach_buf_split+0x159/0x390
virtnet_poll+0x198/0x840
? reweight_entity+0x243/0x4b0
net_rx_action+0x25c/0x770
__do_softirq+0x19b/0x66d
irq_exit+0x1eb/0x230
do_IRQ+0x7a/0x150
common_interrupt+0xf/0xf
</IRQ>
It can be reproduced by:
echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
into drm-fixes
- GPUVM fixes for vega/RV and shadow buffers
- TTM fixes for hugepages
- TTM fix for refcount imbalance in error path
- DC AUX fix for some active DP-DVI dongles
- DC fix for multihead VT switch regression
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190415051703.3377-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v5.1-rc6
This contains a follow-up fix for the stream ID programming and a fix
for a regression on older Tegra devices (Tegra20 and Tegra30) that are
running into a division by zero trying to enable audio over HDMI.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190417073525.21680-1-thierry.reding@gmail.com
|
|
Pull smb3 fixes from Steve French:
"Five small SMB3 fixes, all also for stable - an important fix for an
oplock (lease) bug, a handle leak, and three bugs spotted by KASAN"
* tag '5.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: keep FileInfo handle live during oplock break
cifs: fix handle leak in smb2_query_symlink()
cifs: Fix lease buffer length error
cifs: Fix use-after-free in SMB2_read
cifs: Fix use-after-free in SMB2_write
|
|
If a request transmission fails due to write space or slot unavailability
errors, but the queued task then gets transmitted before it has time to
process the error in call_transmit_status() or call_bc_transmit_status(),
we need to suppress the transmission error code to prevent it from leaking
out of the RPC layer.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Configure s390 runtime CPU speculation bug mitigations in accordance
with the 'mitigations=' cmdline option. This affects Spectre v1 and
Spectre v2.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/e4a161805458a5ec88812aac0307ae3908a030fc.1555085500.git.jpoimboe@redhat.com
|
|
Configure powerpc CPU runtime speculation bug mitigations in accordance
with the 'mitigations=' cmdline option. This affects Meltdown, Spectre
v1, Spectre v2, and Speculative Store Bypass.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/245a606e1a42a558a310220312d9b6adb9159df6.1555085500.git.jpoimboe@redhat.com
|
|
Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
|
|
Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users. It's getting more and more
complicated to decide which mitigations are needed for a given
architecture. Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.
Most users fall into a few basic categories:
a) they want all mitigations off;
b) they want all reasonable mitigations on, with SMT enabled even if
it's vulnerable; or
c) they want all reasonable mitigations on, with SMT disabled if
vulnerable.
Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:
- mitigations=off: Disable all mitigations.
- mitigations=auto: [default] Enable all the default mitigations, but
leave SMT enabled, even if it's vulnerable.
- mitigations=auto,nosmt: Enable all the default mitigations, disabling
SMT if needed by a mitigation.
Currently, these options are placeholders which don't actually do
anything. They will be fleshed out in upcoming patches.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
|
|
This is a leftover from when the rings initially were not free flowing,
and hence a test for tail + 1 == head would indicate full. Since we now
let them wrap instead of mask them with the size, we need to check if
they drift more than the ring size from each other.
This fixes a case where we'd overwrite CQ ring entries, if the user
failed to reap completions. Both cases would ultimately result in lost
completions as the application violated the depth it asked for. The only
difference is that before this fix we'd return invalid entries for the
overflowed completions, instead of properly flagging it in the
cq_ring->overflow variable.
Reported-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We don't return NULL when we don't find the bpf_prog_info_node, fix
that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Song Liu <songliubraving@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3792cb2ff43b ("perf bpf: Save BTF in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190417145539.11669-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
By calling maps__insert() we assume to get 2 references on the map,
which we relese within maps__remove call.
However if there's already same map name, we currently don't bump the
reference and can crash, like:
Program received signal SIGABRT, Aborted.
0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6
#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
#6 map__put (map=0x1224df0) at util/map.c:299
#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
#9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
...
Add the map to the list and getting the reference event if we find the
map with same name.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 1e6285699b30 ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Current perf_evlist__poll_thread() code could finish without draining
the data. Adding the logic that makes sure we won't finish before the
drain.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 657ee5531903 ("perf evlist: Introduce side band thread")
Link: http://lkml.kernel.org/r/20190416160127.30203-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As reported by Jiri Olsa in:
"[BUG] perf: intel_pt won't display kernel function"
https://lore.kernel.org/lkml/20190403143738.GB32001@krava
Recent changes to support PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT
broke --kallsyms option. This is because it broke test __map__is_kmodule.
This patch fixes this by adding check for bpf program, so that these maps
are not mistaken as kernel modules.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/20190416160127.30203-8-jolsa@kernel.org
Fixes: 76193a94522f ("perf, bpf: Introduce PERF_RECORD_KSYMBOL")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_env__find_bpf_prog_info()
We currently don't return NULL in case we don't find the
bpf_prog_info_node, fixing that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e4378f0cb90b ("perf bpf: Save bpf_prog_info in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190416134151.15282-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pull IPMI fixes from Corey Minyard:
"Fixes for some bugs cause by recent changes. One crash if you feed bad
data to the module parameters, one BUG that sometimes occurs when a
user closes the connection, and one bug that cause the driver to not
work if the configuration information only comes in from SMBIOS"
* tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi:
ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
ipmi: Fix failure on SMBIOS specified devices
|
|
Pull networking fixes from David Miller:
1) Handle init flow failures properly in iwlwifi driver, from Shahar S
Matityahu.
2) mac80211 TXQs need to be unscheduled on powersave start, from Felix
Fietkau.
3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau.
4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed.
5) Avoid checksum complete with XDP in mlx5, also from Saeed.
6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon.
7) Partial sent TLS record leak fix from Jakub Kicinski.
8) Reject zero size iova range in vhost, from Jason Wang.
9) Allow pending work to complete before clcsock release from Karsten
Graul.
10) Fix XDP handling max MTU in thunderx, from Matteo Croce.
11) A lot of protocols look at the sa_family field of a sockaddr before
validating it's length is large enough, from Tetsuo Handa.
12) Don't write to free'd pointer in qede ptp error path, from Colin Ian
King.
13) Have to recompile IP options in ipv4_link_failure because it can be
invoked from ARP, from Stephen Suryaputra.
14) Doorbell handling fixes in qed from Denis Bolotin.
15) Revert net-sysfs kobject register leak fix, it causes new problems.
From Wang Hai.
16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva.
17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay
Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits)
socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW
tcp: tcp_grow_window() needs to respect tcp_space()
ocelot: Clean up stats update deferred work
ocelot: Don't sleep in atomic context (irqs_disabled())
net: bridge: fix netlink export of vlan_stats_per_port option
qed: fix spelling mistake "faspath" -> "fastpath"
tipc: set sysctl_tipc_rmem and named_timeout right range
tipc: fix link established but not in session
net: Fix missing meta data in skb with vlan packet
net: atm: Fix potential Spectre v1 vulnerabilities
net/core: work around section mismatch warning for ptp_classifier
net: bridge: fix per-port af_packet sockets
bnx2x: fix spelling mistake "dicline" -> "decline"
route: Avoid crash from dereferencing NULL rt->from
MAINTAINERS: normalize Woojung Huh's email address
bonding: fix event handling for stacked bonds
Revert "net-sysfs: Fix memory leak in netdev_register_kobject"
rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check
qed: Fix the DORQ's attentions handling
qed: Fix missing DORQ attentions
...
|
|
free_user() could be called in atomic context.
This patch pushed the free operation off into a workqueue.
Example:
BUG: sleeping function called from invalid context at kernel/workqueue.c:2856
in_atomic(): 1, irqs_disabled(): 0, pid: 177, name: ksoftirqd/27
CPU: 27 PID: 177 Comm: ksoftirqd/27 Not tainted 4.19.25-3 #1
Hardware name: AIC 1S-HV26-08/MB-DPSB04-06, BIOS IVYBV060 10/21/2015
Call Trace:
dump_stack+0x5c/0x7b
___might_sleep+0xec/0x110
__flush_work+0x48/0x1f0
? try_to_del_timer_sync+0x4d/0x80
_cleanup_srcu_struct+0x104/0x140
free_user+0x18/0x30 [ipmi_msghandler]
ipmi_free_recv_msg+0x3a/0x50 [ipmi_msghandler]
deliver_response+0xbd/0xd0 [ipmi_msghandler]
deliver_local_response+0xe/0x30 [ipmi_msghandler]
handle_one_recv_msg+0x163/0xc80 [ipmi_msghandler]
? dequeue_entity+0xa0/0x960
handle_new_recv_msgs+0x15c/0x1f0 [ipmi_msghandler]
tasklet_action_common.isra.22+0x103/0x120
__do_softirq+0xf8/0x2d7
run_ksoftirqd+0x26/0x50
smpboot_thread_fn+0x11d/0x1e0
kthread+0x103/0x140
? sort_range+0x20/0x20
? kthread_destroy_worker+0x40/0x40
ret_from_fork+0x1f/0x40
Fixes: 77f8269606bf ("ipmi: fix use-after-free of user->release_barrier.rda")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 5.0
Cc: Yang Yingliang <yangyingliang@huawei.com>
|
|
All stack types on x86 64-bit have guard pages now.
So there is no point in executing probabilistic overflow checks as the
guard pages are a accurate and reliable overflow prevention.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160146.466354762@linutronix.de
|
|
The IRQ stack lives in percpu space, so an IRQ handler that overflows it
will overwrite other data structures.
Use vmap() to remap the IRQ stack so that it will have the usual guard
pages that vmap()/vmalloc() allocations have. With this, the kernel will
panic immediately on an IRQ stack overflow.
[ tglx: Move the map code to a proper place and invoke it only when a CPU
is about to be brought online. No point in installing the map at
early boot for all possible CPUs. Fail the CPU bringup if the vmap()
fails as done for all other preparatory stages in CPU hotplug. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160146.363733568@linutronix.de
|
|
Currently, the IRQ stack is hardcoded as the first page of the percpu
area, and the stack canary lives on the IRQ stack. The former gets in
the way of adding an IRQ stack guard page, and the latter is a potential
weakness in the stack canary mechanism.
Split the IRQ stack into its own private percpu pages.
[ tglx: Make 64 and 32 bit share struct irq_stack ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Maran Wilson <maran.wilson@oracle.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: "Rafael Ávila de Espíndola" <rafael@espindo.la>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20190414160146.267376656@linutronix.de
|
|
Preparatory change for disentangling the irq stack union as a
prerequisite for irq stacks with guard pages.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lkml.kernel.org/r/20190414160146.177558566@linutronix.de
|
|
irq_ctx_init() crashes hard on page allocation failures. While that's ok
during early boot, it's just wrong in the CPU hotplug bringup code.
Check the page allocation failure and return -ENOMEM and handle it at the
call sites. On early boot the only way out is to BUG(), but on CPU hotplug
there is no reason to crash, so just abort the operation.
Rename the function to something more sensible while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/20190414160146.089060584@linutronix.de
|
|
irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
code. There is no reason to have this split. The interrupt stacks must be
allocated no matter what.
Invoke it from init_IRQ() before invoking the native or XEN init
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Abraham <j.abraham1776@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20190414160146.001162606@linutronix.de
|
|
Preparatory patch to share code with 32bit.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.912584074@linutronix.de
|
|
The percpu storage holds a pointer to the stack not the stack
itself. Rename it before sharing struct irq_stack with 64-bit.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.824805922@linutronix.de
|
|
There is no reason to have an u32 array in struct irq_stack. The only
purpose of the array is to size the struct properly.
Preparatory change for sharing struct irq_stack with 64-bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.736241969@linutronix.de
|
|
On 32-bit IRQ_STACK_SIZE is the same as THREAD_SIZE.
To allow sharing struct irq_stack with 32-bit, define IRQ_STACK_SIZE for
32-bit and use it for struct irq_stack.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.632513987@linutronix.de
|
|
The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is an useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.
As the exception stacks and the guard pages are page aligned the loop can
be avoided completely.
Add a initial check whether the stack pointer is inside the full exception
stack area and leave early if not.
Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.
The table is filled at compile time and for the !KASAN case the interesting
page descriptors exactly fit into a single cache line. Just the last guard
page descriptor is in the next cacheline, but that should not be accessed
in the regular case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.543320386@linutronix.de
|
|
The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU starts always at top of stack
on exception entry, which means on #DB recursion the second #DB would
overwrite the stack of the first.
The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.
Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:
--- top of DB_stack <- Initial stack
--- end of DB_stack
guard page
--- top of DB1_stack <- Top of stack after entering first #DB
--- end of DB1_stack
guard page
--- top of DB2_stack <- Top of stack after entering second #DB
--- end of DB2_stack
guard page
If DB2 would not act as the final guard hole, a second #DB would point the
top of #DB stack to the stack below #DB1 which would be valid and not catch
the not so desired triple nesting.
The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.
- Adjust the low level entry code so it adjusts top of #DB with the offset
between the stacks instead of exception stack size.
- Make the dumpstack code aware of the new stacks.
- Adjust the in_debug_stack() implementation and move it into the NMI code
where it belongs. As this is NMI hotpath code, it just checks the full
area between top of DB_stack and bottom of DB1_stack without checking
for the guard page. That's correct because the NMI cannot hit a
stackpointer pointing to the guard page between DB and DB1 stack. Even
if it would, then the NMI operation still is unaffected, but the resume
of the debug exception on the topmost DB stack will crash by touching
the guard page.
[ bp: Make exception_stack_names static const char * const ]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-doc@vger.kernel.org
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
|
|
All usage sites which expected that the exception stacks in the CPU entry
area are mapped linearly are fixed up. Enable guard pages between the
IST stacks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.349862042@linutronix.de
|
|
The entry order of the TSS.IST array and the order of the stack
storage/mapping are not required to be the same.
With the upcoming split of the debug stack this is going to fall apart as
the number of TSS.IST array entries stays the same while the actual stacks
are increasing.
Make them separate so that code like dumpstack can just utilize the mapping
order. The IST index is solely required for the actual TSS.IST array
initialization.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.241588113@linutronix.de
|
|
The patch a6dbe4427559 ("vt: perform safe console erase in the right
order") introduced a bug. The conditional do_update_region() was
replaced by a call to update_region() that does contain the conditional
already, but with unwanted extra side effects such as restoring the cursor
drawing.
In order to reproduce the bug:
- use framebuffer console with the AMDGPU driver
- type "links" to start the console www browser
- press 'q' and space to exit links
Now the cursor will be permanently visible in the center of the
screen. It will stay there until something overwrites it.
The bug goes away if we change update_region() back to the conditional
do_update_region().
[ nico: reworded changelog ]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Cc: stable@vger.kernel.org
Fixes: a6dbe4427559 ("vt: perform safe console erase in the right order")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
All users gone.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.151435667@linutronix.de
|
|
Convert the TSS.IST setup code to use the cpu entry area information
directly instead of assuming a linear mapping of the IST stacks.
The store to orig_ist[] is no longer required as there are no users
anymore.
This is the last preparatory step towards IST guard pages.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160145.061686012@linutronix.de
|
|
When called with vmas_arg==NULL, get_user_pages_longterm() allocates
an array of nr_pages*8 which can easily get greater that the max order,
for example, registering memory for a 256GB guest does this and fails
in __alloc_pages_nodemask().
This adds a loop over chunks of entries to fit the max order limit.
Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently mm_iommu_do_alloc() is called in 2 cases:
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
this locks &mem_list_mutex and then locks mm::mmap_sem
several times when adjusting locked_vm or pinning pages;
- vfio_pci_nvgpu_regops::mmap() for GPU memory:
this is called with mm::mmap_sem held already and it locks
&mem_list_mutex.
So one can craft a userspace program to do special ioctl and mmap in
2 threads concurrently and cause a deadlock which lockdep warns about
(below).
We did not hit this yet because QEMU constructs the machine in a single
thread.
This moves the overlap check next to where the new entry is added and
reduces the amount of time spent with &mem_list_mutex held.
This moves locked_vm adjustment from under &mem_list_mutex.
This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.
This is one of the lockdep warnings:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
------------------------------------------------------
qemu-system-ppc/8038 is trying to acquire lock:
000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490
but task is already holding lock:
00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_sem){++++}:
lock_acquire+0xf8/0x260
down_write+0x44/0xa0
mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
mm_iommu_do_alloc+0x310/0x490
tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
vfio_fops_unl_ioctl+0x94/0x430 [vfio]
do_vfs_ioctl+0xe4/0x930
ksys_ioctl+0xc4/0x110
sys_ioctl+0x28/0x80
system_call+0x5c/0x70
-> #0 (mem_list_mutex){+.+.}:
__lock_acquire+0x1484/0x1900
lock_acquire+0xf8/0x260
__mutex_lock+0x88/0xa70
mm_iommu_do_alloc+0x70/0x490
vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
vfio_device_fops_mmap+0x44/0x70 [vfio]
mmap_region+0x5d4/0x770
do_mmap+0x42c/0x650
vm_mmap_pgoff+0x124/0x160
ksys_mmap_pgoff+0xdc/0x2f0
sys_mmap+0x40/0x80
system_call+0x5c/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(mem_list_mutex);
lock(&mm->mmap_sem);
lock(mem_list_mutex);
*** DEADLOCK ***
1 lock held by qemu-system-ppc/8038:
#0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.
There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.
No functional change. Preparation for removing orig_ist.
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.974900463@linutronix.de
|
|
The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.
There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.
No functional change. Preparation for removing orig_ist.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.885741626@linutronix.de
|