Age | Commit message (Collapse) | Author |
|
It's not obvious that the HRTIMER_MODE variants are bit combinations,
because all modes are hard coded constants currently.
Change it so the bit meanings are clear; and use the symbols for creating
modes which combine bits.
While at it get rid of the ugly tail comments as well.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-8-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The POSIX specification defines that relative CLOCK_REALTIME timers are not
affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
to ensure POSIX compliance.
The introduction of the additional HRTIMER_MODE_PINNED mode broke this
requirement for pinned timers.
There is no user space visible impact because user space timers are not
using pinned mode, but for consistency reasons this needs to be fixed.
Check whether the mode has the HRTIMER_MODE_REL bit set instead of
comparing with HRTIMER_MODE_ABS.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Fixes: 597d0275736d ("timers: Framework for identifying pinned timers")
Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only
when HRTIMER_MODE_PINNED is set.
Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
clock ID, instead of the customary predefined "clockid_t" type.
In hrtimer coding style the canonical variable name for the clock ID is
'clock_id', therefore change the name of the parameter here as well
to make it all consistent.
While at it, clean up the description for the 'clock_id' and 'mode'
function parameters. The clock modes and the clock IDs are not
restricted as the comment suggests.
Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().
No functional changes intended.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The '/**' sequence marks the start of a structure description. Add the
missing second asterisk. While at it adapt the ordering of the struct
members to the struct definition and document the purpose of
expires_next more precisely.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-4-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The protection of a hrtimer which runs its callback against migration to a
different CPU has nothing to do with hard interrupt context.
The protection against migration of a hrtimer running the expiry callback
is the pointer in the cpu_base which holds a pointer to the currently
running timer. This pointer is evaluated in the code which potentially
switches the timer base and makes sure it's kept on the CPU on which the
callback is running.
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-3-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
migration_enable/nohz_active
The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing global variables
for these decisions.
Still that results in a (cache hot) load and conditional branch,
which can be avoided by using static keys.
Implement it with static keys and optimize for the most critical
case of high performance networking which tends to disable the
timer migration functionality.
No change in functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently the BSP microcode update code examines the initrd very early
in the boot process. If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet. Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In preparation for encrypting more than just the kernel, the encryption
support in sme_encrypt_kernel() needs to support 4KB page aligned
encryption instead of just 2MB large page aligned encryption.
Update the routines that populate the PGD to support non-2MB aligned
addresses. This is done by creating PTE page tables for the start
and end portion of the address range that fall outside of the 2MB
alignment. This results in, at most, two extra pages to hold the
PTE entries for each mapping of a range.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192626.6026.75387.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In preparation for encrypting more than just the kernel during early
boot processing, centralize the use of the PMD flag settings based
on the type of mapping desired. When 4KB aligned encryption is added,
this will allow either PTE flags or large page PMD flags to be used
without requiring the caller to adjust.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192615.6026.14767.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In preparation for follow-on patches, combine the PGD mapping parameters
into a struct to reduce the number of function arguments and allow for
direct updating of the next pagetable mapping area pointer.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192605.6026.96206.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Clean up the use of PUSH and POP and when registers are saved in the
__enc_copy() assembly function in order to improve the readability of the code.
Move parameter register saving into general purpose registers earlier
in the code and move all the pushes to the beginning of the function
with corresponding pops at the end.
We do this to prepare fixes.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192556.6026.74187.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If a nonexistent file is supplied to objtool, it complains with a
non-helpful error:
open: No such file or directory
Improve it to:
objtool: Can't open 'foo': No such file or directory
Reported-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Objtool segfaults when the gold linker is used with
CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.
With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
being passed to objtool. The gold linker seems to strip unused ELF
symbols by default, which confuses objtool and causes the seg fault when
it's trying to generate ORC metadata.
Objtool should really be running immediately after GCC anyway, without a
linker call in between. Change the makefile ordering so that objtool is
called before the linker.
Reported-and-tested-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The new unified copy_siginfo_from_user32 takes care of this.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Nothing tests this define so just remove it.
I suspect the intention was to make the uid field in siginfo 16bit
however I can't find any code that ever tested this defined, and
even if it did it the layout has been this way for 8 years so
changing it now would break the ABI with userspace.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later. To avoid that merge the blackfin specific si_codes
into uapi/asm-generic/siginfo.h
Update copy_siginfo_to_user to copy with the absence of BUS_MCEERR_AR that blackfin
defines to be something else.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later. To avoid that merge the tile specific si_codes
into uapi/asm-generic/siginfo.h
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later. To avoid that merce the frv specific si_codes
into uapi/asm-generic/siginfo.h
This allows the removal of arch/frv/uapi/include/asm/siginfo.h as the last
last meaningful definition it held was FPE_MDAOVF.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Having si_codes in many different files simply encourages duplicate
definitions that can cause problems later. To avoid that merge the
ia64 specific si_codes into uapi/asm-generic/siginfo.h
Update the sanity checks in arch/x86/kernel/signal_compat.c to expect
the now lager NSIGILL and NSIGFPE. As nothing excpe the larger count
is exposed on x86 no additional code needs to be updated.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
NSIGTRAP is 4 in the generic siginfo and powerpc just undefines
NSGTRAP and redefine it as 4. That accomplishes nothing so remove
the duplication.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
The addr_lsb fields is only valid and available when the
signal is SIGBUS and the si_code is BUS_MCEERR_AR or BUS_MCEERR_AO.
Document this with a comment and place the field in the _sigfault union
to make this clear.
All of the fields stay in the same physical location so both the old
and new definitions of struct siginfo will continue to work.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
linux/compat.h
CONFIG_X86_X32 is set when the user requests X32 support.
CONFIG_X86_X32_ABI is set when the user requests X32 support
and the tool-chain has X32 allowing X32 support to be built.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
gcc-8 reports
drivers/infiniband/core/cma_configfs.c: In function 'make_cma_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 64 equals destination size [-Wstringop-truncation]
We need to use strlcpy() to make sure the string is nul-terminated.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
iWARP does not use rdma_ah_attr_type, and for this reason we do not have a
RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp
ports and for clarity it shouldn't have a special test for iWarp.
This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
when wrongly called on an iWarp port.
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The change of slid from u16 to u32 results in sizeof(struct ib_wc)
cross 64B boundary, which causes more cache misses. This patch
rearranges the fields and remain the size to 64B.
Pahole output before this change:
struct ib_wc {
union {
u64 wr_id; /* 8 */
struct ib_cqe * wr_cqe; /* 8 */
}; /* 0 8 */
enum ib_wc_status status; /* 8 4 */
enum ib_wc_opcode opcode; /* 12 4 */
u32 vendor_err; /* 16 4 */
u32 byte_len; /* 20 4 */
struct ib_qp * qp; /* 24 8 */
union {
__be32 imm_data; /* 4 */
u32 invalidate_rkey; /* 4 */
} ex; /* 32 4 */
u32 src_qp; /* 36 4 */
int wc_flags; /* 40 4 */
u16 pkey_index; /* 44 2 */
/* XXX 2 bytes hole, try to pack */
u32 slid; /* 48 4 */
u8 sl; /* 52 1 */
u8 dlid_path_bits; /* 53 1 */
u8 port_num; /* 54 1 */
u8 smac[6]; /* 55 6 */
/* XXX 1 byte hole, try to pack */
u16 vlan_id; /* 62 2 */
/* --- cacheline 1 boundary (64 bytes) --- */
u8 network_hdr_type; /* 64 1 */
/* size: 72, cachelines: 2, members: 17 */
/* sum members: 62, holes: 2, sum holes: 3 */
/* padding: 7 */
/* last cacheline: 8 bytes */
};
Pahole output after this change:
struct ib_wc {
union {
u64 wr_id; /* 8 */
struct ib_cqe * wr_cqe; /* 8 */
}; /* 0 8 */
enum ib_wc_status status; /* 8 4 */
enum ib_wc_opcode opcode; /* 12 4 */
u32 vendor_err; /* 16 4 */
u32 byte_len; /* 20 4 */
struct ib_qp * qp; /* 24 8 */
union {
__be32 imm_data; /* 4 */
u32 invalidate_rkey; /* 4 */
} ex; /* 32 4 */
u32 src_qp; /* 36 4 */
u32 slid; /* 40 4 */
int wc_flags; /* 44 4 */
u16 pkey_index; /* 48 2 */
u8 sl; /* 50 1 */
u8 dlid_path_bits; /* 51 1 */
u8 port_num; /* 52 1 */
u8 smac[6]; /* 53 6 */
/* XXX 1 byte hole, try to pack */
u16 vlan_id; /* 60 2 */
u8 network_hdr_type; /* 62 1 */
/* size: 64, cachelines: 1, members: 17 */
/* sum members: 62, holes: 1, sum holes: 1 */
/* padding: 1 */
};
Cc: <stable@vger.kernel.org> # v4.13
Fixes: 7db20ecd1d97 ("IB/core: Change wc.slid from 16 to 32 bits")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Allocating steerable UD QPs depends on having at least one IB port,
while releasing those QPs does not.
As a result, when there are only ETH ports, the IB (RoCE) driver
requests releasing a qp range whose base qp is zero, with
qp count zero.
When SR-IOV is enabled, and the VF driver is running on a VM over
a hypervisor which treats such qp release calls as errors
(rather than NOPs), we see lines in the VM message log like:
mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0
Fix this by adding a check for a zero count in mlx4_release_qp_range()
(which thus treats releasing 0 qps as a nop), and eliminating the
check for device managed flow steering when releasing steerable UD QPs.
(Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it
remains NULL when steerable UD QPs are not allocated).
Cc: <stable@vger.kernel.org>
Fixes: 4196670be786 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The double swap matches what user space rdma-core does to imm_data.
wc->imm_data is not used in the kernel so this change has no practical
impact.
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This matches the changes made recently to the userspace hns
driver when it was made sparse clean.
See rdma-core commit bffd380cfe56 ("libhns: Make the provider sparse
clean")
wc->imm_data is not used in the kernel so this change has no practical
impact.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This matches what the userspace copy of this header has been doing
for a while. imm_data is an opaque 4 byte array carried over the network,
and invalidate_rkey is in CPU byte order.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Resolving DMAC for RoCE is applicable to only Connected mode QPs.
So resolve DMAC for only for Connected mode QPs.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Instead of returning 0 (success) for RoCE scenarios where DMAC should
not be resolved, avoid such attempt and make code consistent with
ib_create_user_ah().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Currently ah_attr is initialized by the ib_cm layer for rdma_cm
based applications. For RoCE transport ah_attr.roce.dmac is already
initialized by ib_cm, rdma_cm either from wc, path record, route
resolve, explicit path record setting depending on active or passive
side QP. Therefore avoid resolving DMAC for QP of kernel consumers.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Currently qp->port stores the port number whenever IB_QP_PORT
QP attribute mask is set (during QP state transition to INIT state).
This port number should be stored for the real QP when XRC target QP
is used.
Follow the ib_modify_qp() implementation and hide the access to ->real_qp.
Fixes: a512c2fbef9c ("IB/core: Introduce modify QP operation with udata")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
In order to provide data consistency with PPL for disks with write-back
cache enabled all data has to be flushed to disks before next PPL
entry. The disks to be flushed are marked in the bitmap. It's modified
under a mutex and it's only read after PPL io unit is submitted.
A limitation of 64 disks in the array has been introduced to keep data
structures and implementation simple. RAID5 arrays with so many disks are
not likely due to high risk of multiple disks failure. Such restriction
should not be a real life limitation.
With write-back cache disabled next PPL entry is submitted when data write
for current one completes. Data flush defers next log submission so trigger
it when there are no stripes for handling found.
As PPL assures all data is flushed to disk at request completion, just
acknowledge flush request when PPL is enabled.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
|
|
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.
==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409
CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xe9/0x18f
print_address_description+0xa2/0x350
kasan_report+0x3a5/0x400
to_rdma_ah_attr+0xa8/0x3b0
mlx5_ib_query_qp+0xd35/0x1330
ib_query_qp+0x8a/0xb0
ib_uverbs_query_qp+0x237/0x7f0
ib_uverbs_write+0x617/0xd80
__vfs_write+0xf7/0x500
vfs_write+0x149/0x310
SyS_write+0xca/0x190
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560
Allocated by task 1:
__kmalloc+0x3f9/0x430
alloc_mad_private+0x25/0x50
ib_mad_post_receive_mads+0x204/0xa60
ib_mad_init_device+0xa59/0x1020
ib_register_device+0x83a/0xbc0
mlx5_ib_add+0x50e/0x5c0
mlx5_add_device+0x142/0x410
mlx5_register_interface+0x18f/0x210
mlx5_ib_init+0x56/0x63
do_one_initcall+0x15b/0x270
kernel_init_freeable+0x2d8/0x3d0
kernel_init+0x14/0x190
ret_from_fork+0x24/0x30
Freed by task 0:
(stack is not available)
The buggy address belongs to the object at ffff880019ae2000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
Cc: <stable@vger.kernel.org>
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
in newer versions of gcc:
warning: array initialized from parenthesized string constant
Just remove the parentheses, they're not needed in this context since
anyway since there can be no operator precendence issues or similar.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY'
Jim Westfall says:
====================
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
This used to be the previous behavior in older kernels but became broken in
a263b3093641f (ipv4: Make neigh lookups directly in output packet path)
and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix neigh
lookup keying over loopback/point-to-point devices)
Not having this results in there being an arp entry for every remote ip
address that the device talks to. Given a fairly active device it can
cause the arp table to become huge and/or having to add/purge large number
of entires to keep within table size thresholds.
$ ip -4 neigh show nud noarp | grep tun | wc -l
55850
$ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10
arp_cach|arp_cach|arp_cach|
entries| allocs|destroys|
81493|620166816|620126069|
101867| 10186| 0|
113854| 5993| 0|
118773| 2459| 0|
27937| 18579| 63998|
39256| 5659| 0|
56231| 8487| 0|
65602| 4685| 0|
79697| 7047| 0|
90733| 5517| 0|
v2:
- fixes coding style issues
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ARSTR is always located at the start of the TSU register region, thus
using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR. Use the correct condition/macro there...
Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit ceaa001a170e43608854d5290a48064f57b565ed.
The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field. Thus, this
patch reverts it and later patch will redo it using nested attr.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Pravin Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().
Signed-off-by: Robert Hering <r.hering@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :
If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.
Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek
HD-audio codec") introduced the reverse-selection of CONFIG_INPUT for
Realtek codec in order to avoid the mess with dependency between
built-in and modules. Later on, we obtained IS_REACHABLE() macro
exactly for this kind of problems, and now we can remove th INPUT
selection in Kconfig and put IS_REACHABLE(INPUT) to the appropriate
places in the code, so that the driver doesn't need to select other
subsystem forcibly.
Fixes: ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # and build-tested
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
I regularly get 50 MB - 60 MB files during kernel randconfig builds.
These large files mostly contain (many repeats of; e.g., 124,594):
In file included from ../include/linux/string.h:6:0,
from ../include/linux/uuid.h:20,
from ../include/linux/mod_devicetable.h:13,
from ../scripts/mod/devicetable-offsets.c:3:
../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
______f = { \
^
../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
^
../include/linux/string.h:425:2: note: in expansion of macro 'if'
if (p_size == (size_t)-1 && q_size == (size_t)-1)
^
This only happens when CONFIG_FORTIFY_SOURCE=y and
CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
FORTIFY_SOURCE=y.
Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
These fall-through are expected.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.
Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.
inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|