summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-16hrtimer: Clean up 'enum hrtimer_mode'Anna-Maria Gleixner
It's not obvious that the HRTIMER_MODE variants are bit combinations, because all modes are hard coded constants currently. Change it so the bit meanings are clear; and use the symbols for creating modes which combine bits. While at it get rid of the ugly tail comments as well. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-8-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)Anna-Maria Gleixner
The POSIX specification defines that relative CLOCK_REALTIME timers are not affected by clock modifications. Those timers have to use CLOCK_MONOTONIC to ensure POSIX compliance. The introduction of the additional HRTIMER_MODE_PINNED mode broke this requirement for pinned timers. There is no user space visible impact because user space timers are not using pinned mode, but for consistency reasons this needs to be fixed. Check whether the mode has the HRTIMER_MODE_REL bit set instead of comparing with HRTIMER_MODE_ABS. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Fixes: 597d0275736d ("timers: Framework for identifying pinned timers") Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16hrtimer: Fix hrtimer_start[_range_ns]() function descriptionsAnna-Maria Gleixner
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only when HRTIMER_MODE_PINNED is set. Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16hrtimer: Clean up the 'int clock' parameter of schedule_hrtimeout_range_clock()Anna-Maria Gleixner
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the clock ID, instead of the customary predefined "clockid_t" type. In hrtimer coding style the canonical variable name for the clock ID is 'clock_id', therefore change the name of the parameter here as well to make it all consistent. While at it, clean up the description for the 'clock_id' and 'mode' function parameters. The clock modes and the clock IDs are not restricted as the comment suggests. Fix the mode description as well for the callers of schedule_hrtimeout_range_clock(). No functional changes intended. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16hrtimer: Fix kerneldoc syntax for 'struct hrtimer_cpu_base'Anna-Maria Gleixner
The '/**' sequence marks the start of a structure description. Add the missing second asterisk. While at it adapt the ordering of the struct members to the struct definition and document the purpose of expires_next more precisely. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-4-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16hrtimer: Correct blatantly incorrect commentThomas Gleixner
The protection of a hrtimer which runs its callback against migration to a different CPU has nothing to do with hard interrupt context. The protection against migration of a hrtimer running the expiry callback is the pointer in the cpu_base which holds a pointer to the currently running timer. This pointer is evaluated in the code which potentially switches the timer base and makes sure it's kept on the CPU on which the callback is running. Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-3-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16hrtimer: Optimize the hrtimer code by using static keys for ↵Thomas Gleixner
migration_enable/nohz_active The hrtimer_cpu_base::migration_enable and ::nohz_active fields were originally introduced to avoid accessing global variables for these decisions. Still that results in a (cache hot) load and conditional branch, which can be avoided by using static keys. Implement it with static keys and optimize for the most critical case of high performance networking which tends to disable the timer migration functionality. No change in functionality. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16Merge branch 'timers/urgent' into timers/core, to pick up dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16x86/mm: Encrypt the initrd earlier for BSP microcode updateTom Lendacky
Currently the BSP microcode update code examines the initrd very early in the boot process. If SME is active, the initrd is treated as being encrypted but it has not been encrypted (in place) yet. Update the early boot code that encrypts the kernel to also encrypt the initrd so that early BSP microcode updates work. Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryptionTom Lendacky
In preparation for encrypting more than just the kernel, the encryption support in sme_encrypt_kernel() needs to support 4KB page aligned encryption instead of just 2MB large page aligned encryption. Update the routines that populate the PGD to support non-2MB aligned addresses. This is done by creating PTE page tables for the start and end portion of the address range that fall outside of the 2MB alignment. This results in, at most, two extra pages to hold the PTE entries for each mapping of a range. Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180110192626.6026.75387.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16x86/mm: Centralize PMD flags in sme_encrypt_kernel()Tom Lendacky
In preparation for encrypting more than just the kernel during early boot processing, centralize the use of the PMD flag settings based on the type of mapping desired. When 4KB aligned encryption is added, this will allow either PTE flags or large page PMD flags to be used without requiring the caller to adjust. Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180110192615.6026.14767.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16x86/mm: Use a struct to reduce parameters for SME PGD mappingTom Lendacky
In preparation for follow-on patches, combine the PGD mapping parameters into a struct to reduce the number of function arguments and allow for direct updating of the next pagetable mapping area pointer. Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180110192605.6026.96206.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16x86/mm: Clean up register saving in the __enc_copy() assembly codeTom Lendacky
Clean up the use of PUSH and POP and when registers are saved in the __enc_copy() assembly function in order to improve the readability of the code. Move parameter register saving into general purpose registers earlier in the code and move all the pushes to the beginning of the function with corresponding pops at the end. We do this to prepare fixes. Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180110192556.6026.74187.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16objtool: Improve error message for bad file argumentJosh Poimboeuf
If a nonexistent file is supplied to objtool, it complains with a non-helpful error: open: No such file or directory Improve it to: objtool: Can't open 'foo': No such file or directory Reported-by: Markus <M4rkusXXL@web.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16objtool: Fix seg fault with gold linkerJosh Poimboeuf
Objtool segfaults when the gold linker is used with CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y. With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before being passed to objtool. The gold linker seems to strip unused ELF symbols by default, which confuses objtool and causes the seg fault when it's trying to generate ORC metadata. Objtool should really be running immediately after GCC anyway, without a linker call in between. Change the makefile ordering so that objtool is called before the linker. Reported-and-tested-by: Markus <M4rkusXXL@web.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-15signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32Eric W. Biederman
The new unified copy_siginfo_from_user32 takes care of this. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal: Unify and correct copy_siginfo_from_user32Eric W. Biederman
The function copy_siginfo_from_user32 is used for two things, in ptrace since the dawn of siginfo for arbirarily modifying a signal that user space sees, and in sigqueueinfo to send a signal with arbirary siginfo data. Create a single copy of copy_siginfo_from_user32 that all architectures share, and teach it to handle all of the cases in the siginfo union. In the generic version of copy_siginfo_from_user32 ensure that all of the fields in siginfo are initialized so that the siginfo structure can be safely copied to userspace if necessary. When copying the embedded sigval union copy the si_int member. That ensures the 32bit values passes through the kernel unchanged. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDEDEric W. Biederman
Nothing tests this define so just remove it. I suspect the intention was to make the uid field in siginfo 16bit however I can't find any code that ever tested this defined, and even if it did it the layout has been this way for 8 years so changing it now would break the ABI with userspace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.hEric W. Biederman
Having si_codes in many different files simply encourages duplicate definitions that can cause problems later. To avoid that merge the blackfin specific si_codes into uapi/asm-generic/siginfo.h Update copy_siginfo_to_user to copy with the absence of BUS_MCEERR_AR that blackfin defines to be something else. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal/tile: Move the tile specific si_codes to asm-generic/siginfo.hEric W. Biederman
Having si_codes in many different files simply encourages duplicate definitions that can cause problems later. To avoid that merge the tile specific si_codes into uapi/asm-generic/siginfo.h Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal/frv: Move the frv specific si_codes to asm-generic/siginfo.hEric W. Biederman
Having si_codes in many different files simply encourages duplicate definitions that can cause problems later. To avoid that merce the frv specific si_codes into uapi/asm-generic/siginfo.h This allows the removal of arch/frv/uapi/include/asm/siginfo.h as the last last meaningful definition it held was FPE_MDAOVF. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.hEric W. Biederman
Having si_codes in many different files simply encourages duplicate definitions that can cause problems later. To avoid that merge the ia64 specific si_codes into uapi/asm-generic/siginfo.h Update the sanity checks in arch/x86/kernel/signal_compat.c to expect the now lager NSIGILL and NSIGFPE. As nothing excpe the larger count is exposed on x86 no additional code needs to be updated. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal/powerpc: Remove redefinition of NSIGTRAP on powerpcEric W. Biederman
NSIGTRAP is 4 in the generic siginfo and powerpc just undefines NSGTRAP and redefine it as 4. That accomplishes nothing so remove the duplication. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal: Move addr_lsb into the _sigfault union for clarityEric W. Biederman
The addr_lsb fields is only valid and available when the signal is SIGBUS and the si_code is BUS_MCEERR_AR or BUS_MCEERR_AO. Document this with a comment and place the field in the _sigfault union to make this clear. All of the fields stay in the same physical location so both the old and new definitions of struct siginfo will continue to work. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15signal: unify compat_siginfo_tAl Viro
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in linux/compat.h CONFIG_X86_X32 is set when the user requests X32 support. CONFIG_X86_X32_ABI is set when the user requests X32 support and the tool-chain has X32 allowing X32 support to be built. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-01-15IB/cma: use strlcpy() instead of strncpy()Xiongfeng Wang
gcc-8 reports drivers/infiniband/core/cma_configfs.c: In function 'make_cma_dev': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 64 equals destination size [-Wstringop-truncation] We need to use strlcpy() to make sure the string is nul-terminated. Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/core: Clarify rdma_ah_find_typeParav Pandit
iWARP does not use rdma_ah_attr_type, and for this reason we do not have a RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp ports and for clarity it shouldn't have a special test for iWarp. This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB when wrongly called on an iWarp port. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Fix ib_wc structure size to remain in 64 bytes boundaryBodong Wang
The change of slid from u16 to u32 results in sizeof(struct ib_wc) cross 64B boundary, which causes more cache misses. This patch rearranges the fields and remain the size to 64B. Pahole output before this change: struct ib_wc { union { u64 wr_id; /* 8 */ struct ib_cqe * wr_cqe; /* 8 */ }; /* 0 8 */ enum ib_wc_status status; /* 8 4 */ enum ib_wc_opcode opcode; /* 12 4 */ u32 vendor_err; /* 16 4 */ u32 byte_len; /* 20 4 */ struct ib_qp * qp; /* 24 8 */ union { __be32 imm_data; /* 4 */ u32 invalidate_rkey; /* 4 */ } ex; /* 32 4 */ u32 src_qp; /* 36 4 */ int wc_flags; /* 40 4 */ u16 pkey_index; /* 44 2 */ /* XXX 2 bytes hole, try to pack */ u32 slid; /* 48 4 */ u8 sl; /* 52 1 */ u8 dlid_path_bits; /* 53 1 */ u8 port_num; /* 54 1 */ u8 smac[6]; /* 55 6 */ /* XXX 1 byte hole, try to pack */ u16 vlan_id; /* 62 2 */ /* --- cacheline 1 boundary (64 bytes) --- */ u8 network_hdr_type; /* 64 1 */ /* size: 72, cachelines: 2, members: 17 */ /* sum members: 62, holes: 2, sum holes: 3 */ /* padding: 7 */ /* last cacheline: 8 bytes */ }; Pahole output after this change: struct ib_wc { union { u64 wr_id; /* 8 */ struct ib_cqe * wr_cqe; /* 8 */ }; /* 0 8 */ enum ib_wc_status status; /* 8 4 */ enum ib_wc_opcode opcode; /* 12 4 */ u32 vendor_err; /* 16 4 */ u32 byte_len; /* 20 4 */ struct ib_qp * qp; /* 24 8 */ union { __be32 imm_data; /* 4 */ u32 invalidate_rkey; /* 4 */ } ex; /* 32 4 */ u32 src_qp; /* 36 4 */ u32 slid; /* 40 4 */ int wc_flags; /* 44 4 */ u16 pkey_index; /* 48 2 */ u8 sl; /* 50 1 */ u8 dlid_path_bits; /* 51 1 */ u8 port_num; /* 52 1 */ u8 smac[6]; /* 53 6 */ /* XXX 1 byte hole, try to pack */ u16 vlan_id; /* 60 2 */ u8 network_hdr_type; /* 62 1 */ /* size: 64, cachelines: 1, members: 17 */ /* sum members: 62, holes: 1, sum holes: 1 */ /* padding: 1 */ }; Cc: <stable@vger.kernel.org> # v4.13 Fixes: 7db20ecd1d97 ("IB/core: Change wc.slid from 16 to 32 bits") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH portsJack Morgenstein
Allocating steerable UD QPs depends on having at least one IB port, while releasing those QPs does not. As a result, when there are only ETH ports, the IB (RoCE) driver requests releasing a qp range whose base qp is zero, with qp count zero. When SR-IOV is enabled, and the VF driver is running on a VM over a hypervisor which treats such qp release calls as errors (rather than NOPs), we see lines in the VM message log like: mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0 Fix this by adding a check for a zero count in mlx4_release_qp_range() (which thus treats releasing 0 qps as a nop), and eliminating the check for device managed flow steering when releasing steerable UD QPs. (Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it remains NULL when steerable UD QPs are not allocated). Cc: <stable@vger.kernel.org> Fixes: 4196670be786 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/qedr: Fix endian problems around imm_dataJason Gunthorpe
The double swap matches what user space rdma-core does to imm_data. wc->imm_data is not used in the kernel so this change has no practical impact. Acked-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/hns: Fix endian problems around imm_data and rkeyJason Gunthorpe
This matches the changes made recently to the userspace hns driver when it was made sparse clean. See rdma-core commit bffd380cfe56 ("libhns: Make the provider sparse clean") wc->imm_data is not used in the kernel so this change has no practical impact. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA: Mark imm_data as be32 in the verbs uapi headerJason Gunthorpe
This matches what the userspace copy of this header has been doing for a while. imm_data is an opaque 4 byte array carried over the network, and invalidate_rkey is in CPU byte order. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Limit DMAC resolution to RoCE Connected QPsParav Pandit
Resolving DMAC for RoCE is applicable to only Connected mode QPs. So resolve DMAC for only for Connected mode QPs. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Attempt DMAC resolution for only RoCEParav Pandit
Instead of returning 0 (success) for RoCE scenarios where DMAC should not be resolved, avoid such attempt and make code consistent with ib_create_user_ah(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Limit DMAC resolution to userspace QPsParav Pandit
Currently ah_attr is initialized by the ib_cm layer for rdma_cm based applications. For RoCE transport ah_attr.roce.dmac is already initialized by ib_cm, rdma_cm either from wc, path record, route resolve, explicit path record setting depending on active or passive side QP. Therefore avoid resolving DMAC for QP of kernel consumers. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Perform modify QP on real oneParav Pandit
Currently qp->port stores the port number whenever IB_QP_PORT QP attribute mask is set (during QP state transition to INIT state). This port number should be stored for the real QP when XRC target QP is used. Follow the ib_modify_qp() implementation and hide the access to ->real_qp. Fixes: a512c2fbef9c ("IB/core: Introduce modify QP operation with udata") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15raid5-ppl: PPL support for disks with write-back cache enabledTomasz Majchrzak
In order to provide data consistency with PPL for disks with write-back cache enabled all data has to be flushed to disks before next PPL entry. The disks to be flushed are marked in the bitmap. It's modified under a mutex and it's only read after PPL io unit is submitted. A limitation of 64 disks in the array has been introduced to keep data structures and implementation simple. RAID5 arrays with so many disks are not likely due to high risk of multiple disks failure. Such restriction should not be a real life limitation. With write-back cache disabled next PPL entry is submitted when data write for current one completes. Data flush defers next log submission so trigger it when there are no stripes for handling found. As PPL assures all data is flushed to disk at request completion, just acknowledge flush request when PPL is enabled. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
2018-01-15RDMA/mlx5: Fix out-of-bound access while querying AHLeon Romanovsky
The rdma_ah_find_type() accesses the port array based on an index controlled by userspace. The existing bounds check is after the first use of the index, so userspace can generate an out of bounds access, as shown by the KASN report below. ================================================================== BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0 Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409 CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xe9/0x18f print_address_description+0xa2/0x350 kasan_report+0x3a5/0x400 to_rdma_ah_attr+0xa8/0x3b0 mlx5_ib_query_qp+0xd35/0x1330 ib_query_qp+0x8a/0xb0 ib_uverbs_query_qp+0x237/0x7f0 ib_uverbs_write+0x617/0xd80 __vfs_write+0xf7/0x500 vfs_write+0x149/0x310 SyS_write+0xca/0x190 entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x7fe9c7a275a0 RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0 RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003 RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018 R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000 R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560 Allocated by task 1: __kmalloc+0x3f9/0x430 alloc_mad_private+0x25/0x50 ib_mad_post_receive_mads+0x204/0xa60 ib_mad_init_device+0xa59/0x1020 ib_register_device+0x83a/0xbc0 mlx5_ib_add+0x50e/0x5c0 mlx5_add_device+0x142/0x410 mlx5_register_interface+0x18f/0x210 mlx5_ib_init+0x56/0x63 do_one_initcall+0x15b/0x270 kernel_init_freeable+0x2d8/0x3d0 kernel_init+0x14/0x190 ret_from_fork+0x24/0x30 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff880019ae2000 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 104 bytes to the right of 512-byte region [ffff880019ae2000, ffff880019ae2200) The buggy address belongs to the page: page:000000005d674e18 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc >ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Disabling lock debugging due to kernel taint Cc: <stable@vger.kernel.org> Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15netlink: extack: avoid parenthesized string constant warningJohannes Berg
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning in newer versions of gcc: warning: array initialized from parenthesized string constant Just remove the parentheses, they're not needed in this context since anyway since there can be no operator precendence issues or similar. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15Merge branch ↵David S. Miller
'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY' Jim Westfall says: ==================== ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY This used to be the previous behavior in older kernels but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path) and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point devices) Not having this results in there being an arp entry for every remote ip address that the device talks to. Given a fairly active device it can cause the arp table to become huge and/or having to add/purge large number of entires to keep within table size thresholds. $ ip -4 neigh show nud noarp | grep tun | wc -l 55850 $ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10 arp_cach|arp_cach|arp_cach| entries| allocs|destroys| 81493|620166816|620126069| 101867| 10186| 0| 113854| 5993| 0| 118773| 2459| 0| 27937| 18579| 63998| 39256| 5659| 0| 56231| 8487| 0| 65602| 4685| 0| 79697| 7047| 0| 90733| 5517| 0| v2: - fixes coding style issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANYJim Westfall
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices to avoid making an entry for every remote ip the device needs to talk to. This used the be the old behavior but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path) and later removed in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point devices) because it was broken. Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: Allow neigh contructor functions ability to modify the primary_keyJim Westfall
Use n->primary_key instead of pkey to account for the possibility that a neigh constructor function may have modified the primary_key value. Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15sh_eth: fix dumping ARSTRSergei Shtylyov
ARSTR is always located at the start of the TSU register region, thus using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it causes EDMR or EDSR (depending on the register layout) to be dumped instead of ARSTR. Use the correct condition/macro there... Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15Revert "openvswitch: Add erspan tunnel support."William Tu
This reverts commit ceaa001a170e43608854d5290a48064f57b565ed. The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed as a nested attribute to support all ERSPAN v1 and v2's fields. The current attr is a be32 supporting only one field. Thus, this patch reverts it and later patch will redo it using nested attr. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net/tls: Fix inverted error codes to avoid endless loopr.hering@avm.de
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs. Socket error codes must be inverted by Kernel TLS before returning because they are stored with positive sign. If returned non-inverted they are interpreted as number of bytes sent, causing endless looping of the splice mechanic behind sendfile(). Signed-off-by: Robert Hering <r.hering@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: ip6_make_skb() needs to clear cork.base.dstEric Dumazet
In my last patch, I missed fact that cork.base.dst was not initialized in ip6_make_skb() : If ip6_setup_cork() returns an error, we might attempt a dst_release() on some random pointer. Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ALSA: hda - Use IS_REACHABLE() for dependency on inputTakashi Iwai
The commit ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec") introduced the reverse-selection of CONFIG_INPUT for Realtek codec in order to avoid the mess with dependency between built-in and modules. Later on, we obtained IS_REACHABLE() macro exactly for this kind of problems, and now we can remove th INPUT selection in Kconfig and put IS_REACHABLE(INPUT) to the appropriate places in the code, so that the driver doesn't need to select other subsystem forcibly. Fixes: ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # and build-tested Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-01-15tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=yRandy Dunlap
I regularly get 50 MB - 60 MB files during kernel randconfig builds. These large files mostly contain (many repeats of; e.g., 124,594): In file included from ../include/linux/string.h:6:0, from ../include/linux/uuid.h:20, from ../include/linux/mod_devicetable.h:13, from ../scripts/mod/devicetable-offsets.c:3: ../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default] ______f = { \ ^ ../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if' ^ ../include/linux/string.h:425:2: note: in expansion of macro 'if' if (p_size == (size_t)-1 && q_size == (size_t)-1) ^ This only happens when CONFIG_FORTIFY_SOURCE=y and CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if FORTIFY_SOURCE=y. Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-15sctp: avoid compiler warning on implicit fallthruMarcelo Ricardo Leitner
These fall-through are expected. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: ipv4: Make "ip route get" match iif lo rules again.Lorenzo Colitti
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") broke "ip route get" in the presence of rules that specify iif lo. Host-originated traffic always has iif lo, because ip_route_output_key_hash and ip6_route_output_flags set the flow iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a convenient way to select only originated traffic and not forwarded traffic. inet_rtm_getroute used to match these rules correctly because even though it sets the flow iif to 0, it called ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX. But now that it calls ip_route_output_key_hash_rcu, the ifindex will remain 0 and not match the iif lo in the rule. As a result, "ip route get" will return ENETUNREACH. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>