summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-06-21atomics/treewide: Make test ops optionalMark Rutland
Some of the atomics return the result of a test applied after the atomic operation, and almost all architectures implement these as trivial wrappers around the underlying atomic. Specifically: * <atomic>_inc_and_test(v) is (<atomic>_inc_return(v) == 0) * <atomic>_dec_and_test(v) is (<atomic>_dec_return(v) == 0) * <atomic>_sub_and_test(i, v) is (<atomic>_sub_return(i, v) == 0) * <atomic>_add_negative(i, v) is (<atomic>_add_return(i, v) < 0) Rather than have these definitions duplicated in all architectures, with minor inconsistencies in formatting and documentation, let's make these operations optional, with default fallbacks as above. Implementations must now provide a preprocessor symbol. The instrumented atomics are updated accordingly. Both x86 and m68k have custom implementations, which are left as-is, given preprocessor symbols to avoid being overridden. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-16-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics/treewide: Make atomic64_fetch_add_unless() optionalMark Rutland
Architectures with atomic64_fetch_add_unless() provide a preprocessor symbol if they do so, and all other architectures have trivial C implementations of atomic64_add_unless() which are near-identical. Let's unify the trivial definitions of atomic64_fetch_add_unless() in <linux/atomic.h>, so that we always have both atomic64_fetch_add_unless() and atomic64_add_unless() with less boilerplate code. This means that atomic64_add_unless() is always implemented in core code, and the instrumented atomics are updated accordingly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-15-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics/generic: Define atomic64_fetch_add_unless()Mark Rutland
As a step towards unifying the atomic/atomic64/atomic_long APIs, this patch converts the generic implementation of atomic64_add_unless() into a generic implementation of atomic64_fetch_add_unless(). A wrapper in <linux/atomic.h> will build atomic_add_unless() atop of this, provided it is given a preprocessor definition. No functional change is intended as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-9-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics: Prepare for atomic64_fetch_add_unless()Mark Rutland
Currently all architectures must implement atomic_fetch_add_unless(), with common code providing atomic_add_unless(). Architectures must also implement atomic64_add_unless() directly, with no corresponding atomic64_fetch_add_unless(). This divergence is unfortunate, and means that the APIs for atomic_t, atomic64_t, and atomic_long_t differ. In preparation for unifying things, with architectures providing atomic64_fetch_add_unless, this patch adds a generic atomic64_add_unless() which will use atomic64_fetch_add_unless(). The instrumented atomics are updated to take this case into account. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Albert Ou <albert@sifive.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Link: https://lore.kernel.org/lkml/20180621121321.4761-8-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics/treewide: Make atomic_fetch_add_unless() optionalMark Rutland
Several architectures these have a near-identical implementation based on atomic_read() and atomic_cmpxchg() which we can instead define in <linux/atomic.h>, so let's do so, using something close to the existing x86 implementation with try_cmpxchg(). Where an architecture provides its own atomic_fetch_add_unless(), it must define a preprocessor symbol for it. The instrumented atomics are updated accordingly. Note that arch/arc's existing atomic_fetch_add_unless() had redundant barriers, as these are already present in its atomic_cmpxchg() implementation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Link: https://lore.kernel.org/lkml/20180621121321.4761-7-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics/treewide: Make atomic64_inc_not_zero() optionalMark Rutland
We define a trivial fallback for atomic_inc_not_zero(), but don't do the same for atomic64_inc_not_zero(), leading most architectures to define the same boilerplate. Let's add a fallback in <linux/atomic.h>, and remove the redundant implementations. Note that atomic64_add_unless() is always defined in <linux/atomic.h>, and promotes its arguments to the requisite types, so we need not do this explicitly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-6-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics: Make conditional ops return 'bool'Mark Rutland
Some of the atomics return a status value, which is a boolean value describing whether the operation was performed. To make it clear that this is a boolean value, let's update the common fallbacks to return bool, fixing up the return values and comments likewise. At the same time, let's simplify the description of the operations in their respective comments. The instrumented atomics and generic atomic64 implementation are updated accordingly. Note that atomic64_dec_if_positive() doesn't follow the usual test op pattern, and returns the would-be decremented value. This is not changed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-5-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics/treewide: Remove atomic_inc_not_zero_hint()Mark Rutland
While documentation suggests atomic_inc_not_zero_hint() will perform better than atomic_inc_not_zero(), this is unlikely to be the case. No architectures implement atomic_inc_not_zero_hint() directly, and thus it either falls back to atomic_inc_not_zero(), or a loop using atomic_cmpxchg(). Whenever the hint does not match the value in memory, the repeated use of atomic_cmpxchg() will be more expensive than the read that atomic_inc_not_zero_hint() attempts to avoid. For architectures with LL/SC atomics, a read cannot be avoided, and it would always be better to use atomic_inc_not_zero() directly. For other architectures, their own atomic_inc_not_zero() is likely to be more optimal than an atomic_cmpxchg() loop regardless. Generally, atomic_inc_not_zero_hint() is liable to perform worse than atomic_inc_not_zero(). Further, atomic_inc_not_zero_hint() only exists for atomic_t, and not atomic64_t or atomic_long_t, and there is only one user in the kernel tree. Given all this, let's remove atomic_inc_not_zero_hint(), and migrate the existing user over to atomic_inc_not_zero(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-4-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21atomics/treewide: Rename __atomic_add_unless() => atomic_fetch_add_unless()Mark Rutland
While __atomic_add_unless() was originally intended as a building-block for atomic_add_unless(), it's now used in a number of places around the kernel. It's the only common atomic operation named __atomic*(), rather than atomic_*(), and for consistency it would be better named atomic_fetch_add_unless(). This lack of consistency is slightly confusing, and gets in the way of scripting atomics. Given that, let's clean things up and promote it to an official part of the atomics API, in the form of atomic_fetch_add_unless(). This patch converts definitions and invocations over to the new name, including the instrumented version, using the following script: ---- git grep -w __atomic_add_unless | while read line; do sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}"; done git grep -w __arch_atomic_add_unless | while read line; do sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}"; done ---- Note that we do not have atomic{64,_long}_fetch_add_unless(), which will be introduced by later patches. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21cpu/hotplug: Provide knobs to control SMTThomas Gleixner
Provide a command line and a sysfs knob to control SMT. The command line options are: 'nosmt': Enumerate secondary threads, but do not online them 'nosmt=force': Ignore secondary threads completely during enumeration via MP table and ACPI/MADT. The sysfs control file has the following states (read/write): 'on': SMT is enabled. Secondary threads can be freely onlined 'off': SMT is disabled. Secondary threads, even if enumerated cannot be onlined 'forceoff': SMT is permanentely disabled. Writes to the control file are rejected. 'notsupported': SMT is not supported by the CPU The command line option 'nosmt' sets the sysfs control to 'off'. This can be changed to 'on' to reenable SMT during runtime. The command line option 'nosmt=force' sets the sysfs control to 'forceoff'. This cannot be changed during runtime. When SMT is 'on' and the control file is changed to 'off' then all online secondary threads are offlined and attempts to online a secondary thread later on are rejected. When SMT is 'off' and the control file is changed to 'on' then secondary threads can be onlined again. The 'off' -> 'on' transition does not automatically online the secondary threads. When the control file is set to 'forceoff', the behaviour is the same as setting it to 'off', but the operation is irreversible and later writes to the control file are rejected. When the control status is 'notsupported' then writes to the control file are rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21locking/atomics, asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*()Will Deacon
The lock bitops can be implemented more efficiently using the atomic_fetch_*() ops, which provide finer-grained control over the memory ordering semantics than the bitops. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-8-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIsWill Deacon
The atomic bitops can actually be implemented pretty efficiently using the atomic_*() ops, rather than explicit use of spinlocks. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-7-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a ↵Will Deacon
new <linux/bits.h> file In preparation for implementing the asm-generic atomic bitops in terms of atomic_long_*(), we need to prevent <asm/atomic.h> implementations from pulling in <linux/bitops.h>. A common reason for this include is for the BITS_PER_BYTE definition, so move this and some other BIT() and masking macros into a new header file, <linux/bits.h>. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21rseq/cleanup: Do not abort rseq c.s. in child on fork()Mathieu Desnoyers
Considering that we explicitly forbid system calls in rseq critical sections, it is not valid to issue a fork or clone system call within a rseq critical section, so rseq_fork() is not required to restart an active rseq c.s. in the child process. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: https://lore.kernel.org/lkml/20180619133230.4087-4-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21kprobes: Remove jprobe stub APIMasami Hiramatsu
Remove jprobe stub APIs from linux/kprobes.h since the jprobe implementation was completely gone. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-arch@vger.kernel.org Link: https://lore.kernel.org/lkml/152942503572.15209.1652552217914694917.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21kprobes: Remove jprobe API implementationMasami Hiramatsu
Remove functionally empty jprobe API implementations and test cases. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-arch@vger.kernel.org Link: https://lore.kernel.org/lkml/152942430705.15209.2307050500995264322.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21bpf: enforce correct alignment for instructionsEric Dumazet
After commit 9facc336876f ("bpf: reject any prog that failed read-only lock") offsetof(struct bpf_binary_header, image) became 3 instead of 4, breaking powerpc BPF badly, since instructions need to be word aligned. Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.Doron Roberts-Kedes
If NBD_DISCONNECT_ON_CLOSE is set on a device, then the driver will issue a disconnect from nbd_release if the device has no remaining bdev->bd_openers. Fix ret val so reconfigure with only setting the flag succeeds. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Here are eight fairly small fixes collected over the last two weeks. Regression and crashing bug fixes: - mlx4/5: Fixes for issues found from various checkers - A resource tracking and uverbs regression in the core code - qedr: NULL pointer regression found during testing - rxe: Various small bugs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/rxe: Fix missing completion for mem_reg work requests RDMA/core: Save kernel caller name when creating CQ using ib_create_cq() IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()' RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM IB/mlx5: Fix return value check in flow_counters_set_data() IB/mlx5: Fix memory leak in mlx5_ib_create_flow IB/rxe: avoid double kfree skb
2018-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix crash on bpf_prog_load() errors, from Daniel Borkmann. 2) Fix ATM VCC memory accounting, from David Woodhouse. 3) fib6_info objects need RCU freeing, from Eric Dumazet. 4) Fix SO_BINDTODEVICE handling for TCP sockets, from David Ahern. 5) Fix clobbered error code in enic_open() failure path, from Govindarajulu Varadarajan. 6) Propagate dev_get_valid_name() error returns properly, from Li RongQing. 7) Fix suspend/resume in davinci_emac driver, from Bartosz Golaszewski. 8) Various act_ife fixes (recursive locking, IDR leaks, etc.) from Davide Caratti. 9) Fix buggy checksum handling in sungem driver, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits) ip: limit use of gso_size to udp stmmac: fix DMA channel hang in half-duplex mode net: stmmac: socfpga: add additional ocp reset line for Stratix10 net: sungem: fix rx checksum support bpfilter: ignore binary files bpfilter: fix build error net/usb/drivers: Remove useless hrtimer_active check net/sched: act_ife: preserve the action control in case of error net/sched: act_ife: fix recursive lock and idr leak net: ethernet: fix suspend/resume in davinci_emac net: propagate dev_get_valid_name return code enic: do not overwrite error code net/tcp: Fix socket lookups with SO_BINDTODEVICE ptp: replace getnstimeofday64() with ktime_get_real_ts64() net/ipv6: respect rcu grace period before freeing fib6_info net: net_failover: fix typo in net_failover_slave_register() ipvlan: use ETH_MAX_MTU as max mtu net: hamradio: use eth_broadcast_addr enic: initialize enic->rfs_h.lock in enic_probe MAINTAINERS: Add Sam as the maintainer for NCSI ...
2018-06-20x86/speculation/l1tf: Limit swap file size to MAX_PA/2Andi Kleen
For the L1TF workaround its necessary to limit the swap file size to below MAX_PA/2, so that the higher bits of the swap offset inverted never point to valid memory. Add a mechanism for the architecture to override the swap file size check in swapfile.c and add a x86 specific max swapfile check function that enforces that limit. The check is only enabled if the CPU is vulnerable to L1TF. In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system with 46bit PA it is 32TB. The limit is only per individual swap file, so it's always possible to exceed these limits with multiple swap files or partitions. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-06-20x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappingsAndi Kleen
For L1TF PROT_NONE mappings are protected by inverting the PFN in the page table entry. This sets the high bits in the CPU's address space, thus making sure to point to not point an unmapped entry to valid cached memory. Some server system BIOSes put the MMIO mappings high up in the physical address space. If such an high mapping was mapped to unprivileged users they could attack low memory by setting such a mapping to PROT_NONE. This could happen through a special device driver which is not access protected. Normal /dev/mem is of course access protected. To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings. Valid page mappings are allowed because the system is then unsafe anyways. It's not expected that users commonly use PROT_NONE on MMIO. But to minimize any impact this is only enforced if the mapping actually refers to a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip the check for root. For mmaps this is straight forward and can be handled in vm_insert_pfn and in remap_pfn_range(). For mprotect it's a bit trickier. At the point where the actual PTEs are accessed a lot of state has been changed and it would be difficult to undo on an error. Since this is a uncommon case use a separate early page talk walk pass for MMIO PROT_NONE mappings that checks for this condition early. For non MMIO and non PROT_NONE there are no changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-06-20x86/speculation/l1tf: Add sysfs reporting for l1tfAndi Kleen
L1TF core kernel workarounds are cheap and normally always enabled, However they still should be reported in sysfs if the system is vulnerable or mitigated. Add the necessary CPU feature/bug bits. - Extend the existing checks for Meltdowns to determine if the system is vulnerable. All CPUs which are not vulnerable to Meltdown are also not vulnerable to L1TF - Check for 32bit non PAE and emit a warning as there is no practical way for mitigation due to the limited physical address bits - If the system has more than MAX_PA/2 physical memory the invert page workarounds don't protect the system against the L1TF attack anymore, because an inverted physical address will also point to valid memory. Print a warning in this case and report that the system is vulnerable. Add a function which returns the PFN limit for the L1TF mitigation, which will be used in follow up patches for sanity and range checks. [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-06-20iomap: add initial support for writes without buffer headsChristoph Hellwig
For now just limited to blocksize == PAGE_SIZE, where we can simply read in the full page in write begin, and just set the whole page dirty after copying data into it. This code is enabled by default and XFS will now be feed pages without buffer heads in ->writepage and ->writepages. If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old path will still be used, this both helps the transition in XFS and prepares for the gfs2 migration to the iomap infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-20Merge drm-upstream/drm-next into drm-misc-nextGustavo Padovan
We got a few conflicts in drm_atomic.c after merging the DRM writeback support, now we need a backmerge to unlock develop development on drm-misc-next. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
2018-06-20drm: writeback: Add client capability for exposing writeback connectorsLiviu Dudau
Due to the fact that writeback connectors behave in a special way in DRM (they always report being disconnected) we might confuse some userspace. Add a client capability for writeback connectors that will filter them out for clients that don't understand the capability. Changelog: - only accept the capability if the client has already set the DRM_CLIENT_CAP_ATOMIC one. Cc: Sean Paul <seanpaul@chromium.org> Cc: Brian Starkey <brian.starkey@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Starkey <brian.starkey@arm.com> Link: https://patchwork.freedesktop.org/patch/229038/
2018-06-20drm: writeback: Add out-fences for writeback connectorsBrian Starkey
Add the WRITEBACK_OUT_FENCE_PTR property to writeback connectors, to enable userspace to get a fence which will signal once the writeback is complete. It is not allowed to request an out-fence without a framebuffer attached to the connector. A timeline is added to drm_writeback_connector for use by the writeback out-fences. In the case of a commit failure or DRM_MODE_ATOMIC_TEST_ONLY, the fence is set to -1. Changes from v2: - Rebase onto Gustavo Padovan's v9 explicit sync series - Change out_fence_ptr type to s32 __user * - Set *out_fence_ptr to -1 in drm_atomic_connector_set_property - Store fence in drm_writeback_job Gustavo Padovan: - Move out_fence_ptr out of connector_state - Signal fence from drm_writeback_signal_completion instead of in driver directly Changes from v3: - Rebase onto commit 7e9081c5aac7 ("drm/fence: fix memory overwrite when setting out_fence fd") (change out_fence_ptr to s32 __user *, for real this time.) - Update documentation around WRITEBACK_OUT_FENCE_PTR Signed-off-by: Brian Starkey <brian.starkey@arm.com> [rebased and fixed conflicts] Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/229036/
2018-06-20drm: Add writeback connector typeBrian Starkey
Writeback connectors represent writeback engines which can write the CRTC output to a memory framebuffer. Add a writeback connector type and related support functions. Drivers should initialize a writeback connector with drm_writeback_connector_init() which takes care of setting up all the writeback-specific details on top of the normal functionality of drm_connector_init(). Writeback connectors have a WRITEBACK_FB_ID property, used to set the output framebuffer, and a WRITEBACK_PIXEL_FORMATS blob used to expose the supported writeback formats to userspace. When a framebuffer is attached to a writeback connector with the WRITEBACK_FB_ID property, it is used only once (for the commit in which it was included), and userspace can never read back the value of WRITEBACK_FB_ID. WRITEBACK_FB_ID can only be set if the connector is attached to a CRTC. Changes since v1: - Added drm_writeback.c + documentation - Added helper to initialize writeback connector in one go - Added core checks - Squashed into a single commit - Dropped the client cap - Writeback framebuffers are no longer persistent Changes since v2: Daniel Vetter: - Subclass drm_connector to drm_writeback_connector - Relax check to allow CRTC to be set without an FB - Add some writeback_ prefixes - Drop PIXEL_FORMATS_SIZE property, as it was unnecessary Gustavo Padovan: - Add drm_writeback_job to handle writeback signalling centrally Changes since v3: - Rebased - Rename PIXEL_FORMATS -> WRITEBACK_PIXEL_FORMATS Chances since v4: - Embed a drm_encoder inside the drm_writeback_connector to reduce the amount of boilerplate code required from the drivers that are using it. Changes since v5: - Added Rob Clark's atomic_commit() vfunc to connector helper funcs, so that writeback jobs are committed from atomic helpers - Updated create_writeback_properties() signature to return an error code rather than a boolean false for failure. - Free writeback job with the connector state rather than when doing the cleanup_work() Changes since v7: - fix extraneous use of out_fence that is only introduced in a subsequent patch. Changes since v8: - whitespace changes pull from subsequent patch Changes since v9: - Revert the v6 changes that free the writeback job in the connector state cleanup and return to doing it in the cleanup_work() function Signed-off-by: Brian Starkey <brian.starkey@arm.com> [rebased and fixed conflicts] Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> [rebased and added atomic_commit() vfunc for writeback jobs] Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/229037/
2018-06-20dma-buf: remove kmap_atomic interfaceChristian König
Neither used nor correctly implemented anywhere. Just completely remove the interface. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/226645/
2018-06-20dma_buf: remove device parameter from attach callback v2Christian König
The device parameter is completely unused because it is available in the attachment structure as well. v2: fix kerneldoc as well Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/226643/
2018-06-20sched/swait: Rename to exclusivePeter Zijlstra
Since swait basically implemented exclusive waits only, make sure the API reflects that. $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepare_to_swait\>" | while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]*/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done With a few manual touch-ups. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
2018-06-20sched/swait: Switch to full exclusive modePeter Zijlstra
Linus noted that swait basically implements exclusive mode -- because swake_up() only wakes a single waiter. And because of that it should take care to properly deal with the interruptible case. In short, the problem is that swake_up() can race with a signal. In this this case it is possible the swake_up() 'wakes' the waiter that is already on the way out because it just got a signal and the wakeup gets lost. The normal wait code is very careful and avoids this situation, make sure we do too. Copy the exact exclusive semantics from wait. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.209762413@infradead.org
2018-06-20sched/swait: Remove __prepare_to_swaitPeter Zijlstra
There is no public user of this API, remove it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.157076812@infradead.org
2018-06-20ACPI / processor: Finish making acpi_processor_ppc_has_changed() voidBrian Norris
Commit bca5f557dcea "ACPI / processor: Make acpi_processor_ppc_has_changed() void" changed one of the declarations of acpi_processor_ppc_has_changed() to return void, but the !CPU_FREQ version still returns int. Let's return void to be consistent. Fixes: bca5f557dcea "ACPI / processor: Make acpi_processor_ppc_has_changed() void" Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-20Merge tag 'dma-rename-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping rename from Christoph Hellwig: "Move all the dma-mapping code to kernel/dma and lose their dma-* prefixes" * tag 'dma-rename-4.18' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: move all DMA mapping code to kernel/dma dma-mapping: use obj-y instead of lib-y for generic dma ops
2018-06-19scsi: libsas: dynamically allocate and free ata hostJason Yan
Commit 2623c7a5f2 ("libata: add refcounting to ata_host") v4.17+ introduced refcounting to ata_host and will increase or decrease the refcount when adding or deleting transport ATA port. Now the ata host for libsas is embedded in domain_device, and the ->kref member is not initialized. Afer we add ata transport class, ata_host_get() will be called when adding transport ATA port and a warning will be triggered as below: refcount_t: increment on 0; use-after-free. WARNING: CPU: 2 PID: 103 at lib/refcount.c:153 refcount_inc+0x40/0x48 ...... Call trace: refcount_inc+0x40/0x48 ata_host_get+0x10/0x18 ata_tport_add+0x40/0x120 ata_sas_tport_add+0xc/0x14 sas_ata_init+0x7c/0xc8 sas_discover_domain+0x380/0x53c process_one_work+0x12c/0x288 worker_thread+0x58/0x3f0 kthread+0xfc/0x128 ret_from_fork+0x10/0x18 And also when removing transport ATA port ata_host_put() will be called and another similar warning will be triggered. If the refcount decreased to zero, the ata host will be freed. But this ata host is only part of domain_device, it cannot be freed directly. So we have to change this embedded static ata host to a dynamically allocated ata host and initialize the ->kref member. To use ata_host_get() and ata_host_put() in libsas, we need to move the declaration of these functions to the public libata.h and export them. Fixes: b6240a4df018 ("scsi: libsas: add transport class for ATA devices") Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: John Garry <john.garry@huawei.com> CC: Taras Kondratiuk <takondra@cisco.com> CC: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19scsi: Remove percpu_idaMatthew Wilcox
With its one user gone, remove the library code. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19scsi: target: Convert target drivers to use sbitmapMatthew Wilcox
The sbitmap and the percpu_ida perform essentially the same task, allocating tags for commands. The sbitmap outperforms the percpu_ida as documented here: https://lkml.org/lkml/2014/4/22/553 The sbitmap interface is a little harder to use, but being able to remove the percpu_ida code and getting better performance justifies the additional complexity. Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> # f_tcm Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19scsi: target: Abstract tag freeingMatthew Wilcox
Introduce target_free_tag() and convert all drivers to use it. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-20net/ipv6: respect rcu grace period before freeing fib6_infoEric Dumazet
syzbot reported use after free that is caused by fib6_info being freed without a proper RCU grace period. CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __read_once_size include/linux/compiler.h:188 [inline] find_rr_leaf net/ipv6/route.c:705 [inline] rt6_select net/ipv6/route.c:761 [inline] fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823 ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082 fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110 ip6_route_output include/net/ip6_route.h:82 [inline] icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline] icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43 ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244 dst_link_failure include/net/dst.h:427 [inline] ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695 neigh_invalidate+0x246/0x550 net/core/neighbour.c:892 neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:404 exiting_irq arch/x86/include/asm/apic.h:527 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482 Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48 RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0 RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000 R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0 R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00 strlen include/linux/string.h:267 [inline] getname_kernel+0x24/0x370 fs/namei.c:218 open_exec+0x17/0x70 fs/exec.c:882 load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780 search_binary_handler+0x17d/0x570 fs/exec.c:1653 exec_binprm fs/exec.c:1695 [inline] __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819 do_execveat_common fs/exec.c:1866 [inline] do_execve fs/exec.c:1883 [inline] __do_sys_execve fs/exec.c:1964 [inline] __se_sys_execve fs/exec.c:1959 [inline] __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f1576a46207 Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02 RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207 RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670 RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10 R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005 Allocated by task 12188: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620 kmalloc include/linux/slab.h:513 [inline] kzalloc include/linux/slab.h:706 [inline] fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152 ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013 ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154 ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973 sock_ioctl+0x30d/0x680 net/socket.c:1097 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1402: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xd9/0x260 mm/slab.c:3813 fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207 fib6_info_release include/net/ip6_fib.h:286 [inline] __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline] ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316 ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973 sock_ioctl+0x30d/0x680 net/socket.c:1097 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801b5df2580 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 8 bytes inside of 256-byte region [ffff8801b5df2580, ffff8801b5df2680) The buggy address belongs to the page: page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0 raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc > ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-19iomap: add an iomap-based readpage and readpages implementationChristoph Hellwig
Simply use iomap_apply to iterate over the file and a submit a bio for each non-uptodate but mapped region and zero everything else. Note that as-is this can not be used for file systems with a blocksize smaller than the page size, but that support will be added later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19iomap: add private pointer to struct iomapAndreas Gruenbacher
Add a private pointer to struct iomap to allow filesystems to pass data from iomap_begin to iomap_end. Will be used by gfs2 for passing on the on-disk inode buffer head. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19iomap: add a page_done callbackChristoph Hellwig
This will be used by gfs2 to attach data to transactions for the journaled data mode. But the concept is generic enough that we might be able to use it for other purposes like encryption/integrity post-processing in the future. Based on a patch from Andreas Gruenbacher. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19iomap: generic inline data handlingAndreas Gruenbacher
Add generic inline data handling by adding a pointer to the inline data region to struct iomap. When handling a buffered IOMAP_INLINE write, iomap_write_begin will copy the current inline data from the inline data region into the page cache, and iomap_write_end will copy the changes in the page cache back to the inline data region. This doesn't cover inline data reads and direct I/O yet because so far, we have no users. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> [hch: small cleanups to better fit in with other iomap work] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19IB/rdmavt, IB/hfi1: Create device dependent s_flagsMike Marciniszyn
Move some s_flags defines out of rdmavt and into hfi1 because they are hfi1 specific and therefore should remain in the driver instead of bubbling up to rdmavt. Document device specific ranges in rdmavt and remap those in hfi1. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19clk: add duty cycle supportJerome Brunet
Add the possibility to apply and query the clock signal duty cycle ratio. This is useful when the duty cycle of the clock signal depends on some other parameters controlled by the clock framework. For example, the duty cycle of a divider may depends on the raw divider setting (ratio = N / div) , which is controlled by the CCF. In such case, going through the pwm framework to control the duty cycle ratio of this clock would be a burden. A clock provider is not required to implement the operation to set and get the duty cycle. If it does not implement .get_duty_cycle(), the ratio is assumed to be 50%. This change also adds a new flag, CLK_DUTY_CYCLE_PARENT. This flag should be used to indicate that a clock, such as gates and muxes, may inherit the duty cycle ratio of its parent clock. If a clock does not provide a get_duty_cycle() callback and has CLK_DUTY_CYCLE_PARENT, then the call will be directly forwarded to its parent clock, if any. For set_duty_cycle(), the clock should also have CLK_SET_RATE_PARENT for the call to be forwarded Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Michael Turquette <mturquette@baylibre.com> Link: lkml.kernel.org/r/20180619144141.8506-1-jbrunet@baylibre.com
2018-06-19IB/mlx5: Add DEVX query EQN supportYishai Hadas
Return the matching device EQN for a given user vector number via the DEVX interface. Note: EQs are owned by the kernel and shared by all user processes. Basically, a user CQ can point to any EQ. The kernel doesn't enforce any such limitation today either. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/mlx5: Add DEVX support for memory registrationYishai Hadas
Add support to register a memory with the firmware via the DEVX interface. The driver translates a given user address to ib_umem then it will register the physical addresses with the firmware and get a unique id for this registration to be used for this virtual address. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/mlx5: Add support for DEVX query UARYishai Hadas
Return a device UAR index for a given user index via the DEVX interface. Security note: The hardware protection mechanism works like this: Each device object that is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in the device specification manual) upon its creation. Then upon doorbell, hardware fetches the object context for which the doorbell was rang, and validates that the UAR through which the DB was rang matches the UAR ID of the object. If no match the doorbell is silently ignored by the hardware. Of course, the user cannot ring a doorbell on a UAR that was not mapped to it. Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command mailboxes (except tagging them with UID), we expose to the user its UAR ID, so it can embed it in these objects in the expected specification format. So the only thing the user can do is hurt itself by creating a QP/SQ/CQ with a UAR ID other than his, and then in this case other users may ring a doorbell on its objects. The consequence of that will be that another user can schedule a QP/SQ of the buggy user for execution (just insert it to the hardware schedule queue or arm its CQ for event generation), no further harm is expected. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/mlx5: Add DEVX support for modify and query commandsYishai Hadas
Add support in DEVX for modify and query commands, the required lock is taken (i.e. READ/WRITE) by the KABI infrastructure accordingly. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>