git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-04-14	Merge branch 'perf/urgent' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	futex: Fix small (and harmless looking) inconsistencies	Peter Zijlstra
	During (post-commit) review Darren spotted a few minor things. One (harmless AFAICT) type inconsistency and a comment that wasn't as clear as hoped. Reported-by: Darren Hart (VMWare) <dvhart@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	futex: Avoid freeing an active timer	Thomas Gleixner
	Alexander reported a hrtimer debug_object splat: ODEBUG: free active (active state 0) object type: hrtimer hint: hrtimer_wakeup (kernel/time/hrtimer.c:1423) debug_object_free (lib/debugobjects.c:603) destroy_hrtimer_on_stack (kernel/time/hrtimer.c:427) futex_lock_pi (kernel/futex.c:2740) do_futex (kernel/futex.c:3399) SyS_futex (kernel/futex.c:3447 kernel/futex.c:3415) do_syscall_64 (arch/x86/entry/common.c:284) entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:249) Which was caused by commit: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex__proxy_lock()") ... losing the hrtimer_cancel() in the shuffle. Where previously the hrtimer_cancel() was done by rt_mutex_slowlock() we now need to do it manually. Reported-by: Alexander Levin <alexander.levin@verizon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex__proxy_lock()") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704101802370.2906@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	Merge branch 'linus' into locking/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	sched/fair: Move the PELT constants into a generated header	Peter Zijlstra
	Now that we have a tool to generate the PELT constants in C form, use its output as a separate header. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	sched/fair: Increase PELT accuracy for small tasks	Peter Zijlstra
	We truncate (and loose) the lower 10 bits of runtime in ___update_load_avg(), this means there's a consistent bias to under-account tasks. This is esp. significant for small tasks. Cure this by only forwarding last_update_time to the point we've actually accounted for, leaving the remainder for the next time. Reported-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	sched/fair: Fix comments	Peter Zijlstra
	Historically our periods (or p) argument in PELT denoted the number of full periods (what is now d2). However recent patches have changed this to the total decay (previously p+1), leading to a confusing discrepancy between comments and code. Try and clarify things by making periods (in code) and p (in comments) be the same thing (again). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	sched/Documentation: Add 'sched-pelt' tool	Yuyang Du
	Add a user-space program to compute/generate the PELT constants. The kernel/sched/sched-pelt.h header will contain the output of this program. Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: matt@codeblueprint.co.uk Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1486935863-25251-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	sched/fair: Fix corner case in __accumulate_sum()	Peter Zijlstra
	Paul noticed that in the (periods >= LOAD_AVG_MAX_N) case in __accumulate_sum(), the returned contribution value (LOAD_AVG_MAX) is incorrect. This is because at this point, the decay_load() on the old state -- the first step in accumulate_sum() -- will not have resulted in 0, and will therefore result in a sum larger than the maximum value of our series. Obviously broken. Note that: decay_load(LOAD_AVG_MAX, LOAD_AVG_MAX_N) = 1 (345 / 32) 47742 * - ^ = ~27 2 Not to mention that any further contribution from the d3 segment (our new period) would also push it over the maximum. Solve this by noting that we can write our c2 term: p c2 = 1024 \Sum y^n n=1 In terms of our maximum value: inf inf p max = 1024 \Sum y^n = 1024 ( \Sum y^n + \Sum y^n + y^0 ) n=0 n=p+1 n=1 Further note that: inf inf inf ( \Sum y^n ) y^p = \Sum y^(n+p) = \Sum y^n n=0 n=0 n=p Combined that gives us: p c2 = 1024 \Sum y^n n=1 inf inf = 1024 ( \Sum y^n - \Sum y^n - y^0 ) n=0 n=p+1 = max - (max y^(p+1)) - 1024 Further simplify things by dealing with p=0 early on. Reported-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Cc: linux-kernel@vger.kernel.org Fixes: a481db34b9be ("sched/fair: Optimize ___update_sched_avg()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	debug: Avoid setting BUGFLAG_WARNING twice	Peter Zijlstra
	Dan reported that his static checking complains about BUGFLAG_WARNING being set on both sides of the bitwise-or, it figures that that might've been an unintentional mistake. Since there are no architectures that implement __WARN_TAINT() (I converted them all to implement __WARN_FLAGS()), and all __WARN_FLAGS() implementations already set BUGFLAG_WARNING, we can remove the bit from BUGFLAG_TAINT() and make Dan's checker happy. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170410084939.4bwhrvpmauwfzauq@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	x86/unwind: Silence entry-related warnings	Josh Poimboeuf
	A few people have reported unwinder warnings like the following: WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value (null) unwind stack type:0 next_sp: (null) mask:2 graph_idx:0 ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0) ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c) ffffc90000fe7fa8: 0000000000000246 (0x246) ffffc90000fe7fb0: 0000000000000000 ... ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc) ffffc90000fe7fc8: 0000000000000006 (0x6) ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5) ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0) ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0) ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970) ffffc90000fe7ff0: 0000000000000000 ... ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f) This warning can happen when unwinding a code path where an interrupt occurred in x86 entry code before it set up the first stack frame. Silently ignore any warnings for this case. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	x86/unwind: Read stack return address in update_stack_state()	Josh Poimboeuf
	Instead of reading the return address when unwind_get_return_address() is called, read it from update_stack_state() and store it in the unwind state. This enables the next patch to check the return address from unwind_next_frame() so it can detect an entry code frame. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/af0c5e4560c49c0343dca486ea26c4fa92bc4e35.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	x86/unwind: Move common code into update_stack_state()	Josh Poimboeuf
	The __unwind_start() and unwind_next_frame() functions have some duplicated functionality. They both call decode_frame_pointer() and set state->regs and state->bp accordingly. Move that functionality to a common place in update_stack_state(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/a2ee4801113f6d2300d58f08f6b69f85edf4eb43.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()	Peter Zijlstra
	When the perf_branch_entry::{in_tx,abort,cycles} fields were added, intel_pmu_lbr_read_32() wasn't updated to initialize them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Fixes: 135c5612c460 ("perf/x86/intel: Support Haswell/v4 LBR format") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14	xfrm: Prepare the GRO codepath for hardware offloading.	Steffen Klassert
	On IPsec hardware offloading, we already get a secpath with valid state attached when the packet enters the GRO handlers. So check for hardware offload and skip the state lookup in this case. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	xfrm: Add encapsulation header offsets while SKB is not encrypted	Ilan Tayari
	Both esp4 and esp6 used to assume that the SKB payload is encrypted and therefore the inner_network and inner_transport offsets are not relevant. When doing crypto offload in the NIC, this is no longer the case and the NIC driver needs these offsets so it can do TX TCP checksum offloading. This patch sets the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	net: Add a xfrm validate function to validate_xmit_skb	Steffen Klassert
	When we do IPsec offloading, we need a fallback for packets that were targeted to be IPsec offloaded but rerouted to a device that does not support IPsec offload. For that we add a function that checks the offloading features of the sending device and and flags the requirement of a fallback before it calls the IPsec output function. The IPsec output function adds the IPsec trailer and does encryption if needed. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	esp: Use a synchronous crypto algorithm on offloading.	Steffen Klassert
	We need a fallback algorithm for crypto offloading to a NIC. This is because packets can be rerouted to other NICs that don't support crypto offloading. The fallback is going to be implemented at layer2 where we know the final output device but can't handle asynchronous returns fron the crypto layer. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	xfrm: Add xfrm_replay_overflow functions for offloading	Steffen Klassert
	This patch adds functions that handles IPsec sequence numbers for GSO segments and TSO offloading. We need to calculate and update the sequence numbers based on the segments that GSO/TSO will generate. We need this to keep software and hardware sequence number counter in sync. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	esp: Add gso handlers for esp4 and esp6	Steffen Klassert
	This patch extends the xfrm_type by an encap function pointer and implements esp4_gso_encap and esp6_gso_encap. These functions doing the basic esp encapsulation for a GSO packet. In case the GSO packet needs to be segmented in software, we add gso_segment functions. This codepath is going to be used on esp hardware offloads. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	esp6: Reorganize esp_output	Steffen Klassert
	We need a fallback for ESP at layer 2, so split esp6_output into generic functions that can be used at layer 3 and layer 2 and use them in esp_output. We also add esp6_xmit which is used for the layer 2 fallback. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	esp4: Reorganize esp_output	Steffen Klassert
	We need a fallback for ESP at layer 2, so split esp_output into generic functions that can be used at layer 3 and layer 2 and use them in esp_output. We also add esp_xmit which is used for the layer 2 fallback. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	esp6: Remame esp_input_done2	Steffen Klassert
	We are going to export the ipv4 and the ipv6 version of esp_input_done2. They are not static anymore and can't have the same name. So rename the ipv6 version to esp6_input_done2. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	xfrm: Add an IPsec hardware offloading API	Steffen Klassert
	This patch adds all the bits that are needed to do IPsec hardware offload for IPsec states and ESP packets. We add xfrmdev_ops to the net_device. xfrmdev_ops has function pointers that are needed to manage the xfrm states in the hardware and to do a per packet offloading decision. Joint work with: Ilan Tayari <ilant@mellanox.com> Guy Shapiro <guysh@mellanox.com> Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	xfrm: Add mode handlers for IPsec on layer 2	Steffen Klassert
	This patch adds a gso_segment and xmit callback for the xfrm_mode and implement these functions for tunnel and transport mode. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	xfrm: Move device notifications to a sepatate file	Steffen Klassert
	This is needed for the upcomming IPsec device offloading. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	xfrm: Add a xfrm type offload.	Steffen Klassert
	We add a struct xfrm_type_offload so that we have the offloaded codepath separated to the non offloaded codepath. With this the non offloade and the offloaded codepath can coexist. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	net: Add ESP offload features	Steffen Klassert
	This patch adds netdev features to configure IPsec offloads. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14	ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type	Takashi Sakamoto
	An abstraction of asynchronous transaction for transmission of MIDI messages was introduced in Linux v4.4. Each driver can utilize this abstraction to transfer MIDI messages via fixed-length payload of transaction to a certain unit address. Filling payload of the transaction is done by callback. In this callback, each driver can return negative error code, however current implementation assigns the return value to unsigned variable. This commit changes type of the variable to fix the bug. Reported-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: <stable@vger.kernel.org> # 4.4+ Fixes: 585d7cba5e1f ("ALSA: firewire-lib: add helper functions for asynchronous transactions to transfer MIDI messages") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-04-13	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge fixes from Andrew Morton: "11 fixes. The presence of 'thp: reduce indentation level in change_huge_pmd()' is unfortunate. But the patchset had been decently reviewed and tested before we decided it was needed in -stable and I felt it best not to churn things at the last minute" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: add Martin Kepplinger's email zsmalloc: expand class bit zram: do not use copy_page with non-page aligned address zram: fix operator precedence to get offset hugetlbfs: fix offset overflow in hugetlbfs mmap thp: fix MADV_DONTNEED vs clear soft dirty race thp: fix MADV_DONTNEED vs. MADV_FREE race mm: drop unused pmdp_huge_get_and_clear_notify() thp: fix MADV_DONTNEED vs. numa balancing race thp: reduce indentation level in change_huge_pmd() z3fold: fix page locking in z3fold_alloc()
2017-04-13	scsi: return correct blkprep status code in case scsi_init_io() fails.	Johannes Thumshirn
	When instrumenting the SCSI layer to run into the !blk_rq_nr_phys_segments(rq) case the following warning emitted from the block layer: blk_peek_request: bad return=-22 This happens because since commit fd3fc0b4d730 ("scsi: don't BUG_ON() empty DMA transfers") we return the wrong error value from scsi_prep_fn() back to the block layer. [mkp: silenced checkpatch] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: fd3fc0b4d730 scsi: don't BUG_ON() empty DMA transfers Cc: <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13	mailmap: add Martin Kepplinger's email	Martin Kepplinger
	Set the partly deprecated companies' email addresses as alias for the personal one. Link: http://lkml.kernel.org/r/1491984622-17321-1-git-send-email-martin.kepplinger@ginzinger.com Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	zsmalloc: expand class bit	Minchan Kim
	Now 64K page system, zsamlloc has 257 classes so 8 class bit is not enough. With that, it corrupts the system when zsmalloc stores 65536byte data(ie, index number 256) so that this patch increases class bit for simple fix for stable backport. We should clean up this mess soon. index size 0 32 1 288 .. .. 204 52256 256 65536 Fixes: 3783689a1 ("zsmalloc: introduce zspage structure") Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	zram: do not use copy_page with non-page aligned address	Minchan Kim
	The copy_page is optimized memcpy for page-alinged address. If it is used with non-page aligned address, it can corrupt memory which means system corruption. With zram, it can happen with 1. 64K architecture 2. partial IO 3. slub debug Partial IO need to allocate a page and zram allocates it via kmalloc. With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned address. And finally, copy_page(mem, cmem) corrupts memory. So, this patch changes it to memcpy. Actuaully, we don't need to change zram_bvec_write part because zsmalloc returns page-aligned address in case of PAGE_SIZE class but it's not good to rely on the internal of zsmalloc. Note: When this patch is merged to stable, clear_page should be fixed, too. Unfortunately, recent zram removes it by "same page merge" feature so it's hard to backport this patch to -stable tree. I will handle it when I receive the mail from stable tree maintainer to merge this patch to backport. Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()") Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	zram: fix operator precedence to get offset	Minchan Kim
	In zram_rw_page, the logic to get offset is wrong by operator precedence (i.e., "<<" is higher than "&"). With wrong offset, zram can corrupt the user's data. This patch fixes it. Fixes: 8c7f01025 ("zram: implement rw_page operation of zram") Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	hugetlbfs: fix offset overflow in hugetlbfs mmap	Mike Kravetz
	If mmap() maps a file, it can be passed an offset into the file at which the mapping is to start. Offset could be a negative value when represented as a loff_t. The offset plus length will be used to update the file size (i_size) which is also a loff_t. Validate the value of offset and offset + length to make sure they do not overflow and appear as negative. Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call region_abort if region_chg fails") applied. Prior to this commit, the overflow would still occur but we would luckily return ENOMEM. To reproduce: mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL); Resulted in, kernel BUG at mm/hugetlb.c:742! Call Trace: hugetlbfs_evict_inode+0x80/0xa0 evict+0x24a/0x620 iput+0x48f/0x8c0 dentry_unlink_inode+0x31f/0x4d0 __dentry_kill+0x292/0x5e0 dput+0x730/0x830 __fput+0x438/0x720 ____fput+0x1a/0x20 task_work_run+0xfe/0x180 exit_to_usermode_loop+0x133/0x150 syscall_return_slowpath+0x184/0x1c0 entry_SYSCALL_64_fastpath+0xab/0xad Fixes: ff8c0c53c475 ("mm/hugetlb.c: don't call region_abort if region_chg fails") Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	thp: fix MADV_DONTNEED vs clear soft dirty race	Kirill A. Shutemov
	Yet another instance of the same race. Fix is identical to change_huge_pmd(). See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details. Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	thp: fix MADV_DONTNEED vs. MADV_FREE race	Kirill A. Shutemov
	Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem). It's critical to not clear pmd intermittently while handling MADV_FREE to avoid race with MADV_DONTNEED: CPU0: CPU1: madvise_free_huge_pmd() pmdp_huge_get_and_clear_full() madvise_dontneed() zap_pmd_range() pmd_trans_huge(*pmd) == 0 (without ptl) // skip the pmd set_pmd_at(); // pmd is re-established It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It violates MADV_DONTNEED interface and can result is userspace misbehaviour. Basically it's the same race as with numa balancing in change_huge_pmd(), but a bit simpler to mitigate: we don't need to preserve dirty/young flags here due to MADV_FREE functionality. [kirill.shutemov@linux.intel.com: Urgh... Power is special again] Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	mm: drop unused pmdp_huge_get_and_clear_notify()	Kirill A. Shutemov
	Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the last pmdp_huge_get_and_clear_notify() user is gone. Let's drop the helper. Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	thp: fix MADV_DONTNEED vs. numa balancing race	Kirill A. Shutemov
	In case prot_numa, we are under down_read(mmap_sem). It's critical to not clear pmd intermittently to avoid race with MADV_DONTNEED which is also under down_read(mmap_sem): CPU0: CPU1: change_huge_pmd(prot_numa=1) pmdp_huge_get_and_clear_notify() madvise_dontneed() zap_pmd_range() pmd_trans_huge(*pmd) == 0 (without ptl) // skip the pmd set_pmd_at(); // pmd is re-established The race makes MADV_DONTNEED miss the huge pmd and don't clear it which may break userspace. Found by code analysis, never saw triggered. Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	thp: reduce indentation level in change_huge_pmd()	Kirill A. Shutemov
	Patch series "thp: fix few MADV_DONTNEED races" For MADV_DONTNEED to work properly with huge pages, it's critical to not clear pmd intermittently unless you hold down_write(mmap_sem). Otherwise MADV_DONTNEED can miss the THP which can lead to userspace breakage. See example of such race in commit message of patch 2/4. All these races are found by code inspection. I haven't seen them triggered. I don't think it's worth to apply them to stable@. This patch (of 4): Restructure code in preparation for a fix. Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	z3fold: fix page locking in z3fold_alloc()	Vitaly Wool
	Stress testing of the current z3fold implementation on a 8-core system revealed it was possible that a z3fold page deleted from its unbuddied list in z3fold_alloc() would be put on another unbuddied list by z3fold_free() while z3fold_alloc() is still processing it. This has been introduced with commit 5a27aa822 ("z3fold: add kref refcounting") due to the removal of special handling of a z3fold page not on any list in z3fold_free(). To fix this, the z3fold page lock should be taken in z3fold_alloc() before the pool lock is released. To avoid deadlocking, we just try to lock the page as soon as we get a hold of it, and if trylock fails, we drop this page and take the next one. Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13	ia64: restore symbol versions for symbols defined in assembly	Jan Beulich
	The ia64 build generates many warnings like this: WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned. Besides adding the necessary header this also requires fiddling with some explicit .S -> .o rules. Cc: IA64-ML <linux-ia64@vger.kernel.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-14	netfilter: nf_conntrack: remove double assignment	Aaron Conole
	The protonet pointer will unconditionally be rewritten, so just do the needed assignment first. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-14	netfilter: nf_tables: remove double return statement	Aaron Conole
	Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-14	power: supply: bq24190_charger: Use new extcon_register_notifier_all()	Hans de Goede
	When I submitted the extcon handling I had a patch pending for the extcon sub-system for extcon_register_notifier to take -1 as cable id for listening for all type cable events on an extcon with a single notifier. In the end it was decided to instead add a new extcon_register_notifier_all function for this, switch to using this. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Liam Breck <kernel@networkimprov.net> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2017-04-14	Merge remote-tracking branch 'chanwoo-extcon/ib-extcon-4.12' into psy-next	Sebastian Reichel

2017-04-14	power: supply: bq24190_charger: Longer delay while polling reset flag	Liam Breck
	On chip reset, polling loop used udelay(10) which is too short to be useful. Instead, use usleep_range(100, 200). Signed-off-by: Liam Breck <kernel@networkimprov.net> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2017-04-14	power: supply: bq24190_charger: Uniform pm_runtime_get() failure handling	Liam Breck
	On pm_runtime_get() failure, always emit an error message. Prevent unbalanced pm_runtime_get by calling: pm_runtime_put_noidle() in irq handler pm_runtime_put_sync() on any probe() failure Rename probe() out labels instead of renumbering them. Fixes: 13d6fa8447fa ("power: bq24190_charger: Use PM runtime autosuspend") Signed-off-by: Liam Breck <kernel@networkimprov.net> Acked-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2017-04-14	power: supply: bq24190_charger: Clean up extcon code	Liam Breck
	Polishing and fixes for initial extcon patch. Fixes: 4db249b6f3b4 ("power: supply: bq24190_charger: Use extcon to determine ilimit, 5v boost") Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Liam Breck <kernel@networkimprov.net> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sre@kernel.org>