Age | Commit message | Author
2016-03-10 | x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions | Andy Lutomirski
The SDM says that debug exceptions clear BTF, and we need to keep TIF_BLOCKSTEP in sync with BTF. Clear it unconditionally and improve the comment. I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not to be cleared was just an oversight. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/fa86e55d196e6dde5b38839595bde2a292c52fdc.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 | x86/entry/32: Restore FLAGS on SYSEXIT | Andy Lutomirski
We weren't restoring FLAGS at all on SYSEXIT. Apparently no one cared. With this patch applied, native kernels should always honor task_pt_regs()->flags, which opens the door for some sys_iopl() cleanups. I'll do those as a separate series, though, since getting it right will involve tweaking some paravirt ops. ( The short version is that, before this patch, sys_iopl(), invoked via SYSENTER, wasn't guaranteed to ever transfer the updated regs->flags, so sys_iopl() had to change the hardware flags register as well. ) Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3f98b207472dc9784838eb5ca2b89dcc845ce269.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 | x86/entry/32: Filter NT and speed up AC filtering in SYSENTER | Andy Lutomirski
This makes the 32-bit code work just like the 64-bit code. It should speed up syscalls on 32-bit kernels on Skylake by something like 20 cycles (by analogy to the 64-bit compat case). It also cleans up NT just like we do for the 64-bit case. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/07daef3d44bd1ed62a2c866e143e8df64edb40ee.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 | x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test | Andy Lutomirski
CLAC is slow, and the SYSENTER code already has an unlikely path that runs if unusual flags are set. Drop the CLAC and instead rely on the unlikely path to clear AC. This seems to save ~24 cycles on my Skylake laptop. (Hey, Intel, make this faster please!) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/90d6db2189f9add83bc7bddd75a0c19ebbd676b2.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
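A rough C model of the idea, assuming nothing beyond the architectural EFLAGS bit positions. The real change lives in the entry assembly; `entry_flags()` and `slow_path_hits` are hypothetical names. The sketch only illustrates folding AC into a single test on rare flags instead of paying for CLAC on every entry.

```c
#include <stdint.h>

/* EFLAGS bit positions as defined by the x86 architecture. */
#define EFLAGS_FIXED (1u << 1)   /* reserved bit 1, always set */
#define EFLAGS_TF    (1u << 8)   /* trap flag (single-step) */
#define EFLAGS_NT    (1u << 14)  /* nested task */
#define EFLAGS_AC    (1u << 18)  /* alignment check / SMAP override */

/* Counts how often the rare path runs (illustration only). */
static unsigned slow_path_hits;

/*
 * Toy model: one combined test on the unusual flags replaces an
 * unconditional CLAC. Returns the live kernel FLAGS after entry;
 * the user's saved FLAGS are not modelled here.
 */
static uint32_t entry_flags(uint32_t live)
{
	if (live & (EFLAGS_NT | EFLAGS_AC | EFLAGS_TF)) {
		slow_path_hits++;
		return EFLAGS_FIXED;   /* rare: reset everything, AC included */
	}
	return live;               /* common: AC already clear, no CLAC paid */
}
```

In the common case the mask test fails and nothing is cleared; only an entry with NT, AC or TF set pays for the reset.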
2016-03-10 | selftests/x86: In syscall_nt, test NT|TF as well | Andy Lutomirski
Setting TF prevents fastpath returns in most cases, which causes the test to fail on 32-bit kernels because 32-bit kernels do not, in fact, handle NT correctly on SYSENTER entries. The next patch will fix 32-bit kernels. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bd4bb48af6b10c0dc84aec6dbcf487ed25683495.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 | s390/mm: four page table levels vs. fork | Martin Schwidefsky
The fork of a process with four page table levels has been broken since git commit 6252d702c5311ce9 "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels, dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three-level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers, as both the pgd_index *and* the pud_index are added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function selects the number of page table levels for a new context. The function is used by mm_init(), which in turn is called by dup_mm() and mm_alloc(); these two are used by fork() and exec() respectively. The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit: for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function sets the new mm structure to zero; in this case a three-level page table is created, as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: Marcin Kościelnicki <koriakin@0x04.net> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-10 | spi: rockchip: covert rsd_nsecs to u32 type | Shawn Lin
rsd_nsecs is defined as a u8 member of struct rockchip_spi, but it is read using of_property_read_u32(). That means we risk truncation by type conversion if a large value is passed in from DT. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Mark Brown <broonie@kernel.org>
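A minimal user-space sketch of the truncation, assuming only that the devicetree value arrives as a u32 (which is what of_property_read_u32() fills in). `cfg_u8`, `cfg_u32` and the two helpers are hypothetical.

```c
#include <stdint.h>

/* Before: the field is narrower than the value read from DT. */
struct cfg_u8 { uint8_t rsd_nsecs; };
/* After: the field matches the u32 that of_property_read_u32() provides. */
struct cfg_u32 { uint32_t rsd_nsecs; };

static uint32_t store_u8(uint32_t dt_value)
{
	/* Only the low 8 bits survive the conversion. */
	struct cfg_u8 c = { .rsd_nsecs = (uint8_t)dt_value };
	return c.rsd_nsecs;
}

static uint32_t store_u32(uint32_t dt_value)
{
	struct cfg_u32 c = { .rsd_nsecs = dt_value };
	return c.rsd_nsecs;
}
```

With a value of 300, the u8 field silently ends up holding 44 (300 mod 256), while the u32 field keeps 300.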
2016-03-10 | spi: rockchip: header file cleanup | Shawn Lin
Remove some unused header files and reorder the rest into alphabetical order. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-03-10 | MAINTAINERS: Add co-maintainer for remoteproc subsystems | Bjorn Andersson
Add myself as co-maintainer for the remote processor related subsystems, as agreed with Ohad. Acked-by: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-03-10 | ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry() | Geliang Tang
BUFFER_TRACE info "call ext4_handle_dirty_metadata" doesn't match the code, so drop it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09 | ext4: fix misspellings in comments. | Adam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09 | jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path | OGAWA Hirofumi
On the umount path, jbd2_journal_destroy() writes the latest transaction ID (->j_tail_sequence) to be used at the next mount. The bug is that in some cases ->j_tail_sequence does not hold the latest transaction ID, so at the next mount new transactions can conflict with remaining (not yet overwritten) ones:

    mount (id=10)
    write transaction (id=11)
    write transaction (id=12)
    umount (id=10)            <= the bug: the latest ID is not written
    mount (id=10)
    write transaction (id=11)
    crash
    mount
    [recovery process]
        transaction (id=11)
        transaction (id=12)   <= valid transaction ID, but this old commit must not be replayed

As shown above, this bug can cause recovery failure or filesystem corruption.

Why doesn't ->j_tail_sequence point to the latest ID? If checkpointed transactions were reclaimed under memory pressure (i.e. by bdev_try_to_free_page()), ->j_tail_sequence is not updated. (Another case is __jbd2_journal_clean_checkpoint_list() being called with an empty transaction.) In these cases ->j_tail_sequence does not point to the latest transaction ID on the umount path. In addition, the REQ_FLUSH for the checkpoint is not issued either.

To fix this problem with minimal changes, this patch updates ->j_tail_sequence and issues REQ_FLUSH. (More complex changes would allow some optimizations, for example avoiding an unnecessary REQ_FLUSH.)

BTW, in

    journal->j_tail_sequence = ++journal->j_transaction_sequence;

the increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
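A toy model of the replay hazard in the timeline above, under heavy simplifying assumptions: each log block stores only a transaction ID, every mount starts writing at block 0, and recovery replays consecutive blocks whose IDs follow the recorded starting ID. The IDs are shifted slightly from the timeline, and none of these names are jbd2's.

```c
#include <stdbool.h>

#define LOG_BLOCKS 4

/* Toy journal: each log block holds the ID of the transaction in it. */
struct toy_journal {
	unsigned log[LOG_BLOCKS];
	unsigned start_id;   /* ID recovery expects first (written at umount) */
};

/* Toy recovery: replay blocks while each holds the next expected ID. */
static unsigned toy_recover(const struct toy_journal *j)
{
	unsigned expect = j->start_id, replayed = 0;

	for (unsigned blk = 0; blk < LOG_BLOCKS && j->log[blk] == expect; blk++) {
		expect++;
		replayed++;
	}
	return replayed;
}

/*
 * Replay the scenario. `fixed` selects whether umount records an ID
 * past every transaction already in the log.
 */
static unsigned toy_scenario(bool fixed)
{
	struct toy_journal j = { .log = { 0 }, .start_id = 11 };

	/* First session: transactions 11 and 12 reach the log. */
	j.log[0] = 11;
	j.log[1] = 12;

	/* umount: the bug leaves the start ID stale; the fix advances it. */
	j.start_id = fixed ? 13 : 11;

	/* Second session: one new transaction overwrites block 0, then crash. */
	j.log[0] = j.start_id;

	/* Buggy: replays new 11 and stale 12. Fixed: replays only 13. */
	return toy_recover(&j);
}
```

With the stale start ID, the leftover transaction 12 carries exactly the ID recovery expects next, so it is replayed; once umount advances the ID, the leftover block no longer matches and recovery stops.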
2016-03-09 | bpf: avoid copying junk bytes in bpf_get_current_comm() | Alexei Starovoitov
Lots of places in the kernel use memcpy(buf, comm, TASK_COMM_LEN); but the result is typically passed to print("%s", buf), and extra bytes after the zero don't cause any harm. In bpf, the result of bpf_get_current_comm() is used as part of a map key and was causing spurious hash map mismatches. Use strlcpy() to guarantee a zero-terminated string. The bpf verifier checks that the output buffer is zero-initialized, so even for short task names the output buffer doesn't have junk bytes. Note it's not a security concern, since kprobe+bpf is root only. Fixes: ffeedafbf023 ("bpf: introduce current->pid, tgid, uid, gid, comm accessors") Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
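The mismatch can be reproduced in user space. This is a sketch under assumptions: the copy helpers are hypothetical, and the second one mimics the effect of strlcpy() into a zeroed buffer with a plain loop rather than calling strlcpy(), which not every libc provides.

```c
#include <string.h>

#define TASK_COMM_LEN 16

/* Before: copy the whole buffer, including any bytes after the NUL. */
static void comm_copy_all(char dst[TASK_COMM_LEN], const char src[TASK_COMM_LEN])
{
	memcpy(dst, src, TASK_COMM_LEN);
}

/*
 * After (in spirit): copy only up to the terminator into a zeroed
 * buffer, as strlcpy() into a zero-initialized buffer would.
 */
static void comm_copy_terminated(char dst[TASK_COMM_LEN], const char src[TASK_COMM_LEN])
{
	size_t n = 0;

	memset(dst, 0, TASK_COMM_LEN);
	while (n < TASK_COMM_LEN - 1 && src[n] != '\0')
		n++;
	memcpy(dst, src, n);   /* dst[n..] stays zero, so it is terminated */
}

/* A hash map compares raw key bytes, not C strings. */
static int comm_keys_equal(const char a[TASK_COMM_LEN], const char b[TASK_COMM_LEN])
{
	return memcmp(a, b, TASK_COMM_LEN) == 0;
}
```

Two tasks both named "sh" produce different whole-buffer keys if one buffer still holds leftovers from a longer name; the terminated copy makes the keys byte-identical.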
2016-03-09 | bpf: bpf_stackmap_copy depends on CONFIG_PERF_EVENTS | Alexei Starovoitov
0-day bot reported a build error:

    kernel/built-in.o: In function `map_lookup_elem':
    >> kernel/bpf/.tmp_syscall.o:(.text+0x329b3c): undefined reference to `bpf_stackmap_copy'

when CONFIG_BPF_SYSCALL is set and CONFIG_PERF_EVENTS is not. Add a weak definition to resolve it. This code path in map_lookup_elem() is never taken when CONFIG_PERF_EVENTS is not set. Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
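A minimal illustration of the weak-symbol technique, using the GCC/Clang `__attribute__((weak))` extension that the kernel's `__weak` macro expands to. `optional_copy()` is a hypothetical name, and its `-EOPNOTSUPP` default is an assumption, not necessarily the value the patch returns.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for an optional helper. The weak definition
 * satisfies the linker when the "real" object file is not built; a
 * strong definition elsewhere in the link would override it.
 */
__attribute__((weak)) int optional_copy(const void *key, void *value)
{
	(void)key;
	(void)value;
	return -EOPNOTSUPP; /* default: feature not built in */
}
```

Because the caller's path is never taken in such a configuration, the weak stub only has to exist for the link to succeed.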
2016-03-09 | Merge tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi | Linus Torvalds
Pull spi fixes from Mark Brown: "A few driver specific fixes for the Rockchip and i.MX SPI controllers, especially for the i.MX they're annoying bugs if you run into them"
* tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: imx: fix spi resource leak with dma transfer
  spi: imx: allow only WML aligned transfers to use DMA
  spi: rockchip: add missing spi_master_put
  spi: rockchip: disable runtime pm when in err case
2016-03-09 | ext4: more efficient SEEK_DATA implementation | Jan Kara
Using SEEK_DATA in a huge sparse file can easily lead to softlockups, as ext4_seek_data() iterates over the hole block by block. Fix the problem by using the hole size returned from ext4_map_blocks() and thus skipping the hole in one go. Also update the SEEK_HOLE implementation to follow the same pattern as SEEK_DATA to make future maintenance easier. Furthermore, add cond_resched() to both ext4_seek_data() and ext4_seek_hole() to avoid softlockups in case an evil user creates a huge fragmented file and we have to go through lots of extents. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
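A toy comparison of the two iteration strategies, assuming a 1024-block file described by a bool array. `toy_map_blocks()` is a hypothetical stand-in for a lookup that, like ext4_map_blocks() after this series, reports how long the current hole or mapped run is. Returning NBLOCKS stands for "no data found", where the real lseek() fails with ENXIO.

```c
#include <stdbool.h>

#define NBLOCKS 1024

/* Result of a toy block lookup: state and run length from lblk. */
struct toy_map {
	bool mapped;
	unsigned len;
};

/*
 * Report whether lblk is mapped and how many consecutive blocks share
 * that state. (The toy scans to find len; an extent tree reads it off
 * extent boundaries.)
 */
static struct toy_map toy_map_blocks(const bool *layout, unsigned lblk)
{
	struct toy_map m = { layout[lblk], 0 };

	while (lblk + m.len < NBLOCKS && layout[lblk + m.len] == m.mapped)
		m.len++;
	return m;
}

/* Old pattern: probe the hole one block at a time. */
static unsigned seek_data_per_block(const bool *layout, unsigned start, unsigned *lookups)
{
	for (unsigned b = start; b < NBLOCKS; b++) {
		(*lookups)++;
		if (toy_map_blocks(layout, b).mapped)
			return b;
	}
	return NBLOCKS;
}

/* New pattern: use the returned hole length to skip it in one go. */
static unsigned seek_data_skip_hole(const bool *layout, unsigned start, unsigned *lookups)
{
	unsigned b = start;

	while (b < NBLOCKS) {
		struct toy_map m = toy_map_blocks(layout, b);

		(*lookups)++;
		if (m.mapped)
			return b;
		b += m.len;   /* m.len >= 1, so the loop always advances */
	}
	return NBLOCKS;
}
```

For a file whose only data block is 1000, the per-block search performs 1001 lookups, while the hole-skipping search needs 2.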
2016-03-09 | ext4: cleanup handling of bh->b_state in DAX mmap | Jan Kara
ext4_dax_mmap_get_block() updates bh->b_state directly instead of using ext4_update_bh_state(). This is mostly a cosmetic issue since DAX code always passes on-stack buffer_head but clean this up to make code more uniform. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09 | ext4: return hole from ext4_map_blocks() | Jan Kara
Currently, ext4_map_blocks() just returns 0 when it finds a hole and allocation is not requested. However we have all the information available to tell how large the hole actually is and there are callers of ext4_map_blocks() which would save some block-by-block hole iteration if they knew this information. So fill in struct ext4_map_blocks even for holes with the information we have. We keep returning 0 for holes to maintain backward compatibility of the function. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09 | ext4: factor out determining of hole size | Jan Kara
ext4_ext_put_gap_in_cache() determines hole size in the extent tree, then trims this with possible delayed allocated blocks, and inserts the result into the extent status tree. Factor out determination of the size of the hole in the extent tree as we will need this information in ext4_ext_map_blocks() as well. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-10 | Merge remote-tracking branch 'spi/fix/rockchip' into spi-linus | Mark Brown
2016-03-10 | Merge remote-tracking branch 'spi/fix/imx' into spi-linus | Mark Brown
2016-03-10 | ASoC: hdac_hdmi: Fix infoframe programming | Subhransu S. Prusty
The audio infoframe used an incorrect buffer, so fix it. Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-03-09 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
Pull ext4 fix from Ted Ts'o: "This fixes a regression which crept in v4.5-rc5"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: iterate over buffer heads correctly in move_extent_per_page()
2016-03-10 | spi: xilinx: Add devicetree binding for spi-xilinx | Shubhrajyoti Datta
Add a binding document for the spi/spi-xilinx Signed-off-by: Shubhrajyoti Datta <shubhraj@xilinx.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-03-09 | ext4: fix setting of referenced bit in ext4_es_lookup_extent() | Jan Kara
We were setting the referenced bit on the extent structure we return from ext4_es_lookup_extent(), which is just a private structure on the stack, so setting it had no effect. Set the bit in the structure in the status tree instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
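The bug pattern reduces to marking a copy instead of the original. A hypothetical sketch, with `struct toy_es` standing in for the extent status entry that lives in the tree:

```c
#include <stdbool.h>

/* Toy extent-status entry. */
struct toy_es {
	unsigned lblk, len;
	bool referenced;
};

/* Buggy pattern: copy the entry out, then mark the *copy*. */
static void lookup_buggy(struct toy_es *tree_entry, struct toy_es *out)
{
	*out = *tree_entry;
	out->referenced = true;   /* lost: only the caller's copy changes */
}

/* Fixed pattern: mark the entry in the tree, then copy it out. */
static void lookup_fixed(struct toy_es *tree_entry, struct toy_es *out)
{
	tree_entry->referenced = true;
	*out = *tree_entry;
}
```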
2016-03-09 | Merge branch 'variable-length-ll-headers' | David S. Miller
Willem de Bruijn says:
====================
net: validate variable length ll headers

Allow device-specific validation of link layer headers. Existing checks drop all packets shorter than hard_header_len. For variable length protocols, such packets can be valid.

patch 1 adds header_ops.validate and dev_validate_header
patch 2 implements the protocol specific callback for AX25
patch 3 replaces ll_header_truncated with dev_validate_header
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 | packet: validate variable length ll headers | Willem de Bruijn
Replace the link layer header validation check ll_header_truncated with the more generic dev_validate_header. Validation based on hard_header_len incorrectly drops valid packets in variable length protocols, such as AX25. dev_validate_header calls header_ops.validate for such protocols to ensure correctness below hard_header_len. See also http://comments.gmane.org/gmane.linux.network/401064 Fixes: 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 | ax25: add link layer header validation function | Willem de Bruijn
As a variable length protocol, AX25 fails link layer header validation tests based on a minimum length. header_ops.validate allows protocols to validate headers that are shorter than hard_header_len. Implement this callback for AX25. See also http://comments.gmane.org/gmane.linux.network/401064 Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 | net: validate variable length ll headers | Willem de Bruijn
The netdevice parameter hard_header_len is interpreted both as an upper and as a lower bound on link layer header length. The field is used as an upper bound when reserving room at allocation, and as a lower bound when validating user input in PF_PACKET. Clarify the definition to be the maximum header length. For validation of untrusted headers, add an optional validate member to header_ops. Allow bypassing of validation by callers with CAP_SYS_RAWIO, for instance for deliberate testing of corrupt input. In this case, pad trailing bytes, as some device drivers expect completely initialized headers. See also http://comments.gmane.org/gmane.linux.network/401064 Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
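A user-space model of the validation rule described above. It is a sketch under assumptions: `toy_dev`, `toy_header_ops` and `varlen_validate()` are hypothetical trimmed-down stand-ins, `privileged` models holding CAP_SYS_RAWIO, and the order of the checks is illustrative rather than copied from dev_validate_header().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins with only the fields the check needs. */
struct toy_header_ops {
	bool (*validate)(const char *ll_header, unsigned int len);
};

struct toy_dev {
	unsigned int hard_header_len;          /* maximum header length */
	const struct toy_header_ops *header_ops;
};

/*
 * Headers of at least hard_header_len bytes pass. Privileged callers
 * bypass the check and get trailing bytes zero-padded. Otherwise defer
 * to the protocol's optional validate callback. `ll_header` must have
 * room for hard_header_len bytes.
 */
static bool toy_validate_header(const struct toy_dev *dev, char *ll_header,
				unsigned int len, bool privileged)
{
	if (len >= dev->hard_header_len)
		return true;
	if (privileged) {
		memset(ll_header + len, 0, dev->hard_header_len - len);
		return true;
	}
	if (dev->header_ops && dev->header_ops->validate)
		return dev->header_ops->validate(ll_header, len);
	return false;
}

/* Hypothetical variable-length protocol: any header of >= 2 bytes is fine. */
static bool varlen_validate(const char *ll_header, unsigned int len)
{
	(void)ll_header;
	return len >= 2;
}
```

A fixed-length device still rejects a short header, while a device with a validate callback can accept it.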
2016-03-09 | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux | Linus Torvalds
Pull drm fixes from Dave Airlie: "A few imx fixes I missed from a couple of weeks ago, they still aren't that big and fix some regression and a fail to boot problem. Other than that, a couple of regression fixes for radeon/amdgpu, one regression fix for vmwgfx and one regression fix for tda998x"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  Revert "drm/radeon/pm: adjust display configuration after powerstate"
  drm/amdgpu/dp: add back special handling for NUTMEG
  drm/radeon/dp: add back special handling for NUTMEG
  drm/i2c: tda998x: Choose between atomic or non atomic dpms helper
  drm/vmwgfx: Add back ->detect() and ->fill_modes()
  drm/radeon: Fix error handling in radeon_flip_work_func.
  drm/amdgpu: Fix error handling in amdgpu_flip_work_func.
  drm/imx: Add missing DRM_FORMAT_RGB565 to ipu_plane_formats
  drm/imx: notify DRM core about CRTC vblank state
  gpu: ipu-v3: Reset IPU before activating IRQ
  gpu: ipu-v3: Do not bail out on missing optional port nodes
2016-03-09 | Merge tag 'trace-fixes-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds
Pull tracing fix from Steven Rostedt: "I previously sent a fix that prevents all trace events from being called if the current cpu is offline. But I forgot that in 3.18, we added lockdep checks to test RCU usage even when the event is disabled. Although there cannot be any bug when a cpu is going offline, we now get false warnings triggered by the added checks of the event being disabled. I removed the check from the tracepoint code itself, and added it to the condition section (which is "1" for 'no condition'). This way the online cpu check will get checked in all the right locations"
* tag 'trace-fixes-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix check for cpu online when event is disabled
2016-03-09 | ext4: iterate over buffer heads correctly in move_extent_per_page() | Eryu Guan
In commit bcff24887d00 ("ext4: don't read blocks from disk after extents being swapped") bh is not updated correctly in the for loop, so wrong data was written to disk. generic/324 catches this on ext4 with a sub-page block size. Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extents being swapped") Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-10 | Merge tag 'v4.5-rc5' into devel | Linus Walleij
Linux 4.5-rc5
2016-03-09 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds
Merge fixes from Andrew Morton: "13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dma-mapping: avoid oops when parameter cpu_addr is null
  mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers
  memremap: check pfn validity before passing to pfn_to_page()
  mm, thp: fix migration of PTE-mapped transparent huge pages
  dax: check return value of dax_radix_entry()
  ocfs2: fix return value from ocfs2_page_mkwrite()
  arm64: kasan: clear stale stack poison
  sched/kasan: remove stale KASAN poison after hotplug
  kasan: add functions to clear stack poison
  mm: fix mixed zone detection in devm_memremap_pages
  list: kill list_force_poison()
  mm: __delete_from_page_cache show Bad page if mapped
  mm/hugetlb: hugetlb_no_page: rate-limit warning message
2016-03-09 | mpt3sas: Remove unnecessary synchronize_irq() before free_irq() | Lars-Peter Clausen
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch:

// <smpl>
@@
expression irq;
@@
-synchronize_irq(irq);
 free_irq(irq, ...);
// </smpl>

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-09 | sg: fix dxferp in from_to case | Douglas Gilbert
One of the strange things that the original sg driver did was let the user provide both a data-out buffer (it followed the sg_header+cdb) _and_ specify a reply length greater than zero. What happened was that the user data-out buffer was copied into some kernel buffers and then the mid level was told a read type operation would take place with the data from the device overwriting the same kernel buffers. The user would then read those kernel buffers back into the user space. From what I can tell, the above action was broken by commit fad7f01e61bf ("sg: set dxferp to NULL for READ with the older SG interface") in 2008 and syzkaller found that out recently. Make sure that a user space pointer is passed through when data follows the sg_header structure and command. Fix the abnormal case when a non-zero reply_len is also given. Fixes: fad7f01e61bf737fe8a3740d803f000db57ecac6 Cc: <stable@vger.kernel.org> #v2.6.28+ Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-09 | Merge branch 'for-4.6/pfn' into libnvdimm-for-next | Dan Williams
2016-03-09 | dma-mapping: avoid oops when parameter cpu_addr is null | Zhen Lei
Keep consistent with kfree(), which tolerates a NULL ptr. We do this because sometimes we may use a goto statement, so that the success and failure cases can share parts of the code. But unfortunately, calling dma_free_coherent with a null cpu_addr parameter will cause an oops, as shown below:

    Unable to handle kernel paging request at virtual address ffffffc020d3b2b8
    pgd = ffffffc083a61000
    [ffffffc020d3b2b8] *pgd=0000000000000000, *pud=0000000000000000
    CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G O 4.1.12 #1
    Hardware name: ARM64 (DT)
    PC is at __dma_free_coherent.isra.10+0x74/0xc8
    LR is at __dma_free+0x9c/0xb0
    Process malloc_dma_1 (pid: 1489, stack limit = 0xffffffc0837fc020)
    [...]
    Call trace:
    __dma_free_coherent.isra.10+0x74/0xc8
    __dma_free+0x9c/0xb0
    malloc_dma+0x104/0x158 [dma_alloc_coherent_mtmalloc]
    kthread+0xec/0xfc

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
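A sketch of the shared-cleanup pattern the message describes, with a hypothetical allocator pair. `free()` already accepts NULL, so the explicit guard in `buf_free()` is redundant here; it stands in for a release path that would otherwise dereference its argument, which is the situation the patch addresses.

```c
#include <stdlib.h>

/* Hypothetical allocator pair; the release side tolerates NULL. */
static void *buf_alloc(size_t n)
{
	return malloc(n);
}

static void buf_free(void *p)
{
	if (!p)          /* NULL is a no-op, as with kfree() */
		return;
	free(p);
}

/*
 * Success and failure paths fall through the same cleanup code, so
 * some pointers may still be NULL when they are released.
 */
static int setup_two_buffers(size_t n, int fail_second)
{
	void *a = NULL, *b = NULL;
	int ret = -1;

	a = buf_alloc(n);
	if (!a)
		goto out;
	b = fail_second ? NULL : buf_alloc(n);
	if (!b)
		goto out;
	ret = 0;         /* ... use a and b ... */
out:
	buf_free(b);     /* may be NULL on the error path */
	buf_free(a);
	return ret;
}
```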
2016-03-09 | mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers | Jan Stancek
Replace ENOTSUPP with EOPNOTSUPP. If hugepages are not supported, this value is propagated to userspace. EOPNOTSUPP is part of uapi and is widely supported by libc libraries. It gives a nicer message to the user, rather than:

    # cat /proc/sys/vm/nr_hugepages
    cat: /proc/sys/vm/nr_hugepages: Unknown error 524

And also LTP's proc01 test was failing because this return code (524) was unexpected:

    proc01 1 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524
    proc01 2 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524
    proc01 3 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524

Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
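The difference is visible from user space. A small sketch: 524 is the kernel-internal ENOTSUPP value quoted in the error output above, and the "Unknown error" prefix check is an assumption about glibc's wording (other libcs phrase unknown codes differently). `has_libc_message()` is a hypothetical helper.

```c
#include <errno.h>
#include <string.h>

/* Kernel-internal ENOTSUPP; not an errno value libc knows about. */
#define KERNEL_INTERNAL_ENOTSUPP 524

/*
 * Returns nonzero if libc has a real message for `err`. glibc formats
 * codes it does not know as "Unknown error N".
 */
static int has_libc_message(int err)
{
	return strncmp(strerror(err), "Unknown error", 13) != 0;
}
```

On glibc, `has_libc_message(KERNEL_INTERNAL_ENOTSUPP)` is expected to be 0, which matches the "Unknown error 524" text above, while EOPNOTSUPP maps to a proper message.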
2016-03-09 | memremap: check pfn validity before passing to pfn_to_page() | Ard Biesheuvel
In memremap's helper function try_ram_remap(), we dereference a struct page pointer that was derived from a PFN that is known to be covered by a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN, i.e., a PFN that has a struct page associated with it and is covered by the kernel direct mapping. However, the assumption that there is a 1:1 relation between the System RAM iomem region and the kernel direct mapping is not universally valid on all architectures, and on ARM and arm64, 'System RAM' may include regions for which pfn_valid() returns false. Generally speaking, both __va() and pfn_to_page() should only ever be called on PFNs/physical addresses for which pfn_valid() returns true, so add that check to try_ram_remap(). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09 | mm, thp: fix migration of PTE-mapped transparent huge pages | Kirill A. Shutemov
We don't have native support for THP migration, so we have to split a huge page into small pages in order to migrate it to a different node. This includes PTE-mapped huge pages. I made a mistake in the refcounting patchset: we don't actually split a PTE-mapped huge page in queue_pages_pte_range() if we step on the head page. The result is that the head page is queued for migration, but none of the tail pages are: putting the head page on the queue takes a pin on the page, so any subsequent attempts of split_huge_pages() fail and we skip queuing the tail pages. unmap_and_move_huge_page() will eventually split the huge page, but only one of its 512 pages gets migrated. Let's fix the situation. Fixes: 248db92da13f2507 ("migrate_pages: try to split pages on queuing") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09 | dax: check return value of dax_radix_entry() | Ross Zwisler
dax_pfn_mkwrite() previously wasn't checking the return value of the call to dax_radix_entry(), which was a mistake. Instead, capture this return value and return the appropriate VM_FAULT_ value. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09 | ocfs2: fix return value from ocfs2_page_mkwrite() | Jan Kara
ocfs2_page_mkwrite() could mistakenly return an error code instead of a mkwrite status value. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
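A ->page_mkwrite() handler has to hand back a fault status rather than a negative errno. A sketch of the kind of conversion involved: the enum values are hypothetical (the kernel's VM_FAULT_* constants differ), and the mapping is an assumption, similar in spirit to the kernel's block_page_mkwrite_return() helper rather than the ocfs2 fix itself.

```c
#include <errno.h>

/* Hypothetical fault-status codes; not the kernel's VM_FAULT_* values. */
enum toy_fault {
	TOY_FAULT_LOCKED = 1,  /* success: page returned locked */
	TOY_FAULT_NOPAGE,      /* let the fault be retried */
	TOY_FAULT_OOM,
	TOY_FAULT_SIGBUS,
};

/* Convert an internal errno-style result into a fault status. */
static enum toy_fault mkwrite_status(int err)
{
	if (err == 0)
		return TOY_FAULT_LOCKED;
	if (err == -EFAULT || err == -EAGAIN)
		return TOY_FAULT_NOPAGE;
	if (err == -ENOMEM)
		return TOY_FAULT_OOM;
	return TOY_FAULT_SIGBUS;   /* e.g. -ENOSPC, -EIO */
}
```

Keeping the conversion in one place means no errno can leak out of the handler by accident, which is the class of bug this commit fixes.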
2016-03-09 | arm64: kasan: clear stale stack poison | Mark Rutland
Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In the case of cpuidle, CPUs exit the kernel a number of levels deep in C code. Any instrumented functions on this critical path will leave portions of the stack shadow poisoned. If CPUs lose context and return to the kernel via a cold path, they restore a prior context saved in __cpu_suspend_enter. The frames of any functions called between that save and the actual exit of the kernel are forgotten, and the poison those calls placed in the stack shadow area is never removed. Thus, (depending on stack frame layout) subsequent calls to instrumented functions may hit this stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, clear any stale poison from the idle thread for a CPU prior to bringing a CPU online. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09 | sched/kasan: remove stale KASAN poison after hotplug | Mark Rutland
Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In the case of CPU hotplug, CPUs exit the kernel a number of levels deep in C code. Any instrumented functions on this critical path will leave portions of the stack shadow poisoned. When a CPU is subsequently brought back into the kernel via a different path, then depending on stack frame layout, calls to instrumented functions may hit this stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, clear any stale poison from the idle thread for a CPU prior to bringing a CPU online. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09 | kasan: add functions to clear stack poison | Mark Rutland
Functions which the compiler has instrumented for ASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In some cases (e.g. hotplug and idle), CPUs may exit the kernel a number of levels deep in C code. If there are any instrumented functions on this critical path, these will leave portions of the idle thread stack shadow poisoned. If a CPU returns to the kernel via a different path (e.g. a cold entry), then depending on stack frame layout, subsequent calls to instrumented functions may use regions of the stack with stale poison, resulting in (spurious) KASAN splats to the console. Contemporary GCCs always add stack shadow poisoning when ASAN is enabled, even when asked to not instrument a function [1], so we can't simply annotate functions on the critical path to avoid poisoning. Instead, this series explicitly removes any stale poison before it can be hit. In the common hotplug case we clear the entire stack shadow in common code, before a CPU is brought online. Architectures which perform a cold return as part of CPU idle may retain an architecture-specific amount of stack contents. To retain the poison for this retained context, the arch code must call the core KASAN code, passing a "watermark" stack pointer value beyond which shadow will be cleared. Architectures which don't perform a cold return as part of idle do not need any additional code.
This patch (of 3): Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In some cases (e.g. hotplug and idle), CPUs may exit the kernel a number of levels deep in C code. If there are any instrumented functions on this critical path, these will leave portions of the stack shadow poisoned. If a CPU returns to the kernel via a different path (e.g. a cold entry), then depending on stack frame layout, subsequent calls to instrumented functions may use regions of the stack with stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, we must clear stale poison from the stack prior to instrumented functions being called. This patch adds functions to the KASAN core for removing poison from (portions of) a task's stack. These will be used by subsequent patches to avoid problems with hotplug and idle. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
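The "watermark" idea can be sketched the same way. This toy model is an assumption-laden illustration, not the functions this patch adds (their names are not shown in this log): offset 0 is the lowest stack address, the stack grows down, so frames still live at cold-return time sit above the watermark and everything below it is dead. Only the shadow below the watermark is cleared; `toy_set_shadow`, `toy_is_poisoned` and `toy_unpoison_below` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_SIZE 256   /* hypothetical stack: offset 0 = lowest address */
#define TOY_SHADOW_SHIFT 3   /* 1 shadow byte per 8 stack bytes */

static uint8_t toy_shadow[TOY_STACK_SIZE >> TOY_SHADOW_SHIFT]; /* 0 = valid */

/* Poison (val != 0) or unpoison (val == 0) [off, off + len); 8-byte aligned. */
void toy_set_shadow(size_t off, size_t len, uint8_t val)
{
    memset(&toy_shadow[off >> TOY_SHADOW_SHIFT], val, len >> TOY_SHADOW_SHIFT);
}

int toy_is_poisoned(size_t off)
{
    return toy_shadow[off >> TOY_SHADOW_SHIFT] != 0;
}

/*
 * The stack grows down, so the retained context lives above 'watermark'
 * (an 8-aligned sp value here). Everything below it belongs to frames that
 * no longer exist, so its stale shadow can be cleared without touching the
 * poison that still protects the retained frames.
 */
void toy_unpoison_below(size_t watermark)
{
    toy_set_shadow(0, watermark, 0);
}
```

In this model the whole-stack clear used for hotplug is simply the special case `toy_unpoison_below(TOY_STACK_SIZE)`, which is one way to see why a single watermark parameter covers both the hotplug and idle cases.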
2016-03-09 | mm: fix mixed zone detection in devm_memremap_pages | Dan Williams
The check for whether we overlap "System RAM" needs to be done at section granularity. For example a system with the following mapping:
    100000000-37bffffff : System RAM
    37c000000-837ffffff : Persistent Memory
...is unable to use devm_memremap_pages() as it would result in two zones colliding within a given section.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
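The arithmetic behind the example mapping can be checked with a short sketch. It assumes 128 MiB memory sections (SECTION_SIZE_BITS = 27, the x86_64 value; other architectures differ), and the helper names are hypothetical, not the patch's code. The section containing 0x37c000000 starts at 0x378000000, so its lower 64 MiB is System RAM and its upper 64 MiB is persistent memory: a byte-exact comparison sees no overlap, while a comparison widened to whole sections does.

```c
#include <stdint.h>

/* Assumption: 128 MiB memory sections (SECTION_SIZE_BITS = 27, as on x86_64). */
#define TOY_SECTION_SHIFT 27
#define TOY_SECTION_SIZE (1ULL << TOY_SECTION_SHIFT)

/* Inclusive physical ranges, written the way /proc/iomem prints them. */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

static int ranges_overlap(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
    return a0 <= b1 && b0 <= a1;
}

/* Naive check: compares exact byte ranges. */
int overlaps_bytewise(struct phys_range ram, struct phys_range dev)
{
    return ranges_overlap(ram.start, ram.end, dev.start, dev.end);
}

/* Section-granular check: widen dev to whole sections before comparing. */
int overlaps_by_section(struct phys_range ram, struct phys_range dev)
{
    uint64_t start = dev.start & ~(TOY_SECTION_SIZE - 1);  /* round down */
    uint64_t end = dev.end | (TOY_SECTION_SIZE - 1);       /* round up */
    return ranges_overlap(ram.start, ram.end, start, end);
}
```

With the two ranges from the mapping above, `overlaps_bytewise()` returns 0 and `overlaps_by_section()` returns 1, which matches the commit's point that the overlap only becomes visible at section granularity.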
2016-03-09 | list: kill list_force_poison() | Dan Williams
Given we have uninitialized list_heads being passed to list_add(), it will always be the case that those uninitialized values randomly trigger the poison value. This is especially true since a list_add() operation will seed the stack with the poison value for later stack allocations to trip over. For example, see these two false positive reports:
    list_add attempted on force-poisoned entry
    WARNING: at lib/list_debug.c:34
    [..]
    NIP [c00000000043c390] __list_add+0xb0/0x150
    LR [c00000000043c38c] __list_add+0xac/0x150
    Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

    list_add attempted on force-poisoned entry(0000000000000500), new->next == d0000000059ecdb0, new->prev == 0000000000000500
    WARNING: at lib/list_debug.c:33
    [..]
    NIP [c00000000042db78] __list_add+0xa8/0x140
    LR [c00000000042db74] __list_add+0xa4/0x140
    Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]
Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Eryu Guan <eguan@redhat.com> Tested-by: Eryu Guan <eguan@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
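Why such a check produces false positives can be shown with a small model. list_add() legitimately overwrites new->next and new->prev, so passing an uninitialized list_head is fine, but any check that inspects those fields first reads whatever the stack slot held before. In this sketch a static buffer stands in for a reused stack slot (real stack reuse is undefined behaviour and not reproducible in portable C), and the marker 0x500 is taken from the second report above purely for illustration; it is not claimed to be the kernel's poison constant. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical marker value, borrowed from the report for illustration. */
#define TOY_FORCE_POISON ((struct list_head *)(uintptr_t)0x500)

/* Stand-in for a stack slot that successive frames reuse. */
static unsigned char toy_stack_slot[sizeof(struct list_head)];

/* Frame 1: an entry that is deliberately force-poisoned, then goes away. */
void toy_frame_one(void)
{
    struct list_head entry = { TOY_FORCE_POISON, TOY_FORCE_POISON };
    memcpy(toy_stack_slot, &entry, sizeof(entry));
}

/*
 * Frame 2: a list_head its owner never initialized (harmless, since
 * list_add() would overwrite both fields). It inherits what frame 1 left
 * behind, so a "force-poisoned" check fires although nothing here poisoned it.
 */
int toy_frame_two_warns(void)
{
    struct list_head fresh;
    memcpy(&fresh, toy_stack_slot, sizeof(fresh)); /* models reused stack memory */
    return fresh.prev == TOY_FORCE_POISON;
}
```

Before `toy_frame_one()` runs, the slot is zeroed and no warning fires; afterwards `toy_frame_two_warns()` reports a poisoned entry even though frame 2 did nothing wrong, which is the kind of false positive that motivated removing list_force_poison().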
2016-03-09 | mm: __delete_from_page_cache show Bad page if mapped | Hugh Dickins
Commit e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages") changed the famous BUG_ON(page_mapped(page)) in __delete_from_page_cache() to VM_BUG_ON_PAGE(page_mapped(page)): which gives us more info when CONFIG_DEBUG_VM=y, but nothing at all when not. Although it has not usually been very helpful, being hit long after the error in question, we do need to know if it actually happens on users' systems; but reinstating a crash there is likely to be opposed :) In the non-debug case, pr_alert("BUG: Bad page cache") plus dump_page(), dump_stack(), add_taint() - I don't really believe LOCKDEP_NOW_UNRELIABLE, but that seems to be the standard procedure now. Move that, or the VM_BUG_ON_PAGE(), up before the deletion from tree: so that the unNULLified page->mapping gives a little more information. If the inode is being evicted (rather than truncated), it won't have any vmas left, so it's safe(ish) to assume that the raised mapcount is erroneous, and we can discount it from page_count to avoid leaking the page (I'm less worried by leaking the occasional 4kB, than losing a potential 2MB page with each 4kB page leaked). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
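The refcount reasoning in the last paragraph can be illustrated with a heavily simplified toy model. It is not the patch's code: `struct toy_page` and `toy_delete_from_cache()` are hypothetical, only the reference count and mapcount are modelled, and a bogus mapping is assumed to account for one extra reference per unit of mapcount. A page that still looks mapped is reported in both cases, but only on eviction are the mapcount's references discounted, so the page can reach a zero count instead of leaking.

```c
/* Hypothetical, heavily simplified page: only the two counters discussed. */
struct toy_page {
    int count;     /* references, including one held by the page cache */
    int mapcount;  /* page-table mappings (possibly bogus) */
};

static int toy_bad_page_reports;

/*
 * Model of deleting a page from the page cache. 'evicting' is non-zero when
 * the inode is being evicted (no vmas left) rather than truncated.
 * Returns 1 if the page ends up with no references, i.e. would be freed.
 */
int toy_delete_from_cache(struct toy_page *page, int evicting)
{
    if (page->mapcount > 0) {
        toy_bad_page_reports++;  /* stands in for pr_alert() + dump_page() */
        if (evicting) {
            /* No vmas can exist, so treat the mapcount as stale. */
            page->count -= page->mapcount;
            page->mapcount = 0;
        }
    }
    page->count--;               /* drop the page cache's own reference */
    return page->count == 0;
}
```

With `count = 2` and `mapcount = 1`, eviction frees the page, while truncation leaves one reference outstanding; both paths produce a report.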
2016-03-09 | mm/hugetlb: hugetlb_no_page: rate-limit warning message | Geoffrey Thomas
The warning message "killed due to inadequate hugepage pool" simply indicates that SIGBUS was sent, not that the process was forcibly killed. If the process has a signal handler installed that does not fix the problem, this message can rapidly spam the kernel log. On my amd64 dev machine that does not have hugepages configured, I can reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e., 4 megabytes of huge pages) and running something that sets a signal handler and forks, like:
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    volatile sig_atomic_t counter = 10;

    void handler(int signal)
    {
        if (counter-- == 0)
            exit(0);
    }

    int main(void)
    {
        int status;
        char *addr = mmap(NULL, 4 * 1048576, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        *addr = 'x';
        switch (fork()) {
        case -1:
            perror("fork");
            return 1;
        case 0:
            signal(SIGBUS, handler);
            *addr = 'x';
            break;
        default:
            *addr = 'x';
            wait(&status);
            if (WIFSIGNALED(status)) {
                psignal(WTERMSIG(status), "child");
            }
            break;
        }
    }
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
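In the reproducer, the SIGBUS handler returns, the faulting write is retried, and each retry faults again, so an unthrottled warning is emitted once per fault. Below is a minimal sketch of interval/burst rate limiting, the general idea behind the kernel's printk_ratelimited() family. This log does not show which helper the patch uses; the struct, field names and the 5-unit / 10-message values are illustrative assumptions.

```c
/* Hypothetical interval/burst limiter in the spirit of printk_ratelimited(). */
struct toy_ratelimit {
    long interval;   /* window length, in caller-supplied time units */
    int burst;       /* messages allowed per window */
    long begin;      /* start of the current window */
    int printed;     /* messages emitted in the current window */
    int missed;      /* messages suppressed in the current window */
};

/* Returns 1 if a message at time 'now' may be printed, 0 if suppressed. */
int toy_ratelimit_ok(struct toy_ratelimit *rs, long now)
{
    if (now - rs->begin >= rs->interval) {
        rs->begin = now;   /* start a new window */
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    rs->missed++;
    return 0;
}
```

With `{ .interval = 5, .burst = 10 }`, a storm of 100 faults at the same instant yields 10 printed warnings and 90 suppressed ones, and the next window starts printing again; a real implementation would also report the suppressed count.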