linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-07-14	net/9p/client.c: put refcount of trans_mod in error case in parse_opts()	piaojun
	In my testing, the second mount will fail after umounting successfully. The reason is that we put refcount of trans_mod in the correct case rather than the error case in parse_opts() at last. That will cause the refcount decrease to -1, and when we try to get trans_mod again in try_module_get(), we could only increase refcount to 0 which will cause failure as follows: parse_opts v9fs_get_trans_by_name try_module_get : return NULL to caller which cause error So we should put refcount of trans_mod in error case. Link: http://lkml.kernel.org/r/5B3F39A0.2030509@huawei.com Fixes: 9421c3e64137ec ("net/9p/client.c: fix potential refcnt problem of trans module") Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Dominique Martinet <dominique.martinet@cea.fr> Tested-by: Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14	mm: allow arch to supply p??_free_tlb functions	Nicholas Piggin
	The mmu_gather APIs keep track of the invalidated address range including the span covered by invalidated page table pages. Ranges covered by page tables but not ptes (and therefore no TLBs) still need to be invalidated because some architectures (x86) can cache intermediate page table entries, and invalidate those with normal TLB invalidation instructions to be almost-backward-compatible. Architectures which don't cache intermediate page table entries, or which invalidate these caches separately from TLB invalidation, do not require TLB invalidation range expanded over page tables. Allow architectures to supply their own p??_free_tlb functions, which can avoid the __tlb_adjust_range. Link: http://lkml.kernel.org/r/20180703013131.2807-1-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14	autofs: fix slab out of bounds read in getname_kernel()	Tomas Bortoli
	The autofs subsystem does not check that the "path" parameter is present for all cases where it is required when it is passed in via the "param" struct. In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl command. To solve it, modify validate_dev_ioctl(function to check that a path has been provided for ioctl commands that require it. Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14	fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*	Vlastimil Babka
	Thomas reports: "While looking around in /proc on my v4.14.52 system I noticed that all processes got a lot of "Locked" memory in /proc/*/smaps. A lot more memory than a regular user can usually lock with mlock(). Commit 493b0e9d945f (in v4.14-rc1) seems to have changed the behavior of "Locked". Before that commit the code was like this. Notice the VM_LOCKED check. (vma->vm_flags & VM_LOCKED) ? (unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0); After that commit Locked is now the same as Pss: (unsigned long)(mss->pss >> (10 + PSS_SHIFT))); This looks like a mistake." Indeed, the commit has added mss->pss_locked with the correct value that depends on VM_LOCKED, but forgot to actually use it. Fix it. Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14	mm: do not drop unused pages when userfaultd is running	Christian Borntraeger
	KVM guests on s390 can notify the host of unused pages. This can result in pte_unused callbacks to be true for KVM guest memory. If a page is unused (checked with pte_unused) we might drop this page instead of paging it. This can have side-effects on userfaultd, when the page in question was already migrated: The next access of that page will trigger a fault and a user fault instead of faulting in a new and empty zero page. As QEMU does not expect a userfault on an already migrated page this migration will fail. The most straightforward solution is to ignore the pte_unused hint if a userfault context is active for this VMA. Link: http://lkml.kernel.org/r/20180703171854.63981-1-borntraeger@de.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14	mm: zero unavailable pages before memmap init	Pavel Tatashin
	We must zero struct pages for memory that is not backed by physical memory, or kernel does not have access to. Recently, there was a change which zeroed all memmap for all holes in e820. Unfortunately, it introduced a bug that is discussed here: https://www.spinics.net/lists/linux-mm/msg156764.html Linus, also saw this bug on his machine, and confirmed that reverting commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved") fixes the issue. The problem is that we incorrectly zero some struct pages after they were setup. The fix is to zero unavailable struct pages prior to initializing of struct pages. A more detailed fix should come later that would avoid double zeroing cases: one in __init_single_page(), the other one in zero_resv_unavail(). Fixes: 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14	pinctrl: nsp: Fix potential NULL dereference	Wei Yongjun
	platform_get_resource() may fail and return NULL, so we should better check it's return value to avoid a NULL pointer dereference a bit later in the code. This is detected by Coccinelle semantic patch. @@ expression pdev, res, n, t, e, e1, e2; @@ res = platform_get_resource(pdev, t, n); + if (!res) + return -EINVAL; ... when != res == NULL e = devm_ioremap_nocache(e1, res->start, e2); Fixes: cc4fa83f66e9 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: nsp: off by ones in nsp_pinmux_enable()	Dan Carpenter
	The > comparisons should be >= or else we read beyond the end of the pinctrl->functions[] array. Fixes: cc4fa83f66e9 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: sh-pfc: r8a77970: remove SH_PFC_PIN_CFG_DRIVE_STRENGTH flag	Niklas Söderlund
	The datasheet does not document any registers to control drive strength, and no drive strength registers are for this reason described for this SoC. The flags indicating that drive strength can be controlled are however set for some pins in the driver. This leads to a NULL pointer dereference when the sh-pfc core tries to access the struct describing the drive strength registers, for example when reading the sysfs file pinconf-pins. Fix this by removing the SH_PFC_PIN_CFG_DRIVE_STRENGTH from all pins. Fixes: b92ac66a1819602b ("pinctrl: sh-pfc: Add R8A77970 PFC support") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: ingenic: Fix inverted direction for < JZ4770	Paul Cercueil
	The .gpio_set_direction() callback was setting inverted direction for SoCs older than the JZ4770, this restores the correct behaviour. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: mt7622: fix a kernel panic when gpio-hog is being applied	Sean Wang
	When we are explicitly using GPIO hogging mechanism in the pinctrl node, such as: &pio { line_input { gpio-hog; gpios = <95 0>, <96 0>, <97 0>; input; }; }; A kernel panic happens at dereferencing a NULL pointer: In this case, the drvdata is still not setup properly yet when it is being accessed. A better solution for fixing up this issue should be we should obtain the private data from struct gpio_chip using a specific gpiochip_get_data instead of a generic dev_get_drvdata. [ 0.249424] Unable to handle kernel NULL pointer dereference at virtual address 000000c8 [ 0.257818] Mem abort info: [ 0.260704] ESR = 0x96000005 [ 0.263869] Exception class = DABT (current EL), IL = 32 bits [ 0.270011] SET = 0, FnV = 0 [ 0.273167] EA = 0, S1PTW = 0 [ 0.276421] Data abort info: [ 0.279398] ISV = 0, ISS = 0x00000005 [ 0.283372] CM = 0, WnR = 0 [ 0.286440] [00000000000000c8] user address but active_mm is swapper [ 0.293027] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 0.298795] Modules linked in: [ 0.301958] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1+ #389 [ 0.308716] Hardware name: MediaTek MT7622 RFB1 board (DT) [ 0.314396] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 0.319362] pc : mtk_hw_pin_field_get+0x28/0x118 [ 0.324140] lr : mtk_hw_set_value+0x30/0x104 [ 0.328557] sp : ffffff800801b6d0 [ 0.331983] x29: ffffff800801b6d0 x28: ffffff80086b7970 [ 0.337484] x27: 0000000000000000 x26: ffffff80087b8000 [ 0.342986] x25: 0000000000000000 x24: ffffffc00324c230 [ 0.348487] x23: 0000000000000003 x22: 0000000000000000 [ 0.353988] x21: ffffff80087b8000 x20: 0000000000000000 [ 0.359489] x19: 0000000000000054 x18: 00000000fffff7c0 [ 0.364990] x17: 0000000000006300 x16: 000000000000003f [ 0.370492] x15: 000000000000000e x14: ffffffffffffffff [ 0.375993] x13: 0000000000000000 x12: 0000000000000020 [ 0.381494] x11: 0000000000000006 x10: 0101010101010101 [ 0.386995] x9 : fffffffffffffffa x8 : 0000000000000007 [ 0.392496] x7 : ffffff80085d63f8 x6 : 0000000000000003 [ 0.397997] x5 : 0000000000000054 x4 : ffffffc0031eb800 [ 0.403499] x3 : ffffff800801b728 x2 : 0000000000000003 [ 0.409000] x1 : 0000000000000054 x0 : 0000000000000000 [ 0.414502] Process swapper/0 (pid: 1, stack limit = 0x000000002a913c1c) [ 0.421441] Call trace: [ 0.423968] mtk_hw_pin_field_get+0x28/0x118 [ 0.428387] mtk_hw_set_value+0x30/0x104 [ 0.432445] mtk_gpio_set+0x20/0x28 [ 0.436052] mtk_gpio_direction_output+0x18/0x30 [ 0.440833] gpiod_direction_output_raw_commit+0x7c/0xa0 [ 0.446333] gpiod_direction_output+0x104/0x114 [ 0.451022] gpiod_configure_flags+0xbc/0xfc [ 0.455441] gpiod_hog+0x8c/0x140 [ 0.458869] of_gpiochip_add+0x27c/0x2d4 [ 0.462928] gpiochip_add_data_with_key+0x338/0x5f0 [ 0.467976] mtk_pinctrl_probe+0x388/0x400 [ 0.472217] platform_drv_probe+0x58/0xa4 [ 0.476365] driver_probe_device+0x204/0x44c [ 0.480783] __device_attach_driver+0xac/0x108 [ 0.485384] bus_for_each_drv+0x7c/0xac [ 0.489352] __device_attach+0xa0/0x144 [ 0.493320] device_initial_probe+0x10/0x18 [ 0.497647] bus_probe_device+0x2c/0x8c [ 0.501616] device_add+0x2f8/0x540 [ 0.505226] of_device_add+0x3c/0x44 [ 0.508925] of_platform_device_create_pdata+0x80/0xb8 [ 0.514245] of_platform_bus_create+0x290/0x3e8 [ 0.518933] of_platform_populate+0x78/0x100 [ 0.523352] of_platform_default_populate+0x24/0x2c [ 0.528403] of_platform_default_populate_init+0x94/0xa4 [ 0.533903] do_one_initcall+0x98/0x130 [ 0.537874] kernel_init_freeable+0x13c/0x1d4 [ 0.542385] kernel_init+0x10/0xf8 [ 0.545903] ret_from_fork+0x10/0x18 [ 0.549603] Code: 900020a1 f9400800 911dcc21 1400001f (f9406401) [ 0.555916] ---[ end trace de8c34787fdad3b3 ]--- [ 0.560722] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 0.560722] [ 0.570188] SMP: stopping secondary CPUs [ 0.574253] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 0.574253] Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: mt7622: stop using the deprecated pinctrl_add_gpio_range	Sean Wang
	If the pinctrl node has the gpio-ranges property, the range will be added by the gpio core and doesn't need to be added by the pinctrl driver. But for keeping backward compatibility, an explicit pinctrl_add_gpio_range is still needed to be called when there is a missing gpio-ranges in pinctrl node in old dts files. Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: mt7622: fix that pinctrl_claim_hogs cannot work	Sean Wang
	To allow claiming hogs by pinctrl, we cannot enable pinctrl until all groups and functions are being added done. Also, it's necessary that the corresponding gpiochip is being added when the pinctrl device is enabled. Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: mt7622: fix initialization sequence between eint and gpiochip	Sean Wang
	Because gpichip applied in the driver must depend on mtk eint to implement the input data debouncing and the translation between gpio and irq, it's better to keep logic consistent with mtk eint being built prior to gpiochip being added. Cc: stable@vger.kernel.org Fixes: e6dabd38d8e7 ("pinctrl: mediatek: add EINT support to MT7622 SoC") Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-14	pinctrl: mt7622: fix error path on failing at groups building	Sean Wang
	It should be to return an error code when failing at groups building. Cc: stable@vger.kernel.org Fixes: d6ed93551320 ("pinctrl: mediatek: add pinctrl driver for MT7622 SoC") Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-13	Merge branch 'fix-DCTCP-delayed-ACK'	David S. Miller
	Yuchung Cheng says: ==================== fix DCTCP delayed ACK This patch series addresses the issue that sometimes DCTCP fail to acknowledge the latest sequence and result in sender timeout if inflight is small. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	tcp: remove DELAYED ACK events in DCTCP	Yuchung Cheng
	After fixing the way DCTCP tracking delayed ACKs, the delayed-ACK related callbacks are no longer needed Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	tcp: fix dctcp delayed ACK schedule	Yuchung Cheng
	Previously, when a data segment was sent an ACK was piggybacked on the data segment without generating a CA_EVENT_NON_DELAYED_ACK event to notify congestion control modules. So the DCTCP ca->delayed_ack_reserved flag could incorrectly stay set when in fact there were no delayed ACKs being reserved. This could result in sending a special ECN notification ACK that carries an older ACK sequence, when in fact there was no need for such an ACK. DCTCP keeps track of the delayed ACK status with its own separate state ca->delayed_ack_reserved. Previously it may accidentally cancel the delayed ACK without updating this field upon sending a special ACK that carries a older ACK sequence. This inconsistency would lead to DCTCP receiver never acknowledging the latest data until the sender times out and retry in some cases. Packetdrill script (provided by Larry Brakmo) 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001 0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001 0.210 < [ce] P. 4001:4501(500) ack 3 win 257 +0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything +0.500 < F. 9501:9501(0) ack 4 win 257 Reported-by: Larry Brakmo <brakmo@fb.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	qlogic: check kstrtoul() for errors	Dan Carpenter
	We accidentally left out the error handling for kstrtoul(). Fixes: a520030e326a ("qlcnic: Implement flash sysfs callback for 83xx adapter") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	net: ethtool: fix spelling mistake: "tubale" -> "tunable"	Michael Heimpold
	Signed-off-by: Michael Heimpold <mhei@heimpold.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	Merge branch 'i2c/for-current' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - I2C core bugfix regarding bus recovery - driver bugfix for the tegra driver - typo correction * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: recovery: if possible send STOP with recovery pulses i2c: tegra: Fix NACK error handling i2c: stu300: use non-archaic spelling of failes
2018-07-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller
	Daniel Borkmann says: ==================== pull-request: bpf 2018-07-13 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Fix AF_XDP TX error reporting before final kernel release such that it becomes consistent between copy mode and zero-copy, from Magnus. 2) Fix three different syzkaller reported issues: oob due to ld_abs rewrite with too large offset, another oob in l3 based skb test run and a bug leaving mangled prog in subprog JITing error path, from Daniel. 3) Fix BTF handling for bitfield extraction on big endian, from Okash. 4) Fix a missing linux/errno.h include in cgroup/BPF found by kbuild bot, from Roman. 5) Fix xdp2skb_meta.sh sample by using just command names instead of absolute paths for tc and ip and allow them to be redefined, from Taeung. 6) Fix availability probing for BPF seg6 helpers before final kernel ships so they can be detected at prog load time, from Mathieu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	skbuff: Unconditionally copy pfmemalloc in __skb_clone()	Stefano Brivio
	Commit 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") introduced a different handling for the pfmemalloc flag in copy and clone paths. In __skb_clone(), now, the flag is set only if it was set in the original skb, but not cleared if it wasn't. This is wrong and might lead to socket buffers being flagged with pfmemalloc even if the skb data wasn't allocated from pfmemalloc reserves. Copy the flag instead of ORing it. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Fixes: 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13	Merge branch 'timers-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "A clocksource driver fix and a revert" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: arm_arch_timer: Set arch_mem_timer cpumask to cpu_possible_mask Revert "tick: Prefer a lower rating device only if it's CPU local device"
2018-07-13	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tool fixes from Ingo Molnar: "Misc tooling fixes: python3 related fixes, gcc8 fix, bashism fixes and some other smaller fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Use python-config --includes rather than --cflags perf script python: Fix dict reference counting perf stat: Fix --interval_clear option perf tools: Fix compilation errors on gcc8 perf test shell: Prevent temporary editor files from being considered test scripts perf llvm-utils: Remove bashism from kernel include fetch script perf test shell: Make perf's inet_pton test more portable perf test shell: Replace '\|&' with '2>&1 \|' to work with more shells perf scripts python: Add Python 3 support to EventClass.py perf scripts python: Add Python 3 support to sched-migration.py perf scripts python: Add Python 3 support to Util.py perf scripts python: Add Python 3 support to SchedGui.py perf scripts python: Add Python 3 support to Core.py perf tools: Generate a Python script compatible with Python 2 and 3
2018-07-13	Merge branch 'efi-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Ingo Molnar: "Fix a UEFI mixed mode (64-bit kernel on 32-bit UEFI) reboot loop regression" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Fix mixed mode reboot loop by removing pointless call to PciIo->Attributes()
2018-07-13	Merge branch 'core-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq fixes from Ingo Molnar: "Various rseq ABI fixes and cleanups: use get_user()/put_user(), validate parameters and use proper uapi types, etc" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq/selftests: cleanup: Update comment above rseq_prepare_unload rseq: Remove unused types_32_64.h uapi header rseq: uapi: Declare rseq_cs field as union, update includes rseq: uapi: Update uapi comments rseq: Use get_user/put_user rather than __get_user/__put_user rseq: Use __u64 for rseq_cs fields, validate user inputs
2018-07-13	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma	Linus Torvalds
	Pull rdma fixes from Jason Gunthorpe: "Things have been quite slow, only 6 RC patches have been sent to the list. Regression, user visible bugs, and crashing fixes: - cxgb4 could wrongly fail MR creation due to a typo - various crashes if the wrong QP type is mixed in with APIs that expect other types - syzkaller oops - using ERR_PTR and NULL together cases HFI1 to crash in some cases - mlx5 memory leak in error unwind" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path RDMA/uverbs: Don't fail in creation of multiple flows IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow RDMA/uverbs: Protect from attempts to create flows on unsupported QP iw_cxgb4: correctly enforce the max reg_mr depth
2018-07-13	Merge tag 'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio	Linus Torvalds
	Pull VFIO fix from Alex Williamson: "Fix deadlock in mbochs sample driver (Alexey Khoroshilov)" * tag 'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio: sample: vfio-mdev: avoid deadlock in mdev_access()
2018-07-13	Merge tag 'kbuild-fixes-v4.18-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - update Kbuild and Kconfig documents - sanitize -I compiler option handling - update extract-vmlinux script to recognize LZ4 and ZSTD - fix tools Makefiles - update tags.sh to handle __ro_after_init - suppress warnings in case getconf does not recognize LFS_* parameters * tag 'kbuild-fixes-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: suppress warnings from 'getconf LFS_*' scripts/tags.sh: add __ro_after_init tools: build: Use HOSTLDFLAGS with fixdep tools: build: Fixup host c flags tools build: fix # escaping in .cmd files for future Make scripts: teach extract-vmlinux about LZ4 and ZSTD kbuild: remove duplicated comments about PHONY kbuild: .PHONY is not a variable, but PHONY is kbuild: do not drop -I without parameter kbuild: document the KBUILD_KCONFIG env. variable kconfig: update user kconfig tools doc. kbuild: delete INSTALL_FW_PATH from kbuild documentation kbuild: update ARCH alias info for sparc kbuild: update ARCH alias info for sh
2018-07-13	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Catalin's out enjoying the sunshine, so I'm sending the fixes for a couple of weeks (although there hopefully won't be any more!). We've got a revert of a previous fix because it broke the build with some distro toolchains and a preemption fix when detemining whether or not the SIMD unit is in use. Summary: - Revert back to the 'linux' target for LD, as 'elf' breaks some distributions - Fix preemption race when testing whether the vector unit is in use or not" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: neon: Fix function may_use_simd() return error status Revert "arm64: Use aarch64elf and aarch64elfb emulation mode variants"
2018-07-13	Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm	Linus Torvalds
	Pull ARM fixes from Russell King: "A couple of small fixes this time around from Steven for an interaction between ftrace and kernel read-only protection, and Vladimir for nommu" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot ARM: 8775/1: NOMMU: Use instr_sync instead of plain isb in common code
2018-07-13	Merge tag 'trace-v4.18-rc3-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixlet from Steven Rostedt: "Joel Fernandes asked to add a feature in tracing that Android had its own patch internally for. I took it back in 4.13. Now he realizes that he had a mistake, and swapped the values from what Android had. This means that the old Android tools will break when using a new kernel that has the new feature on it. The options are: 1. To swap it back to what Android wants. 2. Add a command line option or something to do the swap 3. Just let Android carry a patch that swaps it back Since it requires setting a tracing option to enable this anyway, I doubt there are other users of this than Android. Thus, I've decided to take option 1. If someone else is actually depending on the order that is in the kernel, then we will have to revert this change and go to option 2 or 3" * tag 'trace-v4.18-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Reorder display of TGID to be after PID
2018-07-13	Merge tag 'sound-4.18-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Just a few HD-auio fixes: one fix for a possible mutex deadlock at HDMI hotplug handling is somewhat subtle and delicate, while the rest are usual device-specific quirks" * tag 'sound-4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/ca0132: Update a pci quirk device name ALSA: hda/ca0132: Add Recon3Di quirk for Gigabyte G1.Sniper Z97 ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION ALSA: hda - Handle pm failure during hotplug
2018-07-13	Merge tag 'libnvdimm-fixes-4.18-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dave Jiang: - ensure that a variable passed in by reference to acpi_nfit_ctl is always set to a value. An incremental patch is provided due to notice from testing in -next. The rest of the commits did not exhibit issues. - fix a return path in nsio_rw_bytes() that was not returning "bytes remain" as expected for the function. - address an issue where applications polling on scrub-completion for the NVDIMM may falsely wakeup and read the wrong state value and cause hang. - change the test unit persistent capability attribute to fix up a broken assumption in the unit test infrastructure wrt the 'write_cache' attribute - ratelimit dev_info() in the dax device check_vma() function since this is easily triggered from userspace * tag 'libnvdimm-fixes-4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix unchecked dereference in acpi_nfit_ctl acpi, nfit: Fix scrub idle detection tools/testing/nvdimm: advertise a write cache for nfit_test acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value dev-dax: check_vma: ratelimit dev_info-s libnvdimm, pmem: Fix memcpy_mcsafe() return code handling in nsio_rw_bytes()
2018-07-13	drm/amdgpu/pp/smu7: use a local variable for toc indexing	Alex Deucher
	Rather than using the index variable stored in vram. If the device fails to come back online after a resume cycle, reads from vram will return all 1s which will cause a segfault. Based on a patch from Thomas Martitz <kugel@rockbox.org>. This avoids the segfault, but we still need to sort out why the GPU does not come back online after a resume. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105760 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2018-07-13	btrfs: fix use-after-free of cmp workspace pages	Naohiro Aota
	btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves their page address intact. Now, if you hit "goto again" in btrfs_extent_same_range() and hit some error in btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages. This is simple fix to reset the address to avoid use-after-free. Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") Signed-off-by: Naohiro Aota <naota@elisp.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-13	Merge branch 'bpf-af-xdp-consistent-err-reporting'	Daniel Borkmann
	Magnus Karlsson says: ==================== This patch set adjusts the AF_XDP TX error reporting so that it becomes consistent between copy mode and zero-copy. First some background: Copy-mode for TX uses the SKB path in which the action of sending the packet is performed from process context using the sendmsg syscall. Completions are usually done asynchronously from NAPI mode by using a TX interrupt. In this mode, send errors can be returned back through the syscall. In zero-copy mode both the sending of the packet and the completions are done asynchronously from NAPI mode for performance reasons. In this mode, the sendmsg syscall only makes sure that the TX NAPI loop will be run that performs both the actions of sending and completing. In this mode it is therefore not possible to return errors through the sendmsg syscall as the sending is done from the NAPI loop. Note that it is possible to implement a synchronous send with our API, but in our benchmarks that made the TX performance drop by nearly half due to synchronization requirements and cache line bouncing. But for some netdevs this might be preferable so let us leave it up to the implementation to decide. The problem is that the current code base returns some errors in copy-mode that are not possible to return in zero-copy mode. This patch set aligns them so that the two modes always return the same error code. We achieve this by removing some of the errors returned by sendmsg in copy-mode (and in one case adding an error message for zero-copy mode) and offering alternative error detection methods that are consistent between the two modes. The structure of the patch set is as follows: Patch 1: removes the ENXIO return code from copy-mode when someone has forcefully changed the number of queues on the device so that the queue bound to the socket is no longer available. Just silently stop sending anything as in zero-copy mode. Patch 2: stop returning EAGAIN in copy mode when the completion queue is full as zero-copy does not do this. Instead this situation can be detected by comparing the head and tail pointers of the completion queue in both modes. In any case, EAGAIN was not the correct error code here since no amount of calling sendmsg will solve the problem. Only consuming one or more messages on the completion queue will fix this. Patch 3: Always return ENOBUFS from sendmsg if there is no TX queue configured. This was not the case for zero-copy mode. Patch 4: stop returning EMSGSIZE when the size of the packet is larger than the MTU. Just send it to the device so that it will drop it as in zero-copy mode. Note that copy-mode can still return EAGAIN in certain circumstances, but as these conditions cannot occur in zero-copy mode it is fine for copy-mode to return them. ==================== Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13	xsk: do not return EMSGSIZE in copy mode for packets larger than MTU	Magnus Karlsson
	This patch stops returning EMSGSIZE from sendmsg in copy mode when the size of the packet is larger than the MTU. Just send it to the device so that it will drop it as in zero-copy mode. This makes the error reporting consistent between copy mode and zero-copy mode. Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13	xsk: always return ENOBUFS from sendmsg if there is no TX queue	Magnus Karlsson
	This patch makes sure ENOBUFS is always returned from sendmsg if there is no TX queue configured. This was not the case for zero-copy mode. With this patch this error reporting is consistent between copy mode and zero-copy mode. Fixes: ac98d8aab61b ("xsk: wire upp Tx zero-copy functions") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13	xsk: do not return EAGAIN from sendmsg when completion queue is full	Magnus Karlsson
	This patch stops returning EAGAIN in TX copy mode when the completion queue is full as zero-copy does not do this. Instead this situation can be detected by comparing the head and tail pointers of the completion queue in both modes. In any case, EAGAIN was not the correct error code here since no amount of calling sendmsg will solve the problem. Only consuming one or more messages on the completion queue will fix this. With this patch, the error reporting becomes consistent between copy mode and zero-copy mode. Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13	xsk: do not return ENXIO from TX copy mode	Magnus Karlsson
	This patch removes the ENXIO return code from TX copy-mode when someone has forcefully changed the number of queues on the device so that the queue bound to the socket is no longer available. Just silently stop sending anything as in zero-copy mode so the error reporting gets consistent between the two modes. Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13	btrfs: restore uuid_mutex in btrfs_open_devices	David Sterba
	Commit 542c5908abfe84f7b4c1 ("btrfs: replace uuid_mutex by device_list_mutex in btrfs_open_devices") switched to device_list_mutex as we need that for the device list traversal, but we also need uuid_mutex to protect access to fs_devices::opened to be consistent with other users of that. Fixes: 542c5908abfe84f7b4c1 ("btrfs: replace uuid_mutex by device_list_mutex in btrfs_open_devices") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-13	crypto: af_alg - Initialize sg_num_bytes in error code path	Stephan Mueller
	The RX SGL in processing is already registered with the RX SGL tracking list to support proper cleanup. The cleanup code path uses the sg_num_bytes variable which must therefore be always initialized, even in the error code path. Signed-off-by: Stephan Mueller <smueller@chronox.de> Reported-by: syzbot+9c251bdd09f83b92ba95@syzkaller.appspotmail.com #syz test: https://github.com/google/kmsan.git master CC: <stable@vger.kernel.org> #4.14 Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-13	rtc: fix alarm read and set offset	Alexandre Belloni
	The offset needs to be added after reading the alarm value. It also needs to be subtracted after the now < alarm test. Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2018-07-13	xen: setup pv irq ops vector earlier	Juergen Gross
	Setting pv_irq_ops for Xen PV domains should be done as early as possible in order to support e.g. very early printk() usage. The same applies to xen_vcpu_info_reset(0), as it is needed for the pv irq ops. Move the call of xen_setup_machphys_mapping() after initializing the pv functions as it contains a WARN_ON(), too. Remove the no longer necessary conditional in xen_init_irq_ops() from PVH V1 times to make clear this is a PV only function. Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-07-12	selftests: in udpgso_bench do not test udp zerocopy	Willem de Bruijn
	The udpgso benchmark compares various configurations of UDP and TCP. Including one that is not upstream, udp zerocopy. This is a leftover from the earlier RFC patchset. The test is part of kselftests and run in continuous spinners. Remove the failing case to make the test start passing. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12	tracing: Reorder display of TGID to be after PID	Joel Fernandes (Google)
	Currently ftrace displays data in trace output like so: _-----=> irqs-off / _----=> need-resched \| / _---=> hardirq/softirq \|\| / _--=> preempt-depth \|\|\| / delay TASK-PID CPU TGID \|\|\|\| TIMESTAMP FUNCTION \| \| \| \| \|\|\|\| \| \| bash-1091 [000] ( 1091) d..2 28.313544: sched_switch: However Android's trace visualization tools expect a slightly different format due to an out-of-tree patch patch that was been carried for a decade, notice that the TGID and CPU fields are reversed: _-----=> irqs-off / _----=> need-resched \| / _---=> hardirq/softirq \|\| / _--=> preempt-depth \|\|\| / delay TASK-PID TGID CPU \|\|\|\| TIMESTAMP FUNCTION \| \| \| \| \|\|\|\| \| \| bash-1091 ( 1091) [002] d..2 64.965177: sched_switch: From kernel v4.13 onwards, during which TGID was introduced, tracing with systrace on all Android kernels will break (most Android kernels have been on 4.9 with Android patches, so this issues hasn't been seen yet). From v4.13 onwards things will break. The chrome browser's tracing tools also embed the systrace viewer which uses the legacy TGID format and updates to that are known to be difficult to make. Considering this, I suggest we make this change to the upstream kernel and backport it to all Android kernels. I believe this feature is merged recently enough into the upstream kernel that it shouldn't be a problem. Also logically, IMO it makes more sense to group the TGID with the TASK-PID and the CPU after these. Link: http://lkml.kernel.org/r/20180626000822.113931-1-joel@joelfernandes.org Cc: jreck@google.com Cc: tkjos@google.com Cc: stable@vger.kernel.org Fixes: 441dae8f2f29 ("tracing: Add support for display of tgid in trace output") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-12	packet: reset network header if packet shorter than ll reserved space	Willem de Bruijn
	If variable length link layer headers result in a packet shorter than dev->hard_header_len, reset the network header offset. Else skb->mac_len may exceed skb->len after skb_mac_reset_len. packet_sendmsg_spkt already has similar logic. Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12	nsh: set mac len based on inner packet	Willem de Bruijn
	When pulling the NSH header in nsh_gso_segment, set the mac length based on the encapsulated packet type. skb_reset_mac_len computes an offset to the network header, which here still points to the outer packet: > skb_reset_network_header(skb); > [...] > __skb_pull(skb, nsh_len); > skb_reset_mac_header(skb); // now mac hdr starts nsh_len == 8B after net hdr > skb_reset_mac_len(skb); // mac len = net hdr - mac hdr == (u16) -8 == 65528 > [..] > skb_mac_gso_segment(skb, ..) Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com Fixes: c411ed854584 ("nsh: add GSO support") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>