linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-10-02	r8169: always autoneg on resume	Alex Xu (Hello71)
	This affects at least versions 25 and 33, so assume all cards are broken and just renegotiate by default. Fixes: 10bc6a6042c9 ("r8169: fix autoneg issue on resume with RTL8168E") Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()	Eric Dumazet
	Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy, do not do it. Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	Merge tag 'mlx5-fixes-2018-10-01' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-10-01 This pull request includes some fixes to mlx5 driver, Please pull and let me know if there's any problem. For -stable v4.11: "6e0a4a23c59a ('net/mlx5: E-Switch, Fix out of bound access when setting vport rate')" For -stable v4.18: "98d6627c372a ('net/mlx5e: Set vlan masks for all offloaded TC rules')" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	Merge branch 'rmnet-fixes'	David S. Miller
	Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Updates 2018-10-02 This series is a set of small fixes for rmnet driver Patch 1 is a fix for a scenario reported by syzkaller Patch 2 & 3 are fixes for incorrect allocation flags ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	net: qualcomm: rmnet: Fix incorrect allocation flag in receive path	Subash Abhinov Kasiviswanathan
	The incoming skb needs to be reallocated in case the headroom is not sufficient to adjust the ethernet header. This allocation needs to be atomic otherwise it results in this splat [<600601bb>] ___might_sleep+0x185/0x1a3 [<603f6314>] ? _raw_spin_unlock_irqrestore+0x0/0x27 [<60069bb0>] ? __wake_up_common_lock+0x95/0xd1 [<600602b0>] __might_sleep+0xd7/0xe2 [<60065598>] ? enqueue_task_fair+0x112/0x209 [<600eea13>] __kmalloc_track_caller+0x5d/0x124 [<600ee9b6>] ? __kmalloc_track_caller+0x0/0x124 [<602696d5>] __kmalloc_reserve.isra.34+0x30/0x7e [<603f629b>] ? _raw_spin_lock_irqsave+0x0/0x3d [<6026b744>] pskb_expand_head+0xbf/0x310 [<6025ca6a>] rmnet_rx_handler+0x7e/0x16b [<6025c9ec>] ? rmnet_rx_handler+0x0/0x16b [<6027ad0c>] __netif_receive_skb_core+0x301/0x96f [<60033c17>] ? set_signals+0x0/0x40 [<6027bbcb>] __netif_receive_skb+0x24/0x8e Fixes: 74692caf1b0b ("net: qualcomm: rmnet: Process packets over ethernet") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	net: qualcomm: rmnet: Fix incorrect allocation flag in transmit	Subash Abhinov Kasiviswanathan
	The incoming skb needs to be reallocated in case the headroom is not sufficient to add the MAP header. This allocation needs to be atomic otherwise it results in the following splat [32805.801456] BUG: sleeping function called from invalid context [32805.841141] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [32805.904773] task: ffffffd7c5f62280 task.stack: ffffff80464a8000 [32805.910851] pc : ___might_sleep+0x180/0x188 [32805.915143] lr : ___might_sleep+0x180/0x188 [32806.131520] Call trace: [32806.134041] ___might_sleep+0x180/0x188 [32806.137980] __might_sleep+0x50/0x84 [32806.141653] __kmalloc_track_caller+0x80/0x3bc [32806.146215] __kmalloc_reserve+0x3c/0x88 [32806.150241] pskb_expand_head+0x74/0x288 [32806.154269] rmnet_egress_handler+0xb0/0x1d8 [32806.162239] rmnet_vnd_start_xmit+0xc8/0x13c [32806.166627] dev_hard_start_xmit+0x148/0x280 [32806.181181] sch_direct_xmit+0xa4/0x198 [32806.185125] __qdisc_run+0x1f8/0x310 [32806.188803] net_tx_action+0x23c/0x26c [32806.192655] __do_softirq+0x220/0x408 [32806.196420] do_softirq+0x4c/0x70 Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	net: qualcomm: rmnet: Skip processing loopback packets	Sean Tranchetti
	RMNET RX handler was processing invalid packets that were originally sent on the real device and were looped back via dev_loopback_xmit(). This was detected using syzkaller. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	net: systemport: Fix wake-up interrupt race during resume	Florian Fainelli
	The AON_PM_L2 is normally used to trigger and identify the source of a wake-up event. Since the RX_SYS clock is no longer turned off, we also have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and that interrupt remains active up until the magic packet detector is disabled which happens much later during the driver resumption. The race happens if we have a CPU that is entering the SYSTEMPORT INTRL2_0 handler during resume, and another CPU has managed to clear the wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we have the first CPU stuck in the interrupt handler with an interrupt cause that has been cleared under its feet, and so we keep returning IRQ_NONE and we never make any progress. This was not a problem before because we would always turn off the RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned off as well, thus not latching the interrupt. The fix is to make sure we do not enable either the MPD or BRCM_TAG_MATCH interrupts since those are redundant with what the AON_PM_L2 interrupt controller already processes and they would cause such a race to occur. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	smb3: fix lease break problem introduced by compounding	Steve French
	Fixes problem (discovered by Aurelien) introduced by recent commit: commit b24df3e30cbf48255db866720fb71f14bf9d2f39 ("cifs: update receive_encrypted_standard to handle compounded responses") which broke the ability to respond to some lease breaks (lease breaks being ignored is a problem since can block server response for duration of the lease break timeout). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02	cifs: only wake the thread for the very last PDU in a compound	Ronnie Sahlberg
	For compounded PDUs we whould only wake the waiting thread for the very last PDU of the compound. We do this so that we are guaranteed that the demultiplex_thread will not process or access any of those MIDs any more once the send/recv thread starts processing. Else there is a race where at the end of the send/recv processing we will try to delete all the mids of the compound. If the multiplex thread still has other mids to process at this point for this compound this can lead to an oops. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02	Merge tag 'wireless-drivers-for-davem-2018-10-01' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.19 First, and also hopefully the last, set of fixes for 4.19. All small but still important fixes mt76x0 * fix a bug when a virtual interface is removed multiple times b43 * fix DMA error related regression with proprietary firmware iwlwifi * fix an oops which was a regression in v4.19-rc1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	cifs: add a warning if we try to to dequeue a deleted mid	Ronnie Sahlberg
	cifs_delete_mid() is called once we are finished handling a mid and we expect no more work done on this mid. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Add a warning if someone tries to dequeue a mid that has already been flagged to be deleted. Also change list_del() to list_del_init() so that if we have similar bugs resurface in the future we will not oops. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02	smb2: fix missing files in root share directory listing	Aurelien Aptel
	When mounting a Windows share that is the root of a drive (eg. C$) the server does not return . and .. directory entries. This results in the smb2 code path erroneously skipping the 2 first entries. Pseudo-code of the readdir() code path: cifs_readdir(struct file, struct dir_context) initiate_cifs_search <-- if no reponse cached yet server->ops->query_dir_first dir_emit_dots dir_emit <-- adds "." and ".." if we're at pos=0 find_cifs_entry initiate_cifs_search <-- if pos < start of current response (restart search) server->ops->query_dir_next <-- if pos > end of current response (fetch next search res) for(...) <-- loops over cur response entries starting at pos cifs_filldir <-- skip . and .., emit entry cifs_fill_dirent dir_emit pos++ A) dir_emit_dots() always adds . & .. and sets the current dir pos to 2 (0 and 1 are done). Therefore we always want the index_to_find to be 2 regardless of if the response has . and .. B) smb1 code initializes index_of_last_entry with a +2 offset in cifssmb.c CIFSFindFirst(): psrch_inf->index_of_last_entry = 2 /* skip . and .. / + psrch_inf->entries_in_buffer; Later in find_cifs_entry() we want to find the next dir entry at pos=2 as a result of (A) first_entry_in_buffer = cfile->srch_inf.index_of_last_entry - cfile->srch_inf.entries_in_buffer; This var is the dir pos that the first entry in the buffer will have therefore it must be 2 in the first call. If we don't offset index_of_last_entry by 2 (like in (B)), first_entry_in_buffer=0 but we were instructed to get pos=2 so this code in find_cifs_entry() skips the 2 first which is ok for non-root shares, as it skips . and .. from the response but is not ok for root shares where the 2 first are actual files pos_in_buf = index_to_find - first_entry_in_buffer; // pos_in_buf=2 // we skip 2 first response entries :( for (i = 0; (i < (pos_in_buf)) && (cur_ent != NULL); i++) { / go entry by entry figuring out which is first / cur_ent = nxt_dir_entry(cur_ent, end_of_smb, cfile->srch_inf.info_level); } C) cifs_filldir() skips . and .. so we can safely ignore them for now. Sample program: int main(int argc, char argv) { const char path = argc >= 2 ? argv[1] : "."; DIR dh; struct dirent de; printf("listing path <%s>\n", path); dh = opendir(path); if (!dh) { printf("opendir error %d\n", errno); return 1; } while (1) { de = readdir(dh); if (!de) { if (errno) { printf("readdir error %d\n", errno); return 1; } printf("end of listing\n"); break; } printf("off=%lu <%s>\n", de->d_off, de->d_name); } return 0; } Before the fix with SMB1 on root shares: <.> off=1 <..> off=2 <$Recycle.Bin> off=3 <bootmgr> off=4 and on non-root shares: <.> off=1 <..> off=4 <-- after adding .., the offsets jumps to +2 because <2536> off=5 we skipped . and .. from response buffer (C) <411> off=6 but still incremented pos <file> off=7 <fsx> off=8 Therefore the fix for smb2 is to mimic smb1 behaviour and offset the index_of_last_entry by 2. Test results comparing smb1 and smb2 before/after the fix on root share, non-root shares and on large directories (ie. multi-response dir listing): PRE FIX ======= pre-1-root VS pre-2-root: ERR pre-2-root is missing [bootmgr, $Recycle.Bin] pre-1-nonroot VS pre-2-nonroot: OK~ same files, same order, different offsets pre-1-nonroot-large VS pre-2-nonroot-large: OK~ same files, same order, different offsets POST FIX ======== post-1-root VS post-2-root: OK same files, same order, same offsets post-1-nonroot VS post-2-nonroot: OK same files, same order, same offsets post-1-nonroot-large VS post-2-nonroot-large: OK same files, same order, same offsets REGRESSION? =========== pre-1-root VS post-1-root: OK same files, same order, same offsets pre-1-nonroot VS post-1-nonroot: OK same files, same order, same offsets BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Paulo Alcantara <palcantara@suse.deR> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2018-10-02	rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096	Eric Dumazet
	We have an impressive number of syzkaller bugs that are linked to the fact that syzbot was able to create a networking device with millions of TX (or RX) queues. Let's limit the number of RX/TX queues to 4096, this really should cover all known cases. A separate patch will add various cond_resched() in the loops handling sysfs entries at device creation and dismantle. Tested: lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap RTNETLINK answers: Invalid argument lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap real 0m0.180s user 0m0.000s sys 0m0.107s Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	bonding: fix warning message	Mahesh Bandewar
	RX queue config for bonding master could be different from its slave device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also."), the packet is reinjected into stack with skb->dev as bonding master. This potentially triggers the message: "bondX received packet on queue Y, but number of RX queues is Z" whenever the queue that packet is received on is higher than the numrxqueues on bonding master (Y > Z). Fixes: 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also.") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	inet: make sure to grab rcu_read_lock before using ireq->ireq_opt	Eric Dumazet
	Timer handlers do not imply rcu_read_lock(), so my recent fix triggered a LOCKDEP warning when SYNACK is retransmit. Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt usages instead of guessing what is done by callers, since it is not worth the pain. Get rid of ireq_opt_deref() helper since it hides the logic without real benefit, since it is now a standard rcu_dereference(). Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	Revert "serial: sh-sci: Allow for compressed SCIF address"	Geert Uytterhoeven
	This reverts commit 2d4dd0da45401c7ae7332b4d1eb7bbb1348edde9. This broke earlycon on all Renesas ARM platforms using a SCIF port for the serial console (R-Car, RZ/A1, RZ/G1, RZ/G2 SoCs), due to an incorrect value of port->regshift. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Chris Brandt <chris.brandt@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02	Revert "serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE"	Geert Uytterhoeven
	This reverts commit 7acece71a517cad83a0842a94d94c13f271b680c. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Chris Brandt <chris.brandt@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02	Revert "serial: 8250_dw: Fix runtime PM handling"	Guenter Roeck
	This reverts commit d76c74387e1c978b6c5524a146ab0f3f72206f98. While commit d76c74387e1c ("serial: 8250_dw: Fix runtime PM handling") fixes runtime PM handling when using kgdb, it introduces a traceback for everyone else. BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/next/drivers/base/power/runtime.c:1034 in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0 7 locks held by swapper/0/1: #0: 000000005ec5bc72 (&dev->mutex){....}, at: __driver_attach+0xb5/0x12b #1: 000000005d5fa9e5 (&dev->mutex){....}, at: __device_attach+0x3e/0x15b #2: 0000000047e93286 (serial_mutex){+.+.}, at: serial8250_register_8250_port+0x51/0x8bb #3: 000000003b328f07 (port_mutex){+.+.}, at: uart_add_one_port+0xab/0x8b0 #4: 00000000fa313d4d (&port->mutex){+.+.}, at: uart_add_one_port+0xcc/0x8b0 #5: 00000000090983ca (console_lock){+.+.}, at: vprintk_emit+0xdb/0x217 #6: 00000000c743e583 (console_owner){-...}, at: console_unlock+0x211/0x60f irq event stamp: 735222 __down_trylock_console_sem+0x4a/0x84 console_unlock+0x338/0x60f __do_softirq+0x4a4/0x50d irq_exit+0x64/0xe2 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc5 #6 Hardware name: Google Caroline/Caroline, BIOS Google_Caroline.7820.286.0 03/15/2017 Call Trace: dump_stack+0x7d/0xbd ___might_sleep+0x238/0x259 __pm_runtime_resume+0x4e/0xa4 ? serial8250_rpm_get+0x2e/0x44 serial8250_console_write+0x44/0x301 ? lock_acquire+0x1b8/0x1fa console_unlock+0x577/0x60f vprintk_emit+0x1f0/0x217 printk+0x52/0x6e register_console+0x43b/0x524 uart_add_one_port+0x672/0x8b0 ? set_io_from_upio+0x150/0x162 serial8250_register_8250_port+0x825/0x8bb dw8250_probe+0x80c/0x8b0 ? dw8250_serial_inq+0x8e/0x8e ? dw8250_check_lcr+0x108/0x108 ? dw8250_runtime_resume+0x5b/0x5b ? dw8250_serial_outq+0xa1/0xa1 ? dw8250_remove+0x115/0x115 platform_drv_probe+0x76/0xc5 really_probe+0x1f1/0x3ee ? driver_allows_async_probing+0x5d/0x5d driver_probe_device+0xd6/0x112 ? driver_allows_async_probing+0x5d/0x5d bus_for_each_drv+0xbe/0xe5 __device_attach+0xdd/0x15b bus_probe_device+0x5a/0x10b device_add+0x501/0x894 ? _raw_write_unlock+0x27/0x3a platform_device_add+0x224/0x2b7 mfd_add_device+0x718/0x75b ? __kmalloc+0x144/0x16a ? mfd_add_devices+0x38/0xdb mfd_add_devices+0x9b/0xdb intel_lpss_probe+0x7d4/0x8ee intel_lpss_pci_probe+0xac/0xd4 pci_device_probe+0x101/0x18e ... Revert the offending patch until a more comprehensive solution is available. Cc: Tony Lindgren <tony@atomide.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Fixes: d76c74387e1c ("serial: 8250_dw: Fix runtime PM handling") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02	RISCV: Fix end PFN for low memory	Atish Patra
	Use memblock_end_of_DRAM which provides correct last low memory PFN. Without that, DMA32 region becomes empty resulting in zero pages being allocated for DMA32. This patch is based on earlier patch from palmer which never merged into 4.19. I just edited the commit text to make more sense. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-10-02	x86/tsc: Fix UV TSC initialization	Mike Travis
	The recent rework of the TSC calibration code introduced a regression on UV systems as it added a call to tsc_early_init() which initializes the TSC ADJUST values before acpi_boot_table_init(). In the case of UV systems, that is a necessary step that calls uv_system_init(). This informs tsc_sanitize_first_cpu() that the kernel runs on a platform with async TSC resets as documented in commit 341102c3ef29 ("x86/tsc: Add option that TSC on Socket 0 being non-zero is valid") Fix it by skipping the early tsc initialization on UV systems and let TSC init tests take place later in tsc_init(). Fixes: cf7a63ef4e02 ("x86/tsc: Calibrate tsc only once") Suggested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com
2018-10-02	x86/platform/uv: Provide is_early_uv_system()	Mike Travis
	Introduce is_early_uv_system() which uses efi.uv_systab to decide early in the boot process whether the kernel runs on a UV system. This is needed to skip other early setup/init code that might break the UV platform if done too early such as before necessary ACPI tables parsing takes place. Suggested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Link: https://lkml.kernel.org/r/20181002180144.801700401@stormcage.americas.sgi.com
2018-10-02	nfp: avoid soft lockups under control message storm	Jakub Kicinski
	When FW floods the driver with control messages try to exit the cmsg processing loop every now and then to avoid soft lockups. Cmsg processing is generally very lightweight so 512 seems like a reasonable budget, which should not be exceeded under normal conditions. Fixes: 77ece8d5f196 ("nfp: add control vNIC datapath") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	declance: Fix continuation with the adapter identification message	Maciej W. Rozycki
	Fix a commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") regression with the `declance' driver, which caused the adapter identification message to be split between two lines, e.g.: declance.c: v0.011 by Linux MIPS DECstation task force tc6: PMAD-AA , addr = 08:00:2b:1b:2a:6a, irq = 14 tc6: registered as eth0. Address that properly, by printing identification with a single call, making the messages now look like: declance.c: v0.011 by Linux MIPS DECstation task force tc6: PMAD-AA, addr = 08:00:2b:1b:2a:6a, irq = 14 tc6: registered as eth0. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	net: fec: fix rare tx timeout	Rickard x Andersson
	During certain heavy network loads TX could time out with TX ring dump. TX is sometimes never restarted after reaching "tx_stop_threshold" because function "fec_enet_tx_queue" only tests the first queue. In addition the TX timeout callback function failed to recover because it also operated only on the first queue. Signed-off-by: Rickard x Andersson <rickaran@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02	thunderbolt: Initialize after IOMMUs	Mika Westerberg
	If IOMMU is enabled and Thunderbolt driver is built into the kernel image, it will be probed before IOMMUs are attached to the PCI bus. Because of this DMA mappings the driver does will not go through IOMMU and start failing right after IOMMUs are enabled. For this reason move the Thunderbolt driver initialization happen at rootfs level. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02	thunderbolt: Do not handle ICM events after domain is stopped	Mika Westerberg
	If there is a long chain of devices connected when the driver is loaded ICM sends device connected event for each and those are put to tb->wq for later processing. Now if the driver gets unloaded in the middle, so that the work queue is not yet empty it gets flushed by tb_domain_stop(). However, by that time the root switch is already removed so the driver crashes when it tries to dereference it in ICM event handling callbacks. Fix this by checking whether the root switch is already removed. If it is we know that the domain is stopped and we should merely skip handling the event. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02	regulator: bd718xx: fix build warning on x86_64	Matti Vaittinen
	Casting address to unsigned int causes a warning on some 64 bit architectures. Fix the cast. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-02	pktcdvd: fix fall-through annotation	Gustavo A. R. Silva
	Replace "fallthru" with a proper "fall through" annotation. This fix is part of the ongoing efforts to enabling -Wimplicit-fallthrough Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-02	dma-mapping: move dma_default_get_required_mask under ifdef	Christoph Hellwig
	This avoids a warning on powerpc. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
2018-10-02	powerpc/lib: fix book3s/32 boot failure due to code patching	Christophe Leroy
	Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") accesses 'init_mem_is_free' flag too early, before the kernel is relocated. This provokes early boot failure (before the console is active). As it is not necessary to do this verification that early, this patch moves the test into patch_instruction() instead of __patch_instruction(). This modification also has the advantage of avoiding unnecessary remappings. Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") Cc: stable@vger.kernel.org # 4.13+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-02	regulator: fixed: Default enable high on DT regulators	Linus Walleij
	commit efdfeb079cc3 ("regulator: fixed: Convert to use GPIO descriptor only") switched to use gpiod_get() to look up the regulator from the gpiolib core whether that is device tree or boardfile. This meant that we activate the code in a603a2b8d86e ("gpio: of: Add special quirk to parse regulator flags") which means the descriptors coming from the device tree already have the right inversion and open drain semantics set up from the gpiolib core. As the fixed regulator was inspected again we got the inverted inversion and things broke. Fix it by ignoring the config in the device tree for now: the later patches in the series will push all inversion handling over to the gpiolib core and set it up properly in the boardfiles for legacy devices, but I did not finish that for this kernel cycle. Fixes: commit efdfeb079cc3 ("regulator: fixed: Convert to use GPIO descriptor only") Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Reported-by: Fabio Estevam <festevam@gmail.com> Reported-by: John Stultz <john.stultz@linaro.org> Reported-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-02	bpf: don't accept cgroup local storage with zero value size	Roman Gushchin
	Explicitly forbid creating cgroup local storage maps with zero value size, as it makes no sense and might even cause a panic. Reported-by: syzbot+18628320d3b14a5c459c@syzkaller.appspotmail.com Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-02	Merge tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux	Greg Kroah-Hartman
	Bartlomiej writes: "fbdev fixes for v4.19-rc7: - fix OMAPFB_MEMORY_READ ioctl to not leak kernel memory in omapfb driver (Tomi Valkeinen) - add missing prepare/unprepare clock operations in pxa168fb driver (Lubomir Rintel) - add nobgrt option in efifb driver to disable ACPI BGRT logo restore (Hans de Goede) - fix spelling mistake in fall-through annotation in stifb driver (Gustavo A. R. Silva) - fix URL for uvesafb repository in the documentation (Adam Jackson)" * tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux: video/fbdev/stifb: Fix spelling mistake in fall-through annotation uvesafb: Fix URLs in the documentation efifb: BGRT: Add nobgrt option fbdev/omapfb: fix omapfb_memory_read infoleak pxa168fb: prepare the clock
2018-10-02	Merge tag 'mmc-v4.19-rc4' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Ulf writes: "MMC core: - Fixup conversion of debounce time to/from ms/us MMC host: - sdhi: Fixup whitelisting for Gen3 types" * tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: slot-gpio: Fix debounce time to use miliseconds again mmc: core: Fix debounce time to use microseconds mmc: sdhi: sys_dmac: check for all Gen3 types when whitelisting
2018-10-02	drm/cma-helper: Fix crash in fbdev error path	Noralf Trønnes
	Sergey Suloev reported a crash happening in drm_client_dev_hotplug() when fbdev had failed to register. [ 9.124598] vc4_hdmi 3f902000.hdmi: ASoC: Failed to create component debugfs directory [ 9.147667] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok [ 9.155184] vc4_hdmi 3f902000.hdmi: ASoC: no DMI vendor name! [ 9.166544] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4]) [ 9.173840] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4]) [ 9.181029] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4]) [ 9.188519] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4]) [ 9.195690] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 9.203523] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 9.215032] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 9.274785] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4]) [ 9.290246] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0 [ 9.297464] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 9.304600] [drm] Driver supports precise vblank timestamp query. [ 9.382856] vc4-drm soc:gpu: [drm:drm_fb_helper_fbdev_setup [drm_kms_helper]] ERROR Failed to set fbdev configuration [ 10.404937] Unable to handle kernel paging request at virtual address 00330a656369768a [ 10.441620] [00330a656369768a] address between user and kernel address ranges [ 10.449087] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 10.454762] Modules linked in: brcmfmac vc4 drm_kms_helper cfg80211 drm rfkill smsc95xx brcmutil usbnet drm_panel_orientation_quirks raspberrypi_hwmon bcm2835_dma crc32_ce pwm_bcm2835 bcm2835_rng virt_dma rng_core i2c_bcm2835 ip_tables x_tables ipv6 [ 10.477296] CPU: 2 PID: 45 Comm: kworker/2:1 Not tainted 4.19.0-rc5 #3 [ 10.483934] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT) [ 10.489966] Workqueue: events output_poll_execute [drm_kms_helper] [ 10.596515] Process kworker/2:1 (pid: 45, stack limit = 0x000000007e8924dc) [ 10.603590] Call trace: [ 10.606259] drm_client_dev_hotplug+0x5c/0xb0 [drm] [ 10.611303] drm_kms_helper_hotplug_event+0x30/0x40 [drm_kms_helper] [ 10.617849] output_poll_execute+0xc4/0x1e0 [drm_kms_helper] [ 10.623616] process_one_work+0x1c8/0x318 [ 10.627695] worker_thread+0x48/0x428 [ 10.631420] kthread+0xf8/0x128 [ 10.634615] ret_from_fork+0x10/0x18 [ 10.638255] Code: 54000220 f9401261 aa1303e0 b4000141 (f9400c21) [ 10.644456] ---[ end trace c75b4a4b0e141908 ]--- The reason for this is that drm_fbdev_cma_init() removes the drm_client when fbdev registration fails, but it doesn't remove the client from the drm_device client list. So the client list now has a pointer that points into the unknown and we have a 'use after free' situation. Split drm_client_new() into drm_client_init() and drm_client_add() to fix removal in the error path. Fixes: 894a677f4b3e ("drm/cma-helper: Use the generic fbdev emulation") Reported-by: Sergey Suloev <ssuloev@orpaltech.com> Cc: Stefan Wahren <stefan.wahren@i2se.com> Cc: Eric Anholt <eric@anholt.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181001194536.57756-1-noralf@tronnes.org
2018-10-02	pinctrl: sh-pfc: r8a77990: Add INTC-EX pins, groups and function	Geert Uytterhoeven
	Add pins, groups, and function for the Interrupt Controller for External Devices (INTC-EX) on the R-Car E3 SoC. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
2018-10-02	pinctrl: renesas: Renesas RZ/N1 pinctrl driver	Phil Edworthy
	This provides a pinctrl driver for the Renesas RZ/N1 device family. Based on a patch originally written by Michel Pollet at Renesas. Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com> Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2018-10-02	dt-bindings: pinctrl: renesas,rzn1-pinctrl: documentation	Phil Edworthy
	The Renesas RZ/N1 device family PINCTRL node description. Based on a patch originally written by Michel Pollet at Renesas. Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com> Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2018-10-02	sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task	Mel Gorman
	Automatic NUMA Balancing uses a multi-stage pass to decide whether a page should migrate to a local node. This filter avoids excessive ping-ponging if a page is shared or used by threads that migrate cross-node frequently. Threads inherit both page tables and the preferred node ID from the parent. This means that threads can trigger hinting faults earlier than a new task which delays scanning for a number of seconds. As it can be load balanced very early in its lifetime there can be an unnecessary delay before it starts migrating thread-local data. This patch migrates private pages faster early in the lifetime of a thread using the sequence counter as an identifier of new tasks. With this patch applied, STREAM performance is the same as 4.17 even though processes are not spread cross-node prematurely. Other workloads showed a mix of minor gains and losses. This is somewhat expected most workloads are not very sensitive to the starting conditions of a process. 4.19.0-rc5 4.19.0-rc5 4.17.0 numab-v1r1 fastmigrate-v1r1 vanilla MB/sec copy 43298.52 ( 0.00%) 47335.46 ( 9.32%) 47219.24 ( 9.06%) MB/sec scale 30115.06 ( 0.00%) 32568.12 ( 8.15%) 32527.56 ( 8.01%) MB/sec add 32825.12 ( 0.00%) 36078.94 ( 9.91%) 35928.02 ( 9.45%) MB/sec triad 32549.52 ( 0.00%) 35935.94 ( 10.40%) 35969.88 ( 10.51%) Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jirka Hladky <jhladky@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181001100525.29789-3-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration	Mel Gorman
	Rate limiting of page migrations due to automatic NUMA balancing was introduced to mitigate the worst-case scenario of migrating at high frequency due to false sharing or slowly ping-ponging between nodes. Since then, a lot of effort was spent on correctly identifying these pages and avoiding unnecessary migrations and the safety net may no longer be required. Jirka Hladky reported a regression in 4.17 due to a scheduler patch that avoids spreading STREAM tasks wide prematurely. However, once the task was properly placed, it delayed migrating the memory due to rate limiting. Increasing the limit fixed the problem for him. Currently, the limit is hard-coded and does not account for the real capabilities of the hardware. Even if an estimate was attempted, it would not properly account for the number of memory controllers and it could not account for the amount of bandwidth used for normal accesses. Rather than fudging, this patch simply eliminates the rate limiting. However, Jirka reports that a STREAM configuration using multiple processes achieved similar performance to 4.16. In local tests, this patch improved performance of STREAM relative to the baseline but it is somewhat machine-dependent. Most workloads show little or not performance difference implying that there is not a heavily reliance on the throttling mechanism and it is safe to remove. STREAM on 2-socket machine 4.19.0-rc5 4.19.0-rc5 numab-v1r1 noratelimit-v1r1 MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%) MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%) MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%) MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24% Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jirka Hladky <jhladky@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	Documentation/lockstat: Fix trivial typo	Andrew Murray
	Fix incorrect line number in example output Signed-off-by: Andrew Murray <andrew.murray@arm.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1538391663-54524-1-git-send-email-andrew.murray@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	pinctrl: msm: Actually use function 0 for gpio selection	Stephen Boyd
	This code needs to select function #0, which is the first int in the array of functions, not the number 0 which may or may not be the function for "GPIO mode" per the enum mapping. We were getting lucky on SDM845, where this was tested, because the function 0 matched the enum value for "GPIO mode". On other platforms, e.g. MSM8996, the gpio enum value is the last one in the list so this code doesn't work and we see a warning at boot. Fix it by grabbing the first element out of the array of functions. Cc: Doug Anderson <dianders@chromium.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Niklas Cassel <niklas.cassel@linaro.org> Reported-by: Niklas Cassel <niklas.cassel@linaro.org> Fixes: 1de7ddb3a15c ("pinctrl: msm: Mux out gpio function with gpio_request()") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-10-02	MAINTAINERS: Remove dead path from LOCKING PRIMITIVES entry	Will Deacon
	Since 890658b7ab48 ("locking/mutex: Kill arch specific code"), there are no mutex header files under arch/, so we can remove the redundant entry from MAINTAINERS. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Jason Low <jason.low2@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181001142856.GC9716@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	locking/memory-barriers: Replace smp_cond_acquire() with smp_cond_load_acquire()	Andrea Parri
	Amend the changes in commit: 1f03e8d2919270 ("locking/barriers: Replace smp_cond_acquire() with smp_cond_load_acquire()") ... by updating the documentation accordingly. Also remove some obsolete information related to the implementation. Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: Akira Yokosawa <akiyks@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Daniel Lustig <dlustig@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luc Maranget <luc.maranget@inria.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-arch@vger.kernel.org Cc: parri.andrea@gmail.com Link: http://lkml.kernel.org/r/20180926182920.27644-5-paulmck@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	tools/memory-model: Add more LKMM limitations	Paul E. McKenney
	This commit adds more detail about compiler optimizations and not-yet-modeled Linux-kernel APIs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: akiyks@gmail.com Cc: boqun.feng@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Cc: stern@rowland.harvard.edu Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/20180926182920.27644-4-paulmck@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	tools/memory-model: Fix a README typo	SeongJae Park
	This commit fixes a duplicate-"the" typo in README. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: akiyks@gmail.com Cc: boqun.feng@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/20180926182920.27644-3-paulmck@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	tools/memory-model: Add extra ordering for locks and remove it for ordinary ↵	Alan Stern
	release/acquire More than one kernel developer has expressed the opinion that the LKMM should enforce ordering of writes by locking. In other words, given the following code: WRITE_ONCE(x, 1); spin_unlock(&s): spin_lock(&s); WRITE_ONCE(y, 1); the stores to x and y should be propagated in order to all other CPUs, even though those other CPUs might not access the lock s. In terms of the memory model, this means expanding the cumul-fence relation. Locks should also provide read-read (and read-write) ordering in a similar way. Given: READ_ONCE(x); spin_unlock(&s); spin_lock(&s); READ_ONCE(y); // or WRITE_ONCE(y, 1); the load of x should be executed before the load of (or store to) y. The LKMM already provides this ordering, but it provides it even in the case where the two accesses are separated by a release/acquire pair of fences rather than unlock/lock. This would prevent architectures from using weakly ordered implementations of release and acquire, which seems like an unnecessary restriction. The patch therefore removes the ordering requirement from the LKMM for that case. There are several arguments both for and against this change. Let us refer to these enhanced ordering properties by saying that the LKMM would require locks to be RCtso (a bit of a misnomer, but analogous to RCpc and RCsc) and it would require ordinary acquire/release only to be RCpc. (Note: In the following, the phrase "all supported architectures" is meant not to include RISC-V. Although RISC-V is indeed supported by the kernel, the implementation is still somewhat in a state of flux and therefore statements about it would be premature.) Pros: The kernel already provides RCtso ordering for locks on all supported architectures, even though this is not stated explicitly anywhere. Therefore the LKMM should formalize it. In theory, guaranteeing RCtso ordering would reduce the need for additional barrier-like constructs meant to increase the ordering strength of locks. Will Deacon and Peter Zijlstra are strongly in favor of formalizing the RCtso requirement. Linus Torvalds and Will would like to go even further, requiring locks to have RCsc behavior (ordering preceding writes against later reads), but they recognize that this would incur a noticeable performance degradation on the POWER architecture. Linus also points out that people have made the mistake, in the past, of assuming that locking has stronger ordering properties than is currently guaranteed, and this change would reduce the likelihood of such mistakes. Not requiring ordinary acquire/release to be any stronger than RCpc may prove advantageous for future architectures, allowing them to implement smp_load_acquire() and smp_store_release() with more efficient machine instructions than would be possible if the operations had to be RCtso. Will and Linus approve this rationale, hypothetical though it is at the moment (it may end up affecting the RISC-V implementation). The same argument may or may not apply to RMW-acquire/release; see also the second Con entry below. Linus feels that locks should be easy for people to use without worrying about memory consistency issues, since they are so pervasive in the kernel, whereas acquire/release is much more of an "experts only" tool. Requiring locks to be RCtso is a step in this direction. Cons: Andrea Parri and Luc Maranget think that locks should have the same ordering properties as ordinary acquire/release (indeed, Luc points out that the names "acquire" and "release" derive from the usage of locks). Andrea points out that having different ordering properties for different forms of acquires and releases is not only unnecessary, it would also be confusing and unmaintainable. Locks are constructed from lower-level primitives, typically RMW-acquire (for locking) and ordinary release (for unlock). It is illogical to require stronger ordering properties from the high-level operations than from the low-level operations they comprise. Thus, this change would make while (cmpxchg_acquire(&s, 0, 1) != 0) cpu_relax(); an incorrect implementation of spin_lock(&s) as far as the LKMM is concerned. In theory this weakness can be ameliorated by changing the LKMM even further, requiring RMW-acquire/release also to be RCtso (which it already is on all supported architectures). As far as I know, nobody has singled out any examples of code in the kernel that actually relies on locks being RCtso. (People mumble about RCU and the scheduler, but nobody has pointed to any actual code. If there are any real cases, their number is likely quite small.) If RCtso ordering is not needed, why require it? A handful of locking constructs (qspinlocks, qrwlocks, and mcs_spinlocks) are built on top of smp_cond_load_acquire() instead of an RMW-acquire instruction. It currently provides only the ordinary acquire semantics, not the stronger ordering this patch would require of locks. In theory this could be ameliorated by requiring smp_cond_load_acquire() in combination with ordinary release also to be RCtso (which is currently true on all supported architectures). On future weakly ordered architectures, people may be able to implement locks in a non-RCtso fashion with significant performance improvement. Meeting the RCtso requirement would necessarily add run-time overhead. Overall, the technical aspects of these arguments seem relatively minor, and it appears mostly to boil down to a matter of opinion. Since the opinions of senior kernel maintainers such as Linus, Peter, and Will carry more weight than those of Luc and Andrea, this patch changes the model in accordance with the maintainers' wishes. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: akiyks@gmail.com Cc: boqun.feng@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Link: http://lkml.kernel.org/r/20180926182920.27644-2-paulmck@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	tools/memory-model: Add litmus-test naming scheme	Paul E. McKenney
	This commit documents the scheme used to generate the names for the litmus tests. [ paulmck: Apply feedback from Andrea Parri and Will Deacon. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: akiyks@gmail.com Cc: boqun.feng@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Cc: stern@rowland.harvard.edu Link: http://lkml.kernel.org/r/20180926182920.27644-1-paulmck@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02	drm: fix use-after-free read in drm_mode_create_lease_ioctl()	Jann Horn
	fd_install() moves the reference given to it into the file descriptor table of the current process. If the current process is multithreaded, then immediately after fd_install(), another thread can close() the file descriptor and cause the file's resources to be cleaned up. Since the reference to "lessee" is held by the file, we must not access "lessee" after the fd_install() call. As far as I can tell, to reach this codepath, the caller must have an open file descriptor to a DRI device in master mode. I'm not sure what the requirements for that are. Signed-off-by: Jann Horn <jannh@google.com> Fixes: 62884cd386b8 ("drm: Add four ioctls for managing drm mode object leases [v7]") Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181001153117.216923-1-jannh@google.com