summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-08-09x86/relocs: Add __end_rodata_aligned to S_RELJoerg Roedel
This new symbol needs to be in the workaround-list for buggy binutils, otherwise the build with gcc-4.6 fails. Fixes: 39d668e04eda ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit') Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linux-Next Mailing List <linux-next@vger.kernel.org> Link: https://lkml.kernel.org/r/20180809094449.ddmnrkz7qkvo3j2x@suse.de
2018-08-09Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a performance regression in arm64 NEON crypto as well as a crash in x86 aegis/morus on unsupported CPUs" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aegis,morus - Fix and simplify CPUID checks crypto: arm64 - revert NEON yield for fast AEAD implementations
2018-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) The real fix for the ipv6 route metric leak Sabrina was seeing, from Cong Wang. 2) Fix syzbot triggers AF_PACKET v3 ring buffer insufficient room conditions, from Willem de Bruijn. 3) vsock can reinitialize active work struct, fix from Cong Wang. 4) RXRPC keepalive generator can wedge a cpu, fix from David Howells. 5) Fix locking in AF_SMC ioctl, from Ursula Braun. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: dsa: slave: eee: Allow ports to use phylink net/smc: move sock lock in smc_ioctl() net/smc: allow sysctl rmem and wmem defaults for servers net/smc: no shutdown in state SMC_LISTEN net: aquantia: Fix IFF_ALLMULTI flag functionality rxrpc: Fix the keepalive generator [ver #2] net/mlx5e: Cleanup of dcbnl related fields net/mlx5e: Properly check if hairpin is possible between two functions vhost: reset metadata cache when initializing new IOTLB llc: use refcount_inc_not_zero() for llc_sap_find() dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart() tipc: fix an interrupt unsafe locking scenario vsock: split dwork to avoid reinitializations net: thunderx: check for failed allocation lmac->dmacs cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts packet: refine ring v3 block size test to hold one frame ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit ipv6: fix double refcount of fib6_metrics
2018-08-09i2c: xlp9xx: Fix case where SSIF read transaction completes earlyGeorge Cherian
During ipmi stress tests we see occasional failure of transactions at the boot time. This happens in the case of a I2C_M_RECV_LEN transactions, when the read transfer completes (with the initial read length of 34) before the driver gets a chance to handle interrupts. The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN transactions. The length is updated during the first interrupt, and the buffer contents are only copied during subsequent interrupts. In case of just one interrupt, we will complete the transaction without copying out the bytes from RX fifo. Update the code to drain the RX fifo after the length update, so that the transaction completes correctly in all cases. Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-08-09s390/dasd: fix hanging offline processing due to canceled workerStefan Haberland
During offline processing two worker threads are canceled without freeing the device reference which leads to a hanging offline process. Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-09s390/dasd: fix panic for failed online processingStefan Haberland
Fix a panic that occurs for a device that got an error in dasd_eckd_check_characteristics() during online processing. For example the read configuration data command may have failed. If this error occurs the device is not being set online and the earlier invoked steps during online processing are rolled back. Therefore dasd_eckd_uncheck_device() is called which needs a valid private structure. But this pointer is not valid if dasd_eckd_check_characteristics() has failed. Check for a valid device->private pointer to prevent a panic. Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-09tools headers: Synchronise x86 cpufeatures.h for L1TF additionsDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2018-08-09s390/mm: fix addressing exception after suspend/resumeGerald Schaefer
Commit c9b5ad546e7d "s390/mm: tag normal pages vs pages used in page tables" accidentally changed the logic in arch_set_page_states(), which is used by the suspend/resume code. set_page_stable(page, order) was changed to set_page_stable_dat(page, 0). After this, only the first page of higher order pages will be set to stable, and a write to one of the unstable pages will result in an addressing exception. Fix this by using "order" again, instead of "0". Fixes: c9b5ad546e7d ("s390/mm: tag normal pages vs pages used in page tables") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-09rseq/selftests: add s390 supportVasily Gorbik
Implement support for s390 in the rseq selftests, in order to sanity check the recently enabled rseq syscall. The Implementation covers both 64-bit and 31-bit mode. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-08dsa: slave: eee: Allow ports to use phylinkAndrew Lunn
For a port to be able to use EEE, both the MAC and the PHY must support EEE. A phy can be provided by both a phydev or phylink. Verify at least one of these exist, not just phydev. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08Merge branch 'smc-fixes'David S. Miller
Ursula Braun says: ==================== net/smc: fixes 2018-08-08 here are small fixes for SMC: The first patch makes sure, shutdown code is not executed for sockets in state SMC_LISTEN. The second patch resets send and receive buffer values for accepted sockets, since TCP buffer size optimizations for the internal CLC socket should not be forwarded to the outer SMC socket. The third patch solves a race between connect and ioctl reported by syzbot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: move sock lock in smc_ioctl()Ursula Braun
When an SMC socket is connecting it is decided whether fallback to TCP is needed. To avoid races between connect and ioctl move the sock lock before the use_fallback check. Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: allow sysctl rmem and wmem defaults for serversUrsula Braun
Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size optimizations for servers should be ignored. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/smc: no shutdown in state SMC_LISTENUrsula Braun
Invoking shutdown for a socket in state SMC_LISTEN does not make sense. Nevertheless programs like syzbot fuzzing the kernel may try to do this. For SMC this means a socket refcounting problem. This patch makes sure a shutdown call for an SMC socket in state SMC_LISTEN simply returns with -ENOTCONN. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net: aquantia: Fix IFF_ALLMULTI flag functionalityDmitry Bogdanov
It was noticed that NIC always pass all multicast traffic to the host regardless of IFF_ALLMULTI flag on the interface. The rule in MC Filter Table in NIC, that is configured to accept any multicast packets, is turning on if IFF_MULTICAST flag is set on the interface. It leads to passing all multicast traffic to the host. This fix changes the condition to turn on that rule by checking IFF_ALLMULTI flag as it should. Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08rxrpc: Fix the keepalive generator [ver #2]David Howells
AF_RXRPC has a keepalive message generator that generates a message for a peer ~20s after the last transmission to that peer to keep firewall ports open. The implementation is incorrect in the following ways: (1) It mixes up ktime_t and time64_t types. (2) It uses ktime_get_real(), the output of which may jump forward or backward due to adjustments to the time of day. (3) If the current time jumps forward too much or jumps backwards, the generator function will crank the base of the time ring round one slot at a time (ie. a 1s period) until it catches up, spewing out VERSION packets as it goes. Fix the problem by: (1) Only using time64_t. There's no need for sub-second resolution. (2) Use ktime_get_seconds() rather than ktime_get_real() so that time isn't perceived to go backwards. (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two parts: (a) The "worker" function that manages the buckets and the timer. (b) The "dispatch" function that takes the pending peers and potentially transmits a keepalive packet before putting them back in the ring into the slot appropriate to the revised last-Tx time. (4) Taking everything that's pending out of the ring and splicing it into a temporary collector list for processing. In the case that there's been a significant jump forward, the ring gets entirely emptied and then the time base can be warped forward before the peers are processed. The warping can't happen if the ring isn't empty because the slot a peer is in is keepalive-time dependent, relative to the base time. (5) Limit the number of iterations of the bucket array when scanning it. (6) Set the timer to skip any empty slots as there's no point waking up if there's nothing to do yet. This can be triggered by an incoming call from a server after a reboot with AF_RXRPC and AFS built into the kernel causing a peer record to be set up before userspace is started. The system clock is then adjusted by userspace, thereby potentially causing the keepalive generator to have a meltdown - which leads to a message like: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23] ... Workqueue: krxrpcd rxrpc_peer_keepalive_worker EIP: lock_acquire+0x69/0x80 ... Call Trace: ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? _raw_spin_lock_bh+0x29/0x60 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? __lock_acquire+0x3d3/0x870 ? process_one_work+0x110/0x340 ? process_one_work+0x166/0x340 ? process_one_work+0x110/0x340 ? worker_thread+0x39/0x3c0 ? kthread+0xdb/0x110 ? cancel_delayed_work+0x90/0x90 ? kthread_stop+0x70/0x70 ? ret_from_fork+0x19/0x24 Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08Merge branch 'mlx5-fixes'David S. Miller
Saeed Mahameed says: ==================== Mellanox, mlx5e fixes 2018-08-07 I know it is late into 4.18 release, and this is why I am submitting only two mlx5e ethernet fixes. The first one from Or, is needed for -stable and it fixes hairpin for "same device" check. The second fix is a non risk fix from Huy which cleans up and improves error return value reporting for dcbnl_ieee_setapp. For -stable v4.16 - net/mlx5e: Properly check if hairpin is possible between two functions ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5e: Cleanup of dcbnl related fieldsHuy Nguyen
Remove unused netdev_registered_init/remove in en.h Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails. Remove extra white space Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5e: Properly check if hairpin is possible between two functionsOr Gerlitz
The current check relies on function BDF addresses and can get us wrong e.g when two VFs are assigned into a VM and the PCI v-address is set by the hypervisor. Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Alaa Hleihel <alaa@mellanox.com> Tested-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08cifs: create SMB2_open_init()/SMB2_open_free() helpers.Ronnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-08cifs: add SMB2_query_info_[init|free]()Ronnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com>
2018-08-08cifs: add SMB2_close_init()/SMB2_close_free()Ronnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com>
2018-08-08x86/mm/kmmio: Make the tracer robust against L1TFAndi Kleen
The mmio tracer sets io mapping PTEs and PMDs to non present when enabled without inverting the address bits, which makes the PTE entry vulnerable for L1TF. Make it use the right low level macros to actually invert the address bits to protect against L1TF. In principle this could be avoided because MMIO tracing is not likely to be enabled on production machines, but the fix is straigt forward and for consistency sake it's better to get rid of the open coded PTE manipulation. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-08parisc: Define mb() and add memory barriers to assembler unlock sequencesJohn David Anglin
For years I thought all parisc machines executed loads and stores in order. However, Jeff Law recently indicated on gcc-patches that this is not correct. There are various degrees of out-of-order execution all the way back to the PA7xxx processor series (hit-under-miss). The PA8xxx series has full out-of-order execution for both integer operations, and loads and stores. This is described in the following article: http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml For this reason, we need to define mb() and to insert a memory barrier before the store unlocking spinlocks. This ensures that all memory accesses are complete prior to unlocking. The ldcw instruction performs the same function on entry. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-08parisc: Enable CONFIG_MLONGCALLS by defaultHelge Deller
Enable the -mlong-calls compiler option by default, because otherwise in most cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F relocations. This fixes building the 64-bit defconfig. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-08Merge branch 'sockmap-fixes'Alexei Starovoitov
Daniel Borkmann says: ==================== Two sockmap fixes in bpf_tcp_sendmsg(), and one fix for the sockmap kernel selftest. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08bpf, sockmap: fix cork timeout for select due to epipeDaniel Borkmann
I ran into the same issue as a009f1f396d0 ("selftests/bpf: test_sockmap, timing improvements") where I had a broken pipe error on the socket due to remote end timing out on select and then shutting down it's sockets while the other side was still sending. We may need to do a bigger rework in general on the test_sockmap.c, but for now increase it to a more suitable timeout. Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem pathDaniel Borkmann
In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of ENOMEM, it may also mean that we've partially filled the scatterlist entries with pages. Later jumping to sk_stream_wait_memory() we could further fail with an error for several reasons, however we miss to call free_start_sg() if the local sk_msg_buff was used. Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08bpf, sockmap: fix bpf_tcp_sendmsg sock error handlingDaniel Borkmann
While working on bpf_tcp_sendmsg() code, I noticed that when a sk->sk_err is set we error out with err = sk->sk_err. However this is problematic since sk->sk_err is a positive error value and therefore we will neither go into sk_stream_error() nor will we report an error back to user space. I had this case with EPIPE and user space was thinking sendmsg() succeeded since EPIPE is a positive value, thinking we submitted 32 bytes. Fix it by negating the sk->sk_err value. Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08locks: remove misleading obsolete commentJeff Layton
The spinlock handling in this file has changed significantly since this comment was written, and the file_lock_lock is no more. In addition, this overall comment no longer applies. Deleting an entry now requires both locks. Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-08-08MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()Paul Burton
In nlm_fmn_send() we have a loop which attempts to send a message multiple times in order to handle the transient failure condition of a lack of available credit. When examining the status register to detect the failure we check for a condition that can never be true, which falls foul of gcc 8's -Wtautological-compare: In file included from arch/mips/netlogic/common/irq.c:65: ./arch/mips/include/asm/netlogic/xlr/fmn.h: In function 'nlm_fmn_send': ./arch/mips/include/asm/netlogic/xlr/fmn.h:304:22: error: bitwise comparison always evaluates to false [-Werror=tautological-compare] if ((status & 0x2) == 1) ^~ If the path taken if this condition were true all we do is print a message to the kernel console. Since failures seem somewhat expected here (making the console message questionable anyway) and the condition has clearly never evaluated true we simply remove it, rather than attempting to fix it to check status correctly. Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20174/ Cc: Ganesan Ramalingam <ganesanr@broadcom.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jayachandran C <jnair@caviumnetworks.com> Cc: John Crispin <john@phrozen.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org
2018-08-08vhost: reset metadata cache when initializing new IOTLBJason Wang
We need to reset metadata cache during new IOTLB initialization, otherwise the stale pointers to previous IOTLB may be still accessed which will lead a use after free. Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08x86/mm/pat: Make set_memory_np() L1TF safeAndi Kleen
set_memory_np() is used to mark kernel mappings not present, but it has it's own open coded mechanism which does not have the L1TF protection of inverting the address bits. Replace the open coded PTE manipulation with the L1TF protecting low level PTE routines. Passes the CPA self test. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-08x86/speculation/l1tf: Make pmd/pud_mknotpresent() invertAndi Kleen
Some cases in THP like: - MADV_FREE - mprotect - split mark the PMD non present for temporarily to prevent races. The window for an L1TF attack in these contexts is very small, but it wants to be fixed for correctness sake. Use the proper low level functions for pmd/pud_mknotpresent() to address this. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-08x86/speculation/l1tf: Invert all not present mappingsAndi Kleen
For kernel mappings PAGE_PROTNONE is not necessarily set for a non present mapping, but the inversion logic explicitely checks for !PRESENT and PROT_NONE. Remove the PROT_NONE check and make the inversion unconditional for all not present mappings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-07MIPS: VDSO: Force link endiannessPaul Burton
When building the VDSO with clang it appears to invoke ld without specifying endianness, even though clang itself was provided with a -EB or -EL flag. This results in the build failing due to a mismatch between the objects that are the input to ld, and the output it is attempting to create: VDSO arch/mips/vdso/vdso.so.dbg.raw mips-linux-ld: arch/mips/vdso/elf.o: compiled for a big endian system and target is little endian mips-linux-ld: arch/mips/vdso/elf.o: endianness incompatible with that of the selected emulation mips-linux-ld: failed to merge target specific data of file arch/mips/vdso/elf.o ... Work around this problem by explicitly specifying the link endianness using -Wl,-EB or -Wl,-EL when -EB or -EL are part of KBUILD_CFLAGS. This resolves the build failure when using clang, and doesn't have any negative effect on gcc. Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-07MIPS: Always specify -EB or -EL when using clangPaul Burton
When building using clang, always specify -EB or -EL in order to ensure we target the desired endianness. Since clang cross compiles using a single compiler build with multiple targets, our -dumpmachine tests which don't specify clang's --target argument check output based upon the build machine rather than the machine our build will target. This means our detection of whether to specify -EB fails miserably & we never do. Providing the endianness flag unconditionally for clang resolves this issue & simplifies the clang path somewhat. Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-07llc: use refcount_inc_not_zero() for llc_sap_find()Cong Wang
llc_sap_put() decreases the refcnt before deleting sap from the global list. Therefore, there is a chance llc_sap_find() could find a sap with zero refcnt in this global list. Close this race condition by checking if refcnt is zero or not in llc_sap_find(), if it is zero then it is being removed so we can just treat it as gone. Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()Alexey Kodanev
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value can lead to undefined behavior [1]. In order to fix this use a gradual shift of the window with a 'while' loop, similar to what tcp_cwnd_restart() is doing. When comparing delta and RTO there is a minor difference between TCP and DCCP, the last one also invokes dccp_cwnd_restart() and reduces 'cwnd' if delta equals RTO. That case is preserved in this change. [1]: [40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7 [40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int' [40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1 ... [40851.377176] Call Trace: [40851.408503] dump_stack+0xf1/0x17b [40851.451331] ? show_regs_print_info+0x5/0x5 [40851.503555] ubsan_epilogue+0x9/0x7c [40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4 [40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f [40851.686796] ? xfrm4_output_finish+0x80/0x80 [40851.739827] ? lock_downgrade+0x6d0/0x6d0 [40851.789744] ? xfrm4_prepare_output+0x160/0x160 [40851.845912] ? ip_queue_xmit+0x810/0x1db0 [40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp] [40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp] [40852.142412] dccp_sendmsg+0x428/0xb20 [dccp] [40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp] [40852.254833] ? sched_clock+0x5/0x10 [40852.298508] ? sched_clock+0x5/0x10 [40852.342194] ? inet_create+0xdf0/0xdf0 [40852.388988] sock_sendmsg+0xd9/0x160 ... Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07x86/mm/pti: Clone kernel-image on PTE level for 32 bitJoerg Roedel
On 32 bit the kernel sections are not huge-page aligned. When we clone them on PMD-level we unevitably map some areas that are normal kernel memory and may contain secrets to user-space. To prevent that we need to clone the kernel-image on PTE-level for 32 bit. Also make the page-table cloning code more general so that it can handle PMD and PTE level cloning. This can be generalized further in the future to also handle clones on the P4D-level. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1533637471-30953-4-git-send-email-joro@8bytes.org
2018-08-07x86/mm/pti: Don't clear permissions in pti_clone_pmd()Joerg Roedel
The function sets the global-bit on cloned PMD entries, which only makes sense when the permissions are identical between the user and the kernel page-table. Further, only write-permissions are cleared for entry-text and kernel-text sections, which are not writeable at the end of the boot process. The reason why this RW clearing exists is that in the early PTI implementations the cloned kernel areas were set up during early boot before the kernel text is set to read only and not touched afterwards. This is not longer true. The cloned areas are still set up early to get the entry code working for interrupts and other things, but after the kernel text has been set RO the clone is repeated which copies the RO PMD/PTEs over to the user visible clone. That means the initial clearing of the writable bit can be avoided. [ tglx: Amended changelog ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1533637471-30953-3-git-send-email-joro@8bytes.org
2018-08-07tipc: fix an interrupt unsafe locking scenarioYing Xue
Commit 9faa89d4ed9d ("tipc: make function tipc_net_finalize() thread safe") tries to make it thread safe to set node address, so it uses node_list_lock lock to serialize the whole process of setting node address in tipc_net_finalize(). But it causes the following interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- rht_deferred_worker() rhashtable_rehash_table() lock(&(&ht->lock)->rlock) tipc_nl_compat_doit() tipc_net_finalize() local_irq_disable(); lock(&(&tn->node_list_lock)->rlock); tipc_sk_reinit() rhashtable_walk_enter() lock(&(&ht->lock)->rlock); <Interrupt> tipc_disc_rcv() tipc_node_check_dest() tipc_node_create() lock(&(&tn->node_list_lock)->rlock); *** DEADLOCK *** When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't disable BH. So if an interrupt happens after the lock, it can create an inverse lock ordering between ht->lock and tn->node_list_lock. As a consequence, deadlock might happen. The reason causing the inverse lock ordering scenario above is because the initial purpose of node_list_lock is not designed to do the serialization of node address setting. As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic, we use it to replace node_list_lock to ensure setting node address can be atomically finished. It turns out the potential deadlock can be avoided as well. Fixes: 9faa89d4ed9d ("tipc: make function tipc_net_finalize() thread safe") Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <maloy@donjonn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07x86/paravirt: Fix spectre-v2 mitigations for paravirt guestsPeter Zijlstra
Nadav reported that on guests we're failing to rewrite the indirect calls to CALLEE_SAVE paravirt functions. In particular the pv_queued_spin_unlock() call is left unpatched and that is all over the place. This obviously wrecks Spectre-v2 mitigation (for paravirt guests) which relies on not actually having indirect calls around. The reason is an incorrect clobber test in paravirt_patch_call(); this function rewrites an indirect call with a direct call to the _SAME_ function, there is no possible way the clobbers can be different because of this. Therefore remove this clobber check. Also put WARNs on the other patch failure case (not enough room for the instruction) which I've not seen trigger in my (limited) testing. Three live kernel image disassemblies for lock_sock_nested (as a small function that illustrates the problem nicely). PRE is the current situation for guests, POST is with this patch applied and NATIVE is with or without the patch for !guests. PRE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74> 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35> End of assembler dump. POST: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74> 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock> 0xffffffff817be9a5 <+53>: xchg %ax,%ax 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35> End of assembler dump. NATIVE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74> 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: movb $0x0,(%rdi) 0xffffffff817be9a3 <+51>: nopl 0x0(%rax) 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35> End of assembler dump. Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery") Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls") Reported-by: Nadav Amit <namit@vmware.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org
2018-08-07vsock: split dwork to avoid reinitializationsCong Wang
syzbot reported that we reinitialize an active delayed work in vsock_stream_connect(): ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414 WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 The pattern is apparently wrong, we should only initialize the dealyed work once and could repeatly schedule it. So we have to move out the initializations to allocation side. And to avoid confusion, we can split the shared dwork into two, instead of re-using the same one. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com> Cc: Andy king <acking@vmware.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07net: thunderx: check for failed allocation lmac->dmacsColin Ian King
The allocation of lmac->dmacs is not being checked for allocation failure. Add the check. Fixes: 3a34ecfd9d3f ("net: thunderx: add MAC address filter tracking for LMAC") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hostsAl Viro
Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match() sets fs.val.{l,f}ip to net-endian values without conversion - they come straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp. So the assignment in mk_act_open_req() ought to be a straight copy. As far as I know, T4 PCIe cards do exist, so it's not as if that thing could only be found on little-endian systems... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07smb3: display stats counters for number of slow commandsSteve French
When CONFIG_CIFS_STATS2 is enabled keep counters for slow commands (ie server took longer than 1 second to respond) by SMB2/SMB3 command code. This can help in diagnosing whether performance problems are on server (instead of client) and which commands are causing the problem. Sample output (the new lines contain words "slow responses ...") $ cat /proc/fs/cifs/Stats Resources in use CIFS Session: 1 Share (unique mount targets): 2 SMB Request/Response Buffer: 1 Pool size: 5 SMB Small Req/Resp Buffer: 1 Pool size: 30 Total Large 10 Small 490 Allocations Operations (MIDs): 0 0 session 0 share reconnects Total vfs operations: 67 maximum at one time: 2 4 slow responses from localhost for command 5 1 slow responses from localhost for command 6 1 slow responses from localhost for command 14 1 slow responses from localhost for command 16 1) \\localhost\test SMBs: 243 Bytes read: 1024000 Bytes written: 104857600 TreeConnects: 1 total 0 failed TreeDisconnects: 0 total 0 failed Creates: 40 total 0 failed Closes: 39 total 0 failed ... Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07CIFS: fix uninitialized ptr deref in smb2 signingAurelien Aptel
server->secmech.sdeschmacsha256 is not properly initialized before smb2_shash_allocate(), set shash after that call. also fix typo in error message Fixes: 8de8c4608fe9 ("cifs: Fix validation of signed data in smb2") Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com> Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2018-08-07smb3: Do not send SMB3 SET_INFO if nothing changedSteve French
An earlier commit had a typo which prevented the optimization from working: commit 18dd8e1a65dd ("Do not send SMB3 SET_INFO request if nothing is changing") Thank you to Metze for noticing this. Also clear a reserved field in the FILE_BASIC_INFO struct we send that should be zero (all the other fields in that struct were set or cleared explicitly already in cifs_set_file_info). Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> # 4.9.x+ Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07smb3: fix minor debug output for CONFIG_CIFS_STATSSteve French
CONFIG_CIFS_STATS is now always enabled (to simplify the code and since the STATS are important for some common customer use cases and also debugging), but needed one minor change so that STATS shows as enabled in the debug output in /proc/fs/cifs/DebugData, otherwise it could get confusing with STATS no longer showing up in the "Features" list in /proc/fs/cifs/DebugData when basic stats were in fact available. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>