summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-02Merge tag 'mlx5-fixes-2018-10-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-10-01 This pull request includes some fixes to mlx5 driver, Please pull and let me know if there's any problem. For -stable v4.11: "6e0a4a23c59a ('net/mlx5: E-Switch, Fix out of bound access when setting vport rate')" For -stable v4.18: "98d6627c372a ('net/mlx5e: Set vlan masks for all offloaded TC rules')" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02Merge branch 'rmnet-fixes'David S. Miller
Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Updates 2018-10-02 This series is a set of small fixes for rmnet driver Patch 1 is a fix for a scenario reported by syzkaller Patch 2 & 3 are fixes for incorrect allocation flags ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: qualcomm: rmnet: Fix incorrect allocation flag in receive pathSubash Abhinov Kasiviswanathan
The incoming skb needs to be reallocated in case the headroom is not sufficient to adjust the ethernet header. This allocation needs to be atomic otherwise it results in this splat [<600601bb>] ___might_sleep+0x185/0x1a3 [<603f6314>] ? _raw_spin_unlock_irqrestore+0x0/0x27 [<60069bb0>] ? __wake_up_common_lock+0x95/0xd1 [<600602b0>] __might_sleep+0xd7/0xe2 [<60065598>] ? enqueue_task_fair+0x112/0x209 [<600eea13>] __kmalloc_track_caller+0x5d/0x124 [<600ee9b6>] ? __kmalloc_track_caller+0x0/0x124 [<602696d5>] __kmalloc_reserve.isra.34+0x30/0x7e [<603f629b>] ? _raw_spin_lock_irqsave+0x0/0x3d [<6026b744>] pskb_expand_head+0xbf/0x310 [<6025ca6a>] rmnet_rx_handler+0x7e/0x16b [<6025c9ec>] ? rmnet_rx_handler+0x0/0x16b [<6027ad0c>] __netif_receive_skb_core+0x301/0x96f [<60033c17>] ? set_signals+0x0/0x40 [<6027bbcb>] __netif_receive_skb+0x24/0x8e Fixes: 74692caf1b0b ("net: qualcomm: rmnet: Process packets over ethernet") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: qualcomm: rmnet: Fix incorrect allocation flag in transmitSubash Abhinov Kasiviswanathan
The incoming skb needs to be reallocated in case the headroom is not sufficient to add the MAP header. This allocation needs to be atomic otherwise it results in the following splat [32805.801456] BUG: sleeping function called from invalid context [32805.841141] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [32805.904773] task: ffffffd7c5f62280 task.stack: ffffff80464a8000 [32805.910851] pc : ___might_sleep+0x180/0x188 [32805.915143] lr : ___might_sleep+0x180/0x188 [32806.131520] Call trace: [32806.134041] ___might_sleep+0x180/0x188 [32806.137980] __might_sleep+0x50/0x84 [32806.141653] __kmalloc_track_caller+0x80/0x3bc [32806.146215] __kmalloc_reserve+0x3c/0x88 [32806.150241] pskb_expand_head+0x74/0x288 [32806.154269] rmnet_egress_handler+0xb0/0x1d8 [32806.162239] rmnet_vnd_start_xmit+0xc8/0x13c [32806.166627] dev_hard_start_xmit+0x148/0x280 [32806.181181] sch_direct_xmit+0xa4/0x198 [32806.185125] __qdisc_run+0x1f8/0x310 [32806.188803] net_tx_action+0x23c/0x26c [32806.192655] __do_softirq+0x220/0x408 [32806.196420] do_softirq+0x4c/0x70 Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: qualcomm: rmnet: Skip processing loopback packetsSean Tranchetti
RMNET RX handler was processing invalid packets that were originally sent on the real device and were looped back via dev_loopback_xmit(). This was detected using syzkaller. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: systemport: Fix wake-up interrupt race during resumeFlorian Fainelli
The AON_PM_L2 is normally used to trigger and identify the source of a wake-up event. Since the RX_SYS clock is no longer turned off, we also have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and that interrupt remains active up until the magic packet detector is disabled which happens much later during the driver resumption. The race happens if we have a CPU that is entering the SYSTEMPORT INTRL2_0 handler during resume, and another CPU has managed to clear the wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we have the first CPU stuck in the interrupt handler with an interrupt cause that has been cleared under its feet, and so we keep returning IRQ_NONE and we never make any progress. This was not a problem before because we would always turn off the RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned off as well, thus not latching the interrupt. The fix is to make sure we do not enable either the MPD or BRCM_TAG_MATCH interrupts since those are redundant with what the AON_PM_L2 interrupt controller already processes and they would cause such a race to occur. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02smb3: fix lease break problem introduced by compoundingSteve French
Fixes problem (discovered by Aurelien) introduced by recent commit: commit b24df3e30cbf48255db866720fb71f14bf9d2f39 ("cifs: update receive_encrypted_standard to handle compounded responses") which broke the ability to respond to some lease breaks (lease breaks being ignored is a problem since can block server response for duration of the lease break timeout). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02cifs: only wake the thread for the very last PDU in a compoundRonnie Sahlberg
For compounded PDUs we whould only wake the waiting thread for the very last PDU of the compound. We do this so that we are guaranteed that the demultiplex_thread will not process or access any of those MIDs any more once the send/recv thread starts processing. Else there is a race where at the end of the send/recv processing we will try to delete all the mids of the compound. If the multiplex thread still has other mids to process at this point for this compound this can lead to an oops. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02Merge tag 'wireless-drivers-for-davem-2018-10-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.19 First, and also hopefully the last, set of fixes for 4.19. All small but still important fixes mt76x0 * fix a bug when a virtual interface is removed multiple times b43 * fix DMA error related regression with proprietary firmware iwlwifi * fix an oops which was a regression in v4.19-rc1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02cifs: add a warning if we try to to dequeue a deleted midRonnie Sahlberg
cifs_delete_mid() is called once we are finished handling a mid and we expect no more work done on this mid. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Add a warning if someone tries to dequeue a mid that has already been flagged to be deleted. Also change list_del() to list_del_init() so that if we have similar bugs resurface in the future we will not oops. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-02smb2: fix missing files in root share directory listingAurelien Aptel
When mounting a Windows share that is the root of a drive (eg. C$) the server does not return . and .. directory entries. This results in the smb2 code path erroneously skipping the 2 first entries. Pseudo-code of the readdir() code path: cifs_readdir(struct file, struct dir_context) initiate_cifs_search <-- if no reponse cached yet server->ops->query_dir_first dir_emit_dots dir_emit <-- adds "." and ".." if we're at pos=0 find_cifs_entry initiate_cifs_search <-- if pos < start of current response (restart search) server->ops->query_dir_next <-- if pos > end of current response (fetch next search res) for(...) <-- loops over cur response entries starting at pos cifs_filldir <-- skip . and .., emit entry cifs_fill_dirent dir_emit pos++ A) dir_emit_dots() always adds . & .. and sets the current dir pos to 2 (0 and 1 are done). Therefore we always want the index_to_find to be 2 regardless of if the response has . and .. B) smb1 code initializes index_of_last_entry with a +2 offset in cifssmb.c CIFSFindFirst(): psrch_inf->index_of_last_entry = 2 /* skip . and .. */ + psrch_inf->entries_in_buffer; Later in find_cifs_entry() we want to find the next dir entry at pos=2 as a result of (A) first_entry_in_buffer = cfile->srch_inf.index_of_last_entry - cfile->srch_inf.entries_in_buffer; This var is the dir pos that the first entry in the buffer will have therefore it must be 2 in the first call. If we don't offset index_of_last_entry by 2 (like in (B)), first_entry_in_buffer=0 but we were instructed to get pos=2 so this code in find_cifs_entry() skips the 2 first which is ok for non-root shares, as it skips . and .. from the response but is not ok for root shares where the 2 first are actual files pos_in_buf = index_to_find - first_entry_in_buffer; // pos_in_buf=2 // we skip 2 first response entries :( for (i = 0; (i < (pos_in_buf)) && (cur_ent != NULL); i++) { /* go entry by entry figuring out which is first */ cur_ent = nxt_dir_entry(cur_ent, end_of_smb, cfile->srch_inf.info_level); } C) cifs_filldir() skips . and .. so we can safely ignore them for now. Sample program: int main(int argc, char **argv) { const char *path = argc >= 2 ? argv[1] : "."; DIR *dh; struct dirent *de; printf("listing path <%s>\n", path); dh = opendir(path); if (!dh) { printf("opendir error %d\n", errno); return 1; } while (1) { de = readdir(dh); if (!de) { if (errno) { printf("readdir error %d\n", errno); return 1; } printf("end of listing\n"); break; } printf("off=%lu <%s>\n", de->d_off, de->d_name); } return 0; } Before the fix with SMB1 on root shares: <.> off=1 <..> off=2 <$Recycle.Bin> off=3 <bootmgr> off=4 and on non-root shares: <.> off=1 <..> off=4 <-- after adding .., the offsets jumps to +2 because <2536> off=5 we skipped . and .. from response buffer (C) <411> off=6 but still incremented pos <file> off=7 <fsx> off=8 Therefore the fix for smb2 is to mimic smb1 behaviour and offset the index_of_last_entry by 2. Test results comparing smb1 and smb2 before/after the fix on root share, non-root shares and on large directories (ie. multi-response dir listing): PRE FIX ======= pre-1-root VS pre-2-root: ERR pre-2-root is missing [bootmgr, $Recycle.Bin] pre-1-nonroot VS pre-2-nonroot: OK~ same files, same order, different offsets pre-1-nonroot-large VS pre-2-nonroot-large: OK~ same files, same order, different offsets POST FIX ======== post-1-root VS post-2-root: OK same files, same order, same offsets post-1-nonroot VS post-2-nonroot: OK same files, same order, same offsets post-1-nonroot-large VS post-2-nonroot-large: OK same files, same order, same offsets REGRESSION? =========== pre-1-root VS post-1-root: OK same files, same order, same offsets pre-1-nonroot VS post-1-nonroot: OK same files, same order, same offsets BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Paulo Alcantara <palcantara@suse.deR> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2018-10-02rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096Eric Dumazet
We have an impressive number of syzkaller bugs that are linked to the fact that syzbot was able to create a networking device with millions of TX (or RX) queues. Let's limit the number of RX/TX queues to 4096, this really should cover all known cases. A separate patch will add various cond_resched() in the loops handling sysfs entries at device creation and dismantle. Tested: lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap RTNETLINK answers: Invalid argument lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap real 0m0.180s user 0m0.000s sys 0m0.107s Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02bonding: fix warning messageMahesh Bandewar
RX queue config for bonding master could be different from its slave device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also."), the packet is reinjected into stack with skb->dev as bonding master. This potentially triggers the message: "bondX received packet on queue Y, but number of RX queues is Z" whenever the queue that packet is received on is higher than the numrxqueues on bonding master (Y > Z). Fixes: 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also.") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02inet: make sure to grab rcu_read_lock before using ireq->ireq_optEric Dumazet
Timer handlers do not imply rcu_read_lock(), so my recent fix triggered a LOCKDEP warning when SYNACK is retransmit. Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt usages instead of guessing what is done by callers, since it is not worth the pain. Get rid of ireq_opt_deref() helper since it hides the logic without real benefit, since it is now a standard rcu_dereference(). Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02RISCV: Fix end PFN for low memoryAtish Patra
Use memblock_end_of_DRAM which provides correct last low memory PFN. Without that, DMA32 region becomes empty resulting in zero pages being allocated for DMA32. This patch is based on earlier patch from palmer which never merged into 4.19. I just edited the commit text to make more sense. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-10-02nfp: avoid soft lockups under control message stormJakub Kicinski
When FW floods the driver with control messages try to exit the cmsg processing loop every now and then to avoid soft lockups. Cmsg processing is generally very lightweight so 512 seems like a reasonable budget, which should not be exceeded under normal conditions. Fixes: 77ece8d5f196 ("nfp: add control vNIC datapath") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02declance: Fix continuation with the adapter identification messageMaciej W. Rozycki
Fix a commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") regression with the `declance' driver, which caused the adapter identification message to be split between two lines, e.g.: declance.c: v0.011 by Linux MIPS DECstation task force tc6: PMAD-AA , addr = 08:00:2b:1b:2a:6a, irq = 14 tc6: registered as eth0. Address that properly, by printing identification with a single call, making the messages now look like: declance.c: v0.011 by Linux MIPS DECstation task force tc6: PMAD-AA, addr = 08:00:2b:1b:2a:6a, irq = 14 tc6: registered as eth0. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: fec: fix rare tx timeoutRickard x Andersson
During certain heavy network loads TX could time out with TX ring dump. TX is sometimes never restarted after reaching "tx_stop_threshold" because function "fec_enet_tx_queue" only tests the first queue. In addition the TX timeout callback function failed to recover because it also operated only on the first queue. Signed-off-by: Rickard x Andersson <rickaran@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02Merge tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linuxGreg Kroah-Hartman
Bartlomiej writes: "fbdev fixes for v4.19-rc7: - fix OMAPFB_MEMORY_READ ioctl to not leak kernel memory in omapfb driver (Tomi Valkeinen) - add missing prepare/unprepare clock operations in pxa168fb driver (Lubomir Rintel) - add nobgrt option in efifb driver to disable ACPI BGRT logo restore (Hans de Goede) - fix spelling mistake in fall-through annotation in stifb driver (Gustavo A. R. Silva) - fix URL for uvesafb repository in the documentation (Adam Jackson)" * tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux: video/fbdev/stifb: Fix spelling mistake in fall-through annotation uvesafb: Fix URLs in the documentation efifb: BGRT: Add nobgrt option fbdev/omapfb: fix omapfb_memory_read infoleak pxa168fb: prepare the clock
2018-10-02Merge tag 'mmc-v4.19-rc4' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Ulf writes: "MMC core: - Fixup conversion of debounce time to/from ms/us MMC host: - sdhi: Fixup whitelisting for Gen3 types" * tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: slot-gpio: Fix debounce time to use miliseconds again mmc: core: Fix debounce time to use microseconds mmc: sdhi: sys_dmac: check for all Gen3 types when whitelisting
2018-10-02drm/cma-helper: Fix crash in fbdev error pathNoralf Trønnes
Sergey Suloev reported a crash happening in drm_client_dev_hotplug() when fbdev had failed to register. [ 9.124598] vc4_hdmi 3f902000.hdmi: ASoC: Failed to create component debugfs directory [ 9.147667] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok [ 9.155184] vc4_hdmi 3f902000.hdmi: ASoC: no DMI vendor name! [ 9.166544] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4]) [ 9.173840] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4]) [ 9.181029] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4]) [ 9.188519] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4]) [ 9.195690] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 9.203523] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 9.215032] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4]) [ 9.274785] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4]) [ 9.290246] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0 [ 9.297464] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 9.304600] [drm] Driver supports precise vblank timestamp query. [ 9.382856] vc4-drm soc:gpu: [drm:drm_fb_helper_fbdev_setup [drm_kms_helper]] *ERROR* Failed to set fbdev configuration [ 10.404937] Unable to handle kernel paging request at virtual address 00330a656369768a [ 10.441620] [00330a656369768a] address between user and kernel address ranges [ 10.449087] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 10.454762] Modules linked in: brcmfmac vc4 drm_kms_helper cfg80211 drm rfkill smsc95xx brcmutil usbnet drm_panel_orientation_quirks raspberrypi_hwmon bcm2835_dma crc32_ce pwm_bcm2835 bcm2835_rng virt_dma rng_core i2c_bcm2835 ip_tables x_tables ipv6 [ 10.477296] CPU: 2 PID: 45 Comm: kworker/2:1 Not tainted 4.19.0-rc5 #3 [ 10.483934] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT) [ 10.489966] Workqueue: events output_poll_execute [drm_kms_helper] [ 10.596515] Process kworker/2:1 (pid: 45, stack limit = 0x000000007e8924dc) [ 10.603590] Call trace: [ 10.606259] drm_client_dev_hotplug+0x5c/0xb0 [drm] [ 10.611303] drm_kms_helper_hotplug_event+0x30/0x40 [drm_kms_helper] [ 10.617849] output_poll_execute+0xc4/0x1e0 [drm_kms_helper] [ 10.623616] process_one_work+0x1c8/0x318 [ 10.627695] worker_thread+0x48/0x428 [ 10.631420] kthread+0xf8/0x128 [ 10.634615] ret_from_fork+0x10/0x18 [ 10.638255] Code: 54000220 f9401261 aa1303e0 b4000141 (f9400c21) [ 10.644456] ---[ end trace c75b4a4b0e141908 ]--- The reason for this is that drm_fbdev_cma_init() removes the drm_client when fbdev registration fails, but it doesn't remove the client from the drm_device client list. So the client list now has a pointer that points into the unknown and we have a 'use after free' situation. Split drm_client_new() into drm_client_init() and drm_client_add() to fix removal in the error path. Fixes: 894a677f4b3e ("drm/cma-helper: Use the generic fbdev emulation") Reported-by: Sergey Suloev <ssuloev@orpaltech.com> Cc: Stefan Wahren <stefan.wahren@i2se.com> Cc: Eric Anholt <eric@anholt.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181001194536.57756-1-noralf@tronnes.org
2018-10-02drm: fix use-after-free read in drm_mode_create_lease_ioctl()Jann Horn
fd_install() moves the reference given to it into the file descriptor table of the current process. If the current process is multithreaded, then immediately after fd_install(), another thread can close() the file descriptor and cause the file's resources to be cleaned up. Since the reference to "lessee" is held by the file, we must not access "lessee" after the fd_install() call. As far as I can tell, to reach this codepath, the caller must have an open file descriptor to a DRI device in master mode. I'm not sure what the requirements for that are. Signed-off-by: Jann Horn <jannh@google.com> Fixes: 62884cd386b8 ("drm: Add four ioctls for managing drm mode object leases [v7]") Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181001153117.216923-1-jannh@google.com
2018-10-01r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFOHeiner Kallweit
Some of the chip-specific hw_start functions set bit TXCFG_AUTO_FIFO in register TxConfig. The original patch changed the order of some calls resulting in these changes being overwritten by rtl_set_tx_config_registers() in rtl_hw_start(). This eventually resulted in network stalls especially under high load. Analyzing the chip-specific hw_start functions all chip version from 34, with the exception of version 39, need this bit set. This patch moves setting this bit to rtl_set_tx_config_registers(). Fixes: 4fd48c4ac0a0 ("r8169: move common initializations to tp->hw_start") Reported-by: Ortwin Glück <odi@odi.ch> Reported-by: David Arendt <admin@prnet.org> Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Tested-by: Tony Atkinson <tatkinson@linux.com> Tested-by: David Arendt <admin@prnet.org> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01Merge branch 'tun-races'David S. Miller
Eric Dumazet says: ==================== tun: address two syzbot reports Small changes addressing races discovered by syzbot. First patch is a cleanup. Second patch moves a mutex init sooner. Third patch makes sure each tfile gets its own napi enable flags. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tun: napi flags belong to tfileEric Dumazet
Since tun->flags might be shared by multiple tfile structures, it is better to make sure tun_get_user() is using the flags for the current tfile. Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint of what could happen, but we need something stronger to please syzbot. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427 Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4 RSP: 0018:ffff8801c400f410 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325 RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0 RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000 R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358 R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004 FS: 00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715 tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967 call_write_iter include/linux/fs.h:1808 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6b8/0x9f0 fs/read_write.c:487 vfs_write+0x1fc/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457579 Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579 RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4 R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff Modules linked in: RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427 Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4 RSP: 0018:ffff8801c400f410 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325 RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0 RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000 R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358 R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004 FS: 00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tun: initialize napi_mutex unconditionallyEric Dumazet
This is the first part to fix following syzbot report : console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000 kernel config: https://syzkaller.appspot.com/x/.config?x=443816db871edd66 dashboard link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80 Following patch is fixing the race condition, but it seems safer to initialize this mutex at tfile creation anyway. Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tun: remove unused parametersEric Dumazet
tun_napi_disable() and tun_napi_del() do not need a pointer to the tun_struct Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01bond: take rcu lock in netpoll_send_skb_on_devDave Jones
The bonding driver lacks the rcu lock when it calls down into netdev_lower_get_next_private_rcu from bond_poll_controller, which results in a trace like: WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40 CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1 Workqueue: bond0 bond_mii_monitor RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40 Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8> RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0 RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000 R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980 R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff FS: 0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0 Call Trace: bond_poll_controller+0x52/0x170 netpoll_poll_dev+0x79/0x290 netpoll_send_skb_on_dev+0x158/0x2c0 netpoll_send_udp+0x2d5/0x430 write_ext_msg+0x1e0/0x210 console_unlock+0x3c4/0x630 vprintk_emit+0xfa/0x2f0 printk+0x52/0x6e ? __netdev_printk+0x12b/0x220 netdev_info+0x64/0x80 ? bond_3ad_set_carrier+0xe9/0x180 bond_select_active_slave+0x1fc/0x310 bond_mii_monitor+0x709/0x9b0 process_one_work+0x221/0x5e0 worker_thread+0x4f/0x3b0 kthread+0x100/0x140 ? process_one_work+0x5e0/0x5e0 ? kthread_delayed_work_timer_fn+0x90/0x90 ret_from_fork+0x24/0x30 We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev before we call down into netpoll_poll_dev, so just take the lock there. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01rtnetlink: Fail dump if target netnsid is invalidDavid Ahern
Link dumps can return results from a target namespace. If the namespace id is invalid, then the dump request should fail if get_target_net fails rather than continuing with a dump of the current namespace. Fixes: 79e1ad148c844 ("rtnetlink: use netnsid to query interface") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01Revert "openvswitch: Fix template leak in error cases."Flavio Leitner
This reverts commit 90c7afc96cbbd77f44094b5b651261968e97de67. When the commit was merged, the code used nf_ct_put() to free the entry, but later on commit 76644232e612 ("openvswitch: Free tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which is a more appropriate. Now the original problem is removed. Then 44d6e2f27328 ("net: Replace NF_CT_ASSERT() with WARN_ON().") replaced a debug assert with a WARN_ON() which is trigged now. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2018-09-27 Here's one more Bluetooth fix for 4.19, fixing the handling of an attempt to unpair a device while pairing is in progress. Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tipc: ignore STATE_MSG on wrong link sessionLUU Duc Canh
The initial session number when a link is created is based on a random value, taken from struct tipc_net->random. It is then incremented for each link reset to avoid mixing protocol messages from different link sessions. However, when a bearer is reset all its links are deleted, and will later be re-created using the same random value as the first time. This means that if the link never went down between creation and deletion we will still sometimes have two subsequent sessions with the same session number. In virtual environments with potentially long transmission times this has turned out to be a real problem. We now fix this by randomizing the session number each time a link is created. With a session number size of 16 bits this gives a risk of session collision of 1/64k. To reduce this further, we also introduce a sanity check on the very first STATE message arriving at a link. If this has an acknowledge value differing from 0, which is logically impossible, we ignore the message. The final risk for session collision is hence reduced to 1/4G, which should be sufficient. Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: sched: act_ipt: check for underflow in __tcf_ipt_init()Dan Carpenter
If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we return -EINVAL. But we don't check whether it's smaller than sizeof(struct xt_entry_target) and that could lead to an out of bounds read. Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-10-01 1) Validate address prefix lengths in the xfrm selector, otherwise we may hit undefined behaviour in the address matching functions if the prefix is too big for the given address family. 2) Fix skb leak on local message size errors. From Thadeu Lima de Souza Cascardo. 3) We currently reset the transport header back to the network header after a transport mode transformation is applied. This leads to an incorrect transport header when multiple transport mode transformations are applied. Reset the transport header only after all transformations are already applied to fix this. From Sowmini Varadhan. 4) We only support one offloaded xfrm, so reset crypto_done after the first transformation in xfrm_input(). Otherwise we may call the wrong input method for subsequent transformations. From Sowmini Varadhan. 5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry. skb_dst_force does not really force a dst refcount anymore, it might clear it instead. xfrm code did not expect this, add a check to not dereference skb_dst() if it was cleared by skb_dst_force. 6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds read in xfrm_state_find. From Sean Tranchetti. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01Merge tag 'arm64-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Will writes: "Late arm64 fixes - Fix handling of young contiguous ptes for hugetlb mappings - Fix livelock when taking access faults on contiguous hugetlb mappings - Tighten up register accesses via KVM SET_ONE_REG ioctl()s" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: KVM: Sanitize PSTATE.M when being set from userspace arm64: KVM: Tighten guest core register access from userspace arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags arm64: hugetlb: Fix handling of young ptes
2018-10-01Merge tag 'armsoc-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Olof writes: "ARM: SoC fixes A handful of fixes that have been coming in the last couple of weeks: - Freescale fixes for on-chip accellerators - A DT fix for stm32 to avoid fallback to non-DMA SPI mode - Fixes for badly specified interrupts on BCM63xx SoCs - Allwinner A64 HDMI was incorrectly specified as fully compatble with R40 - Drive strength fix for SAMA5D2 NAND pins on one board" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: stm32: update SPI6 dmas property on stm32mp157c soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift() soc: fsl: qbman: qman: avoid allocating from non existing gen_pool ARM: dts: BCM63xx: Fix incorrect interrupt specifiers MAINTAINERS: update the Annapurna Labs maintainer email ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl
2018-10-01Merge tag 'pstore-v4.19-rc7' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Kees writes: "Pstore fixes for v4.19-rc7 - Fix failure-path memory leak in ramoops_init (nixiaoming)" * tag 'pstore-v4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Fix failure-path memory leak in ramoops_init
2018-10-01tcp/dccp: fix lockdep issue when SYN is backloggedEric Dumazet
In normal SYN processing, packets are handled without listener lock and in RCU protected ingress path. But syzkaller is known to be able to trick us and SYN packets might be processed in process context, after being queued into socket backlog. In commit 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt") I made a very stupid fix, that happened to work mostly because of the regular path being RCU protected. Really the thing protecting ireq->ireq_opt is RCU read lock, and the pseudo request refcnt is not relevant. This patch extends what I did in commit 449809a66c1d ("tcp/dccp: block BH for SYN processing") by adding an extra rcu_read_{lock|unlock} pair in the paths that might be taken when processing SYN from socket backlog (thus possibly in process context) Fixes: 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) Skip ip_sabotage_in() for packet making into the VRF driver, otherwise packets are dropped, from David Ahern. 2) Clang compilation warning uncovering typo in the nft_validate_register_store() call from nft_osf, from Stefan Agner. 3) Double sizeof netlink message length calculations in ctnetlink, from zhong jiang. 4) Missing rb_erase() on batch full in rbtree garbage collector, from Taehee Yoo. 5) Calm down compilation warning in nf_hook(), from Florian Westphal. 6) Missing check for non-null sk in xt_socket before validating netns procedence, from Flavio Leitner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net/mlx5e: Set vlan masks for all offloaded TC rulesJianbo Liu
In flow steering, if asked to, the hardware matches on the first ethertype which is not vlan. It's possible to set a rule as follows, which is meant to match on untagged packet, but will match on a vlan packet: tc filter add dev eth0 parent ffff: protocol ip flower ... To avoid this for packets with single tag, we set vlan masks to tell hardware to check the tags for every matched packet. Fixes: 095b6cfd69ce ('net/mlx5e: Add TC vlan match parsing') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5: E-Switch, Fix out of bound access when setting vport rateEran Ben Elisha
The code that deals with eswitch vport bw guarantee was going beyond the eswitch vport array limit, fix that. This was pointed out by the kernel address sanitizer (KASAN). The error from KASAN log: [2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core] Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rulesAlaa Hleihel
If the peer device was already unbound, then do not attempt to modify it's resources, otherwise we will crash on dereferencing non-existing device. Fixes: 5c65c564c962 ("net/mlx5e: Support offloading TC NIC hairpin flows") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01drm/i915: Avoid compiler warning for maybe unused gu_misc_iirChris Wilson
/kisskb/src/drivers/gpu/drm/i915/i915_irq.c: warning: 'gu_misc_iir' may be used uninitialized in this function [-Wuninitialized]: => 3120:10 Silence the compiler warning by ensuring that the local variable is initialised and removing the guard that is confusing the older gcc. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: df0d28c185ad ("drm/i915/icl: GSE interrupt moves from DE_MISC to GU_MISC") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180926104718.17462-1-chris@chris-wilson.co.uk (cherry picked from commit 7a90938332d80faf973fbcffdf6e674e7b8f0914) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-10-01drm/i915: Do not redefine the has_csr parameter.Anusha Srivatsa
Let us reuse the already defined has_csr check and not redefine it. The main difference is that in effect this will flip .has_csr to 1 (via GEN9_FEATURES which GEN11_FEATURES pulls in). Suggested-by: Imre Deak <imre.deak@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107382 Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1534527210-16841-1-git-send-email-anusha.srivatsa@intel.com (cherry picked from commit da4468a1aa75457e6134127b19761b7ba62ce945) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-10-01KVM: x86: fix L1TF's MMIO GFN calculationSean Christopherson
One defense against L1TF in KVM is to always set the upper five bits of the *legal* physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurprosed bits, which in turn causes guest panics and hangs. Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable. I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to warn if GENMASK_ULL() generated a nonsensical value, but that seemed silly since that would mean a system that supports VMX has less than 18 bits of physical address space... Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Reviewed-by: Junaid Shahid <junaids@google.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01tools/kvm_stat: cut down decimal places in update interval dialogStefan Raspl
We currently display the default number of decimal places for floats in _show_set_update_interval(), which is quite pointless. Cutting down to a single decimal place. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGSLiran Alon
L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls. Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which is L1 IA32_BNDCFGS. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01KVM: x86: Do not use kvm_x86_ops->mpx_supported() directlyLiran Alon
Commit a87036add092 ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") introduced kvm_mpx_supported() to return true iff MPX is enabled in the host. However, that commit seems to have missed replacing some calls to kvm_x86_ops->mpx_supported() to kvm_mpx_supported(). Complete original commit by replacing remaining calls to kvm_mpx_supported(). Fixes: a87036add092 ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabledLiran Alon
Before this commit, KVM exposes MPX VMX controls to L1 guest only based on if KVM and host processor supports MPX virtualization. However, these controls should be exposed to guest only in case guest vCPU supports MPX. Without this change, a L1 guest running with kernel which don't have commit 691bd4340bef ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS") asserts in QEMU on the following: qemu-kvm: error: failed to set MSR 0xd90 to 0x0 qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs: Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed' This is because L1 KVM kvm_init_msr_list() will see that vmx_mpx_supported() (As it only checks MPX VMX controls support) and therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS. However, later when L1 will attempt to set this MSR via KVM_SET_MSRS IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu). Therefore, fix the issue by exposing MPX VMX controls to L1 guest only when vCPU supports MPX. Fixes: 36be0b9deb23 ("KVM: x86: Add nested virtualization support for MPX") Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01arm64: KVM: Sanitize PSTATE.M when being set from userspaceMarc Zyngier
Not all execution modes are valid for a guest, and some of them depend on what the HW actually supports. Let's verify that what userspace provides is compatible with both the VM settings and the HW capabilities. Cc: <stable@vger.kernel.org> Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu") Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>