summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-11-25efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bitArd Biesheuvel
The UEFI stub executes in the context of the firmware, which identity maps the available system RAM, which implies that only memory below 4 GB can be used for allocations on 32-bit architectures, even on [L]PAE capable hardware. So ignore any reported memory above 4 GB in efi_random_alloc(). This also fixes a reported build problem on ARM under -Os, where the 64-bit logical shift relies on a software routine that the ARM decompressor does not provide. A second [minor] issue is also fixed, where the '+ 1' is moved out of the shift, where it belongs: the reason for its presence is that a memory region where start == end should count as a single slot, given that 'end' takes the desired size and alignment of the allocation into account. To clarify the code in this regard, rename start/end to 'first_slot' and 'last_slot', respectively, and introduce 'region_end' to describe the last usable address of the current region. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-25locking/selftest: Fix output since KERN_CONT changesMichael Ellerman
Since the KERN_CONT changes the locking-selftest output is messed up, eg: ---------------------------------------------------------------------------- | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------------------------------------------------- A-A deadlock: ok | ok | ok | ok | ok | ok | Use pr_cont() to get it looking normal again: ---------------------------------------------------------------------------- | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------------------------------------------------- A-A deadlock: ok | ok | ok | ok | ok | ok | Reported-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-25x86/boot/64: Use defines for page sizeBorislav Petkov
... instead of naked numbers like the rest of the asm does in this file. No code changed: # arch/x86/kernel/head_64.o: text data bss dec hex filename 1124 290864 4096 296084 48494 head_64.o.before 1124 290864 4096 296084 48494 head_64.o.after md5: 87086e202588939296f66e892414ffe2 head_64.o.before.asm 87086e202588939296f66e892414ffe2 head_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161124210550.15025-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-25Merge branch 'mediatek-drm-fixes-2016-11-24' of ↵Dave Airlie
https://github.com/ckhu-mediatek/linux.git-tags into drm-fixes This branch include patches of fixing a typo, accurate dsi frame rate, and fixing null pointer dereference. * 'mediatek-drm-fixes-2016-11-24' of https://github.com/ckhu-mediatek/linux.git-tags: drm/mediatek: fix null pointer dereference drm/mediatek: fixed the calc method of data rate per lane drm/mediatek: fix a typo of DISP_OD_CFG to OD_RELAYMODE
2016-11-25powerpc/mm: Fixup kernel read only mappingAneesh Kumar K.V
With commit e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") we started using the ppp value 0b110 to map kernel readonly. But that facility was only added as part of ISA 2.04. For earlier ISA version only supported ppp bit value for readonly mapping is 0b011. (This implies both user and kernel get mapped using the same ppp bit value for readonly mapping.). Update the code such that for earlier architecture version we use ppp value 0b011 for readonly mapping. We don't differentiate between power5+ and power5 here and apply the new ppp bits only from power6 (ISA 2.05). This keep the changes minimal. This fixes issue with PS3 spu usage reported at https://lkml.kernel.org/r/rep.1421449714.geoff@infradead.org Fixes: e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") Cc: stable@vger.kernel.org # v4.7+ Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]Andrey Ryabinin
This fixes CVE-2016-8650. If mpi_powm() is given a zero exponent, it wants to immediately return either 1 or 0, depending on the modulus. However, if the result was initalised with zero limb space, no limbs space is allocated and a NULL-pointer exception ensues. Fix this by allocating a minimal amount of limb space for the result when the 0-exponent case when the result is 1 and not touching the limb space when the result is 0. This affects the use of RSA keys and X.509 certificates that carry them. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 PGD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8804011944c0 task.stack: ffff880401294000 RIP: 0010:[<ffffffff8138ce5d>] [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP: 0018:ffff880401297ad8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0 RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0 RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000 R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50 FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0 Stack: ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8 Call Trace: [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e [<ffffffff81001c2b>] do_syscall_64+0x80/0x191 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f RIP [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP <ffff880401297ad8> CR2: 0000000000000000 ---[ end trace d82015255d4a5d8d ]--- Basically, this is a backport of a libgcrypt patch: http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526 Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> cc: linux-ima-devel@lists.sourceforge.net cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-11-25X.509: Fix double free in x509_cert_parse() [ver #3]Andrey Ryabinin
We shouldn't free cert->pub->key in x509_cert_parse() because x509_free_certificate() also does this: BUG: Double free or freeing an invalid pointer ... Call Trace: [<ffffffff81896c20>] dump_stack+0x63/0x83 [<ffffffff81356571>] kasan_object_err+0x21/0x70 [<ffffffff81356ed9>] kasan_report_double_free+0x49/0x60 [<ffffffff813561ad>] kasan_slab_free+0x9d/0xc0 [<ffffffff81350b7a>] kfree+0x8a/0x1a0 [<ffffffff81844fbf>] public_key_free+0x1f/0x30 [<ffffffff818455d4>] x509_free_certificate+0x24/0x90 [<ffffffff818460bc>] x509_cert_parse+0x2bc/0x300 [<ffffffff81846cae>] x509_key_preparse+0x3e/0x330 [<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100 [<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0 [<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0 [<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad Object at ffff880110bd1900, in cache kmalloc-512 size: 512 .... Freed: PID = 2579 [<ffffffff8104283b>] save_stack_trace+0x1b/0x20 [<ffffffff813558f6>] save_stack+0x46/0xd0 [<ffffffff81356183>] kasan_slab_free+0x73/0xc0 [<ffffffff81350b7a>] kfree+0x8a/0x1a0 [<ffffffff818460a3>] x509_cert_parse+0x2a3/0x300 [<ffffffff81846cae>] x509_key_preparse+0x3e/0x330 [<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100 [<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0 [<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0 [<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad Fixes: db6c43bd2132 ("crypto: KEYS: convert public key and digsig asym to the akcipher api") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-11-25gpu/drm/exynos/exynos_hdmi - Unmap region obtained by of_iomapArvind Yadav
Free memory mapping, if hdmi_probe is not successful. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-24Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2016-11-23 Sorry about the late pull request for 4.9, but we have one more important Bluetooth patch that should make it to the release. It fixes connection creation for Bluetooth LE controllers that do not have a public address (only a random one). Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24tuntap: remove unnecessary sk_receive_queue length check during xmitJason Wang
After commit 1576d9860599 ("tun: switch to use skb array for tx"), sk_receive_queue was not used any more. So remove the uncessary sk_receive_queue length check during xmit. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net sched filters: fix filter handle ID in tfilter_notify_chain()Roman Mashak
Should pass valid filter handle, not the netlink flags. Fixes: 30a391a13ab92 ("net sched filters: pass netlink message flags in event notification") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24samples/bpf: fix bpf loaderAlexei Starovoitov
llvm can emit relocations into sections other than program code (like debug info sections). Ignore them during parsing of elf file Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24samples/bpf: fix sockex2 exampleAlexei Starovoitov
since llvm commit "Do not expand UNDEF SDNode during insn selection lowering" llvm will generate code that uses uninitialized registers for cases where C code is actually uses uninitialized data. So this sockex2 example is technically broken. Fix it by initializing on the stack variable fully. Also increase verifier buffer limit, since verifier output may not fit in 64k for this sockex2 code depending on llvm version. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24mlx4: reorganize struct mlx4_en_tx_ringEric Dumazet
Goal is to reorganize this critical structure to increase performance. ndo_start_xmit() should only dirty one cache line, and access as few cache lines as possible. Add sp_ (Slow Path) prefix to fields that are not used in fast path, to make clear what is going on. After this patch pahole reports something much better, as all ndo_start_xmit() needed fields are packed into two cache lines instead of seven or eight struct mlx4_en_tx_ring { u32 last_nr_txbb; /* 0 0x4 */ u32 cons; /* 0x4 0x4 */ long unsigned int wake_queue; /* 0x8 0x8 */ struct netdev_queue * tx_queue; /* 0x10 0x8 */ u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */ struct mlx4_en_rx_ring * recycle_ring; /* 0x20 0x8 */ /* XXX 24 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 prod; /* 0x40 0x4 */ unsigned int tx_dropped; /* 0x44 0x4 */ long unsigned int bytes; /* 0x48 0x8 */ long unsigned int packets; /* 0x50 0x8 */ long unsigned int tx_csum; /* 0x58 0x8 */ long unsigned int tso_packets; /* 0x60 0x8 */ long unsigned int xmit_more; /* 0x68 0x8 */ struct mlx4_bf bf; /* 0x70 0x18 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ __be32 doorbell_qpn; /* 0x88 0x4 */ __be32 mr_key; /* 0x8c 0x4 */ u32 size; /* 0x90 0x4 */ u32 size_mask; /* 0x94 0x4 */ u32 full_size; /* 0x98 0x4 */ u32 buf_size; /* 0x9c 0x4 */ void * buf; /* 0xa0 0x8 */ struct mlx4_en_tx_info * tx_info; /* 0xa8 0x8 */ int qpn; /* 0xb0 0x4 */ u8 queue_index; /* 0xb4 0x1 */ bool bf_enabled; /* 0xb5 0x1 */ bool bf_alloced; /* 0xb6 0x1 */ u8 hwtstamp_tx_type; /* 0xb7 0x1 */ u8 * bounce_buf; /* 0xb8 0x8 */ /* --- cacheline 3 boundary (192 bytes) --- */ long unsigned int queue_stopped; /* 0xc0 0x8 */ struct mlx4_hwq_resources sp_wqres; /* 0xc8 0x58 */ /* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */ struct mlx4_qp sp_qp; /* 0x120 0x30 */ /* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */ struct mlx4_qp_context sp_context; /* 0x150 0xf8 */ /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */ cpumask_t sp_affinity_mask; /* 0x248 0x20 */ enum mlx4_qp_state sp_qp_state; /* 0x268 0x4 */ u16 sp_stride; /* 0x26c 0x2 */ u16 sp_cqn; /* 0x26e 0x2 */ /* size: 640, cachelines: 10, members: 36 */ /* sum members: 600, holes: 1, sum holes: 24 */ /* padding: 16 */ }; Instead of this silly placement : struct mlx4_en_tx_ring { u32 last_nr_txbb; /* 0 0x4 */ u32 cons; /* 0x4 0x4 */ long unsigned int wake_queue; /* 0x8 0x8 */ /* XXX 48 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 prod; /* 0x40 0x4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int bytes; /* 0x48 0x8 */ long unsigned int packets; /* 0x50 0x8 */ long unsigned int tx_csum; /* 0x58 0x8 */ long unsigned int tso_packets; /* 0x60 0x8 */ long unsigned int xmit_more; /* 0x68 0x8 */ unsigned int tx_dropped; /* 0x70 0x4 */ /* XXX 4 bytes hole, try to pack */ struct mlx4_bf bf; /* 0x78 0x18 */ /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */ long unsigned int queue_stopped; /* 0x90 0x8 */ cpumask_t affinity_mask; /* 0x98 0x10 */ struct mlx4_qp qp; /* 0xa8 0x30 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mlx4_hwq_resources wqres; /* 0xd8 0x58 */ /* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */ u32 size; /* 0x130 0x4 */ u32 size_mask; /* 0x134 0x4 */ u16 stride; /* 0x138 0x2 */ /* XXX 2 bytes hole, try to pack */ u32 full_size; /* 0x13c 0x4 */ /* --- cacheline 5 boundary (320 bytes) --- */ u16 cqn; /* 0x140 0x2 */ /* XXX 2 bytes hole, try to pack */ u32 buf_size; /* 0x144 0x4 */ __be32 doorbell_qpn; /* 0x148 0x4 */ __be32 mr_key; /* 0x14c 0x4 */ void * buf; /* 0x150 0x8 */ struct mlx4_en_tx_info * tx_info; /* 0x158 0x8 */ struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */ u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */ u8 * bounce_buf; /* 0x170 0x8 */ struct mlx4_qp_context context; /* 0x178 0xf8 */ /* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */ int qpn; /* 0x270 0x4 */ enum mlx4_qp_state qp_state; /* 0x274 0x4 */ u8 queue_index; /* 0x278 0x1 */ bool bf_enabled; /* 0x279 0x1 */ bool bf_alloced; /* 0x27a 0x1 */ /* XXX 5 bytes hole, try to pack */ /* --- cacheline 10 boundary (640 bytes) --- */ struct netdev_queue * tx_queue; /* 0x280 0x8 */ int hwtstamp_tx_type; /* 0x288 0x4 */ /* size: 704, cachelines: 11, members: 36 */ /* sum members: 587, holes: 6, sum holes: 65 */ /* padding: 52 */ }; Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24ethtool: Protect {get, set}_phy_tunable with PHY device mutexFlorian Fainelli
PHY drivers should be able to rely on the caller of {get,set}_tunable to have acquired the PHY device mutex, in order to both serialize against concurrent calls of these functions, but also against PHY state machine changes. All ethtool PHY-level functions do this, except {get,set}_tunable, so we make them consistent here as well. We need to update the Microsemi PHY driver in the same commit to avoid introducing either deadlocks, or lack of proper locking. Fixes: 968ad9da7e0e ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE") Fixes: 310d9ad57ae0 ("net: phy: Add downshift get/set support in Microsemi PHYs driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24Merge branch 'mlx5-next'David S. Miller
Saeed Mahameed says: ==================== Mellanox 100G mlx5 SRIOV switchdev update This series from Roi and Or further enhances the new SRIOV switchdev mode. Roi's patches deal with allowing users to configure though devlink the level of inline headers that the VF should be setting in order for the eswitch HW to do proper matching. We also enforce that the matching required for offloaded TC rules is aligned with that level on the PF driver. Or's patches deals with allowing the user to control on the VF operational link state through admin directives on the mlx5 VF rep link. Also in this series is implementation of HW and SW counters for the mlx5 VF rep which is aligned with the design set by commit a5ea31f57309 'Merge branch net-offloaded-stats'. v1 --> v2: * constified the net-device param of get offloaded stats ndo in mlxsw (pointed by 0-day screaming on us...) * added Or's Review-by tags for Roi's patches This series was generated against commit e796f49d826a ("net: ieee802154: constify ieee802154_ops structures") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net/mlx5e: Enforce min inline mode when offloading flowsRoi Dayan
A flow should be offloaded only if the matches are allowed according to min inline mode. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net/mlx5: E-Switch, Add control for inline modeRoi Dayan
Implement devlink show and set of HW inline-mode. The supported modes: none, link, network, transport. We currently support one mode for all vports so set is done on all vports. When eswitch is first initialized the inline-mode is queried from the FW. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net/mlx5: Enable to query min inline for a specific vportRoi Dayan
Also move the inline capablities enum to a shared header vport.h Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24devlink: Add E-Switch inline mode controlRoi Dayan
Some HWs need the VF driver to put part of the packet headers on the TX descriptor so the e-switch can do proper matching and steering. The supported modes: none, link, network, transport. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net/mlx5e: Support VF vport link state control for SRIOV switchdev modeOr Gerlitz
Reflect the administative link changes done on the VF representor to the VF e-switch vport. This means that doing ip link set down/up commands on the VF rep will modify the e-switch vport state which in turn will make proper VF drivers to set their carrier accordingly. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev modeOr Gerlitz
Switchdev driver net-device port statistics should follow the model introduced in commit a5ea31f57309 'Merge branch net-offloaded-stats'. For VF reps we return the SRIOV eswitch vport stats as the usual ones and SW stats if asked. For the PF, if we're in the switchdev mode, we return the uplink stats and SW stats if asked, otherwise as before. The uplink stats are implemented using the PPCNT 802_3 counters which are already being read/cached by the driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: Add net-device param to the get offloaded stats ndoOr Gerlitz
Some drivers would need to check few internal matters for that. To be used in downstream mlx5 commit. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link changeFlorian Fainelli
In case the link change and EEE is enabled or disabled, always try to re-negotiate this with the link partner. Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24Merge branch 'phy-broadcom-wirespeed-downshift-support'David S. Miller
Florian Fainelli says: ==================== net: phy: broadcom: Wirespeed/downshift support This patch series adds support for the Broadcom Wirespeed, aka downsfhit feature utilizing the recently added ethtool PHY tunables. Tested with two Gigabit link partners with a 4-wire cable having only 2 pairs connected. Last patch in the series is a fix that was required for testing, which should make it to -stable, which I can submit separate against net if you prefer David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link changeFlorian Fainelli
In case the link change and EEE is enabled or disabled, always try to re-negotiate this with the link partner. Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: phy: bcm7xxx: Add support for downshift/WirespeedFlorian Fainelli
Add support for configuring the downshift/Wirespeed enable/disable toggles and specify a link retry value ranging from 1 to 9. Since the integrated BCM7xxx have issues when wirespeed is enabled and EEE is also enabled, we do disable EEE if wirespeed is enabled. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: phy: broadcom: Allow enabling or disabling of EEEFlorian Fainelli
In preparation for adding support for Wirespeed/downshift, we need to change bcm_phy_eee_enable() to allow enabling or disabling EEE, so make the function take an extra enable/disable boolean parameter and rename it to illustrate it sets EEE, not necessarily just enables it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: phy: broadcom: Add support code for downshift/WirespeedFlorian Fainelli
Broadcom's Wirespeed feature allows us to configure how auto-negotiation should behave with fewer working pairs of wires on a cable. Add support code for retrieving and setting such downshift counters using the recently added ethtool downshift tunables. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: phy: broadcom: Move bcm54xx_auxctl_{read, write} to common libraryFlorian Fainelli
We are going to need these functions to implement support for Broadcom Wirespeed, aka downshift. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24tcp: enhance tcp_collapse_retrans() with skb_shift()Eric Dumazet
In commit 2331ccc5b323 ("tcp: enhance tcp collapsing"), we made a first step allowing copying right skb to left skb head. Since all skbs in socket write queue are headless (but possibly the very first one), this strategy often does not work. This patch extends tcp_collapse_retrans() to perform frag shifting, thanks to skb_shift() helper. This helper needs to not BUG on non headless skbs, as callers are ok with that. Tested: Following packetdrill test now passes : 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.100 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 write(4, ..., 200) = 200 +0 > P. 1:201(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 201:401(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 401:601(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 601:801(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 801:1001(200) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1001:1101(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1101:1201(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1201:1301(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1301:1401(100) ack 1 +.099 < . 1:1(0) ack 201 win 257 +.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401> +0 > P. 201:1001(800) ack 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24bnxt: do not busy-poll when link is downAndy Gospodarek
When busy polling while a link is down (during a link-flap test), TX timeouts were observed as well as the following messages in the ring buffer: bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51 bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free tx failed. rc:-1 bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51 bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free rx failed. rc:-1 These were resolved by checking for link status and returning if link was not up. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Rob Miller <rob.miller@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24udplite: call proper backlog handlersEric Dumazet
In commits 93821778def10 ("udp: Fix rcv socket locking") and f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite was forgotten. This leads to crashes if UDPlite header is pulled twice, which happens starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Bug found by syzkaller team, thanks a lot guys ! Note that backlog use in UDP/UDPlite is scheduled to be removed starting from linux-4.10, so this patch is only needed up to linux-4.9 Fixes: 93821778def1 ("udp: Fix rcv socket locking") Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24net: dsa: mv88e6xxx: add MV88E6097 switchStefan Eichenberger
Add support for the MV88E6097 switch. The change was tested on an Armada based platform with a MV88E6097 switch. Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24cpufreq/intel_pstate: Use CPPC to get max performanceRafael J. Wysocki
Use the acpi cppc_lib interface to get CPPC performance limits and update the per cpu priority for the ITMT scheduler. If the highest performance of CPUs differs the ITMT feature is enabled. Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24acpi/bus: Set _OSC for diverse core supportSrinivas Pandruvada
Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse core support. This is required to enable the BIOS support of the Intel Turbo Boost Max Technology 3.0 feature. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: bp@suse.de Link: http://lkml.kernel.org/r/a023623a727e86040a1715797055f6402caefd7e.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24acpi/bus: Enable HWP CPPC objectsSrinivas Pandruvada
Need to set platform wide _OSC bits to enable CPPC and CPPC version 2. If platform supports CPPC, then BIOS exposes CPPC tables. The primary reason to enable CPPC support is to get the maximum performance of each CPU to check and enable Intel Turbo Boost Max Technology 3.0 (ITMT). Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: bp@suse.de Link: http://lkml.kernel.org/r/a696f6b17843cee9a542482fae6abab087be9587.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPUTim Chen
Some Intel cores in a package can be boosted to a higher turbo frequency with ITMT 3.0 technology. The scheduler can use the asymmetric packing feature to move tasks to the more capable cores. If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core sched domains to enable asymmetric packing. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24x86/sysctl: Add sysctl for ITMT scheduling featureTim Chen
Intel Turbo Boost Max Technology 3.0 (ITMT) feature allows some cores to be boosted to higher turbo frequency than others. Add /proc/sys/kernel/sched_itmt_enabled so operator can enable/disable scheduling of tasks that favor cores with higher turbo boost frequency potential. By default, system that is ITMT capable and single socket has this feature turned on. It is more likely to be lightly loaded and operates in Turbo range. When there is a change in the ITMT scheduling operation desired, a rebuild of the sched domain is initiated so the scheduler can set up sched domains with appropriate flag to enable/disable ITMT scheduling operations. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24x86: Enable Intel Turbo Boost Max Technology 3.0Tim Chen
On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum turbo frequencies of some cores in a CPU package may be higher than for the other cores in the same package. In that case, better performance (and possibly lower energy consumption as well) can be achieved by making the scheduler prefer to run tasks on the CPUs with higher max turbo frequencies. To that end, set up a core priority metric to abstract the core preferences based on the maximum turbo frequency. In that metric, the cores with higher maximum turbo frequencies are higher-priority than the other cores in the same package and that causes the scheduler to favor them when making load-balancing decisions using the asymmertic packing approach. At the same time, the priority of SMT threads with a higher CPU number is reduced so as to avoid scheduling tasks on all of the threads that belong to a favored core before all of the other cores have been given a task to run. The priority metric will be initialized by the P-state driver with the help of the sched_set_itmt_core_prio() function. The P-state driver will also determine whether or not ITMT is supported by the platform and will call sched_set_itmt_support() to indicate that. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24x86/topology: Define x86's arch_update_cpu_topologyTim Chen
The scheduler calls arch_update_cpu_topology() to check whether the scheduler domains have to be rebuilt. So far x86 has no requirement for this, but the upcoming ITMT support makes this necessary. Request the rebuild when the x86 internal update flag is set. Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24Merge tag 'mmc-v4.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host: - sdhci-of-esdhc: Fix card detection - dw_mmc: Fix DMA error path" * tag 'mmc-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: dw_mmc: fix the error handling for dma operation mmc: sdhci-of-esdhc: fixup PRESENT_STATE read
2016-11-24Merge tag 'usb-4.9-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a few small USB fixes and new device ids for 4.9-rc7. The majority of these fixes are in the musb driver, fixing a number of regressions that have been reported but took a while to resolve. The other fixes are all small ones, to resolve other reported minor issues. All have been in linux-next for a while with no reported issues" * tag 'usb-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: gadget: f_fs: fix wrong parenthesis in ffs_func_req_match() phy: twl4030-usb: Fix for musb session bit based PM usb: musb: Drop pointless PM runtime code for dsps glue usb: musb: Add missing pm_runtime_disable and drop 2430 PM timeout usb: musb: Fix PM for hub disconnect usb: musb: Fix sleeping function called from invalid context for hdrc glue usb: musb: Fix broken use of static variable for multiple instances USB: serial: cp210x: add ID for the Zone DPMX usb: chipidea: move the lock initialization to core file Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
2016-11-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - DMA-on-stack fixes for a couple drivers, from Benjamin Tissoires - small memory sanitization fix for sensor-hub driver, from Song Hongyan * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: hid-sensor-hub: clear memory to avoid random data HID: rmi: make transfer buffers DMA capable HID: magicmouse: make transfer buffers DMA capable HID: lg: make transfer buffers DMA capable HID: cp2112: make transfer buffers DMA capable
2016-11-24KVM: x86: check for pic and ioapic presence before useRadim Krčmář
Split irqchip allows pic and ioapic routes to be used without them being created, which results in NULL access. Check for NULL and avoid it. (The setup is too racy for a nicer solutions.) Found by syzkaller: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events irqfd_inject task: ffff88006a06c7c0 task.stack: ffff880068638000 RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221 RSP: 0000:ffff88006863ea20 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0 R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0 Stack: ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000 ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082 0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000 Call Trace: [...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746 [...] __raw_spin_lock include/linux/spinlock_api_smp.h:144 [...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151 [...] spin_lock include/linux/spinlock.h:302 [...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379 [...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52 [...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101 [...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60 [...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096 [...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230 [...] kthread+0x328/0x3e0 kernel/kthread.c:209 [...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: 49df6397edfc ("KVM: x86: Split the APIC from the rest of IRQCHIP.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24KVM: x86: fix out-of-bounds accesses of rtc_eoi mapRadim Krčmář
KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be bigger that the maximal number of VCPUs, resulting in out-of-bounds access. Found by syzkaller: BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...] Write of size 1 by task a.out/27101 CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905 [...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495 [...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86 [...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360 [...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222 [...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235 [...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670 [...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668 [...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999 [...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: af1bae5497b9 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24KVM: x86: drop error recovery in em_jmp_far and em_ret_farRadim Krčmář
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64 bit mode, but syzkaller proved otherwise (and SDM agrees). Code segment was restored upon failure, but it was left uninitialized outside of long mode, which could lead to a leak of host kernel stack. We could have fixed that by always saving and restoring the CS, but we take a simpler approach and just break any guest that manages to fail as the error recovery is error-prone and modern CPUs don't need emulator for this. Found by syzkaller: WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480 Kernel panic - not syncing: panic_on_warn set ... CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __dump_stack lib/dump_stack.c:15 [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [...] panic+0x1b7/0x3a3 kernel/panic.c:179 [...] __warn+0x1c4/0x1e0 kernel/panic.c:542 [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217 [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227 [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294 [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545 [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116 [...] complete_emulated_io arch/x86/kvm/x86.c:6870 [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934 [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978 [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557 [...] vfs_ioctl fs/ioctl.c:43 [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679 [...] SYSC_ioctl fs/ioctl.c:694 [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 [...] entry_SYSCALL_64_fastpath+0x1f/0xc2 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24KVM: x86: fix out-of-bounds access in lapicRadim Krčmář
Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff. With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a userspace can send an interrupt with dest_id that results in out-of-bounds access. Found by syzkaller: BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750 Read of size 8 by task syz-executor/22923 CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __dump_stack lib/dump_stack.c:15 [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 [...] print_address_description mm/kasan/report.c:194 [...] kasan_report_error mm/kasan/report.c:283 [...] kasan_report+0x231/0x500 mm/kasan/report.c:303 [...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329 [...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824 [...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72 [...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157 [...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74 [...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015 [...] vfs_ioctl fs/ioctl.c:43 [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679 [...] SYSC_ioctl fs/ioctl.c:694 [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 [...] entry_SYSCALL_64_fastpath+0x1f/0xc2 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24init: use pr_cont() when displaying rotator during ramdisk loading.Nicolas Schichan
Otherwise each individual rotator char would be printed in a new line: (...) [ 0.642350] - [ 0.644374] | [ 0.646367] - (...) Signed-off-by: Nicolas Schichan <nicolas.schichan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-24ipv6: bump genid when the IFA_F_TENTATIVE flag is clearPaolo Abeni
When an ipv6 address has the tentative flag set, it can't be used as source for egress traffic, while the associated route, if any, can be looked up and even stored into some dst_cache. In the latter scenario, the source ipv6 address selected and stored in the cache is most probably wrong (e.g. with link-local scope) and the entity using the dst_cache will experience lack of ipv6 connectivity until said cache is cleared or invalidated. Overall this may cause lack of connectivity over most IPv6 tunnels (comprising geneve and vxlan), if the first egress packet reaches the tunnel before the DaD is completed for the used ipv6 address. This patch bumps a new genid after that the IFA_F_TENTATIVE flag is cleared, so that dst_cache will be invalidated on next lookup and ipv6 connectivity restored. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>