Age | Commit message (Collapse) | Author |
|
The UEFI stub executes in the context of the firmware, which identity
maps the available system RAM, which implies that only memory below
4 GB can be used for allocations on 32-bit architectures, even on [L]PAE
capable hardware.
So ignore any reported memory above 4 GB in efi_random_alloc(). This
also fixes a reported build problem on ARM under -Os, where the 64-bit
logical shift relies on a software routine that the ARM decompressor does
not provide.
A second [minor] issue is also fixed, where the '+ 1' is moved out of
the shift, where it belongs: the reason for its presence is that a
memory region where start == end should count as a single slot, given
that 'end' takes the desired size and alignment of the allocation into
account.
To clarify the code in this regard, rename start/end to 'first_slot' and
'last_slot', respectively, and introduce 'region_end' to describe the
last usable address of the current region.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1480010543-25709-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since the KERN_CONT changes the locking-selftest output is messed up, eg:
----------------------------------------------------------------------------
| spin |wlock |rlock |mutex | wsem | rsem |
--------------------------------------------------------------------------
A-A deadlock:
ok |
ok |
ok |
ok |
ok |
ok |
Use pr_cont() to get it looking normal again:
----------------------------------------------------------------------------
| spin |wlock |rlock |mutex | wsem | rsem |
--------------------------------------------------------------------------
A-A deadlock: ok | ok | ok | ok | ok | ok |
Reported-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
... instead of naked numbers like the rest of the asm does in this file.
No code changed:
# arch/x86/kernel/head_64.o:
text data bss dec hex filename
1124 290864 4096 296084 48494 head_64.o.before
1124 290864 4096 296084 48494 head_64.o.after
md5:
87086e202588939296f66e892414ffe2 head_64.o.before.asm
87086e202588939296f66e892414ffe2 head_64.o.after.asm
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161124210550.15025-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
https://github.com/ckhu-mediatek/linux.git-tags into drm-fixes
This branch include patches of fixing a typo, accurate dsi frame rate,
and fixing null pointer dereference.
* 'mediatek-drm-fixes-2016-11-24' of https://github.com/ckhu-mediatek/linux.git-tags:
drm/mediatek: fix null pointer dereference
drm/mediatek: fixed the calc method of data rate per lane
drm/mediatek: fix a typo of DISP_OD_CFG to OD_RELAYMODE
|
|
With commit e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") we
started using the ppp value 0b110 to map kernel readonly. But that
facility was only added as part of ISA 2.04. For earlier ISA version
only supported ppp bit value for readonly mapping is 0b011. (This
implies both user and kernel get mapped using the same ppp bit value for
readonly mapping.).
Update the code such that for earlier architecture version we use ppp
value 0b011 for readonly mapping. We don't differentiate between power5+
and power5 here and apply the new ppp bits only from power6 (ISA 2.05).
This keep the changes minimal.
This fixes issue with PS3 spu usage reported at
https://lkml.kernel.org/r/rep.1421449714.geoff@infradead.org
Fixes: e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO")
Cc: stable@vger.kernel.org # v4.7+
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This fixes CVE-2016-8650.
If mpi_powm() is given a zero exponent, it wants to immediately return
either 1 or 0, depending on the modulus. However, if the result was
initalised with zero limb space, no limbs space is allocated and a
NULL-pointer exception ensues.
Fix this by allocating a minimal amount of limb space for the result when
the 0-exponent case when the result is 1 and not touching the limb space
when the result is 0.
This affects the use of RSA keys and X.509 certificates that carry them.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
PGD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8804011944c0 task.stack: ffff880401294000
RIP: 0010:[<ffffffff8138ce5d>] [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP: 0018:ffff880401297ad8 EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
Stack:
ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
Call Trace:
[<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
[<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
[<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
[<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
[<ffffffff8132a95c>] rsa_verify+0x9d/0xee
[<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
[<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
[<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
[<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
[<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
[<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
[<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
[<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
[<ffffffff812fe227>] SyS_add_key+0x154/0x19e
[<ffffffff81001c2b>] do_syscall_64+0x80/0x191
[<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
RIP [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP <ffff880401297ad8>
CR2: 0000000000000000
---[ end trace d82015255d4a5d8d ]---
Basically, this is a backport of a libgcrypt patch:
http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
cc: linux-ima-devel@lists.sourceforge.net
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
We shouldn't free cert->pub->key in x509_cert_parse() because
x509_free_certificate() also does this:
BUG: Double free or freeing an invalid pointer
...
Call Trace:
[<ffffffff81896c20>] dump_stack+0x63/0x83
[<ffffffff81356571>] kasan_object_err+0x21/0x70
[<ffffffff81356ed9>] kasan_report_double_free+0x49/0x60
[<ffffffff813561ad>] kasan_slab_free+0x9d/0xc0
[<ffffffff81350b7a>] kfree+0x8a/0x1a0
[<ffffffff81844fbf>] public_key_free+0x1f/0x30
[<ffffffff818455d4>] x509_free_certificate+0x24/0x90
[<ffffffff818460bc>] x509_cert_parse+0x2bc/0x300
[<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
[<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
[<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
[<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
[<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad
Object at ffff880110bd1900, in cache kmalloc-512 size: 512
....
Freed:
PID = 2579
[<ffffffff8104283b>] save_stack_trace+0x1b/0x20
[<ffffffff813558f6>] save_stack+0x46/0xd0
[<ffffffff81356183>] kasan_slab_free+0x73/0xc0
[<ffffffff81350b7a>] kfree+0x8a/0x1a0
[<ffffffff818460a3>] x509_cert_parse+0x2a3/0x300
[<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
[<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
[<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
[<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
[<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad
Fixes: db6c43bd2132 ("crypto: KEYS: convert public key and digsig asym to the akcipher api")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
Free memory mapping, if hdmi_probe is not successful.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2016-11-23
Sorry about the late pull request for 4.9, but we have one more
important Bluetooth patch that should make it to the release. It fixes
connection creation for Bluetooth LE controllers that do not have a
public address (only a random one).
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After commit 1576d9860599 ("tun: switch to use skb array for tx"),
sk_receive_queue was not used any more. So remove the uncessary
sk_receive_queue length check during xmit.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Should pass valid filter handle, not the netlink flags.
Fixes: 30a391a13ab92 ("net sched filters: pass netlink message flags in event notification")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
llvm can emit relocations into sections other than program code
(like debug info sections). Ignore them during parsing of elf file
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
since llvm commit "Do not expand UNDEF SDNode during insn selection lowering"
llvm will generate code that uses uninitialized registers for cases
where C code is actually uses uninitialized data.
So this sockex2 example is technically broken.
Fix it by initializing on the stack variable fully.
Also increase verifier buffer limit, since verifier output
may not fit in 64k for this sockex2 code depending on llvm version.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Goal is to reorganize this critical structure to increase performance.
ndo_start_xmit() should only dirty one cache line, and access as few
cache lines as possible.
Add sp_ (Slow Path) prefix to fields that are not used in fast path,
to make clear what is going on.
After this patch pahole reports something much better, as all
ndo_start_xmit() needed fields are packed into two cache lines instead
of seven or eight
struct mlx4_en_tx_ring {
u32 last_nr_txbb; /* 0 0x4 */
u32 cons; /* 0x4 0x4 */
long unsigned int wake_queue; /* 0x8 0x8 */
struct netdev_queue * tx_queue; /* 0x10 0x8 */
u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */
struct mlx4_en_rx_ring * recycle_ring; /* 0x20 0x8 */
/* XXX 24 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 prod; /* 0x40 0x4 */
unsigned int tx_dropped; /* 0x44 0x4 */
long unsigned int bytes; /* 0x48 0x8 */
long unsigned int packets; /* 0x50 0x8 */
long unsigned int tx_csum; /* 0x58 0x8 */
long unsigned int tso_packets; /* 0x60 0x8 */
long unsigned int xmit_more; /* 0x68 0x8 */
struct mlx4_bf bf; /* 0x70 0x18 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
__be32 doorbell_qpn; /* 0x88 0x4 */
__be32 mr_key; /* 0x8c 0x4 */
u32 size; /* 0x90 0x4 */
u32 size_mask; /* 0x94 0x4 */
u32 full_size; /* 0x98 0x4 */
u32 buf_size; /* 0x9c 0x4 */
void * buf; /* 0xa0 0x8 */
struct mlx4_en_tx_info * tx_info; /* 0xa8 0x8 */
int qpn; /* 0xb0 0x4 */
u8 queue_index; /* 0xb4 0x1 */
bool bf_enabled; /* 0xb5 0x1 */
bool bf_alloced; /* 0xb6 0x1 */
u8 hwtstamp_tx_type; /* 0xb7 0x1 */
u8 * bounce_buf; /* 0xb8 0x8 */
/* --- cacheline 3 boundary (192 bytes) --- */
long unsigned int queue_stopped; /* 0xc0 0x8 */
struct mlx4_hwq_resources sp_wqres; /* 0xc8 0x58 */
/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
struct mlx4_qp sp_qp; /* 0x120 0x30 */
/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
struct mlx4_qp_context sp_context; /* 0x150 0xf8 */
/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
cpumask_t sp_affinity_mask; /* 0x248 0x20 */
enum mlx4_qp_state sp_qp_state; /* 0x268 0x4 */
u16 sp_stride; /* 0x26c 0x2 */
u16 sp_cqn; /* 0x26e 0x2 */
/* size: 640, cachelines: 10, members: 36 */
/* sum members: 600, holes: 1, sum holes: 24 */
/* padding: 16 */
};
Instead of this silly placement :
struct mlx4_en_tx_ring {
u32 last_nr_txbb; /* 0 0x4 */
u32 cons; /* 0x4 0x4 */
long unsigned int wake_queue; /* 0x8 0x8 */
/* XXX 48 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 prod; /* 0x40 0x4 */
/* XXX 4 bytes hole, try to pack */
long unsigned int bytes; /* 0x48 0x8 */
long unsigned int packets; /* 0x50 0x8 */
long unsigned int tx_csum; /* 0x58 0x8 */
long unsigned int tso_packets; /* 0x60 0x8 */
long unsigned int xmit_more; /* 0x68 0x8 */
unsigned int tx_dropped; /* 0x70 0x4 */
/* XXX 4 bytes hole, try to pack */
struct mlx4_bf bf; /* 0x78 0x18 */
/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
long unsigned int queue_stopped; /* 0x90 0x8 */
cpumask_t affinity_mask; /* 0x98 0x10 */
struct mlx4_qp qp; /* 0xa8 0x30 */
/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
struct mlx4_hwq_resources wqres; /* 0xd8 0x58 */
/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
u32 size; /* 0x130 0x4 */
u32 size_mask; /* 0x134 0x4 */
u16 stride; /* 0x138 0x2 */
/* XXX 2 bytes hole, try to pack */
u32 full_size; /* 0x13c 0x4 */
/* --- cacheline 5 boundary (320 bytes) --- */
u16 cqn; /* 0x140 0x2 */
/* XXX 2 bytes hole, try to pack */
u32 buf_size; /* 0x144 0x4 */
__be32 doorbell_qpn; /* 0x148 0x4 */
__be32 mr_key; /* 0x14c 0x4 */
void * buf; /* 0x150 0x8 */
struct mlx4_en_tx_info * tx_info; /* 0x158 0x8 */
struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */
u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */
u8 * bounce_buf; /* 0x170 0x8 */
struct mlx4_qp_context context; /* 0x178 0xf8 */
/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
int qpn; /* 0x270 0x4 */
enum mlx4_qp_state qp_state; /* 0x274 0x4 */
u8 queue_index; /* 0x278 0x1 */
bool bf_enabled; /* 0x279 0x1 */
bool bf_alloced; /* 0x27a 0x1 */
/* XXX 5 bytes hole, try to pack */
/* --- cacheline 10 boundary (640 bytes) --- */
struct netdev_queue * tx_queue; /* 0x280 0x8 */
int hwtstamp_tx_type; /* 0x288 0x4 */
/* size: 704, cachelines: 11, members: 36 */
/* sum members: 587, holes: 6, sum holes: 65 */
/* padding: 52 */
};
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PHY drivers should be able to rely on the caller of {get,set}_tunable to
have acquired the PHY device mutex, in order to both serialize against
concurrent calls of these functions, but also against PHY state machine
changes. All ethtool PHY-level functions do this, except
{get,set}_tunable, so we make them consistent here as well.
We need to update the Microsemi PHY driver in the same commit to avoid
introducing either deadlocks, or lack of proper locking.
Fixes: 968ad9da7e0e ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
Fixes: 310d9ad57ae0 ("net: phy: Add downshift get/set support in Microsemi PHYs driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Saeed Mahameed says:
====================
Mellanox 100G mlx5 SRIOV switchdev update
This series from Roi and Or further enhances the new SRIOV switchdev mode.
Roi's patches deal with allowing users to configure though devlink
the level of inline headers that the VF should be setting in order for
the eswitch HW to do proper matching. We also enforce that the matching
required for offloaded TC rules is aligned with that level on the PF driver.
Or's patches deals with allowing the user to control on the VF operational
link state through admin directives on the mlx5 VF rep link. Also in this series
is implementation of HW and SW counters for the mlx5 VF rep which is aligned
with the design set by commit a5ea31f57309 'Merge branch net-offloaded-stats'.
v1 --> v2:
* constified the net-device param of get offloaded stats ndo in mlxsw
(pointed by 0-day screaming on us...)
* added Or's Review-by tags for Roi's patches
This series was generated against commit
e796f49d826a ("net: ieee802154: constify ieee802154_ops structures")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A flow should be offloaded only if the matches are
allowed according to min inline mode.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement devlink show and set of HW inline-mode.
The supported modes: none, link, network, transport.
We currently support one mode for all vports so set is done on all vports.
When eswitch is first initialized the inline-mode is queried from the FW.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Also move the inline capablities enum to a shared header vport.h
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some HWs need the VF driver to put part of the packet headers on the
TX descriptor so the e-switch can do proper matching and steering.
The supported modes: none, link, network, transport.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reflect the administative link changes done on the VF representor to the
VF e-switch vport. This means that doing ip link set down/up commands on
the VF rep will modify the e-switch vport state which in turn will make
proper VF drivers to set their carrier accordingly.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Switchdev driver net-device port statistics should follow the model introduced
in commit a5ea31f57309 'Merge branch net-offloaded-stats'.
For VF reps we return the SRIOV eswitch vport stats as the usual ones and SW stats
if asked. For the PF, if we're in the switchdev mode, we return the uplink stats
and SW stats if asked, otherwise as before. The uplink stats are implemented using
the PPCNT 802_3 counters which are already being read/cached by the driver.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some drivers would need to check few internal matters for
that. To be used in downstream mlx5 commit.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case the link change and EEE is enabled or disabled, always try to
re-negotiate this with the link partner.
Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Fainelli says:
====================
net: phy: broadcom: Wirespeed/downshift support
This patch series adds support for the Broadcom Wirespeed, aka
downsfhit feature utilizing the recently added ethtool PHY tunables.
Tested with two Gigabit link partners with a 4-wire cable having only
2 pairs connected.
Last patch in the series is a fix that was required for testing, which
should make it to -stable, which I can submit separate against net if
you prefer David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case the link change and EEE is enabled or disabled, always try to
re-negotiate this with the link partner.
Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for configuring the downshift/Wirespeed enable/disable
toggles and specify a link retry value ranging from 1 to 9. Since the
integrated BCM7xxx have issues when wirespeed is enabled and EEE is also
enabled, we do disable EEE if wirespeed is enabled.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for adding support for Wirespeed/downshift, we need to
change bcm_phy_eee_enable() to allow enabling or disabling EEE, so make
the function take an extra enable/disable boolean parameter and rename
it to illustrate it sets EEE, not necessarily just enables it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Broadcom's Wirespeed feature allows us to configure how auto-negotiation
should behave with fewer working pairs of wires on a cable. Add support
code for retrieving and setting such downshift counters using the
recently added ethtool downshift tunables.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to need these functions to implement support for Broadcom
Wirespeed, aka downshift.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit 2331ccc5b323 ("tcp: enhance tcp collapsing"),
we made a first step allowing copying right skb to left skb head.
Since all skbs in socket write queue are headless (but possibly the very
first one), this strategy often does not work.
This patch extends tcp_collapse_retrans() to perform frag shifting,
thanks to skb_shift() helper.
This helper needs to not BUG on non headless skbs, as callers are ok
with that.
Tested:
Following packetdrill test now passes :
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.100 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
+0 write(4, ..., 200) = 200
+0 > P. 1:201(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 201:401(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 401:601(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 601:801(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 801:1001(200) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1001:1101(100) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1101:1201(100) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1201:1301(100) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1301:1401(100) ack 1
+.099 < . 1:1(0) ack 201 win 257
+.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401>
+0 > P. 201:1001(800) ack 1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When busy polling while a link is down (during a link-flap test), TX
timeouts were observed as well as the following messages in the ring
buffer:
bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free tx failed. rc:-1
bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free rx failed. rc:-1
These were resolved by checking for link status and returning if link
was not up.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Rob Miller <rob.miller@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commits 93821778def10 ("udp: Fix rcv socket locking") and
f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.
This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
before queueing")
Bug found by syzkaller team, thanks a lot guys !
Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9
Fixes: 93821778def1 ("udp: Fix rcv socket locking")
Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for the MV88E6097 switch. The change was tested on an Armada
based platform with a MV88E6097 switch.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the acpi cppc_lib interface to get CPPC performance limits and update
the per cpu priority for the ITMT scheduler. If the highest performance of
CPUs differs the ITMT feature is enabled.
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse
core support.
This is required to enable the BIOS support of the Intel Turbo Boost Max
Technology 3.0 feature.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a023623a727e86040a1715797055f6402caefd7e.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Need to set platform wide _OSC bits to enable CPPC and CPPC version 2.
If platform supports CPPC, then BIOS exposes CPPC tables.
The primary reason to enable CPPC support is to get the maximum
performance of each CPU to check and enable Intel Turbo Boost Max
Technology 3.0 (ITMT).
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a696f6b17843cee9a542482fae6abab087be9587.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Some Intel cores in a package can be boosted to a higher turbo frequency
with ITMT 3.0 technology. The scheduler can use the asymmetric packing
feature to move tasks to the more capable cores.
If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
sched domains to enable asymmetric packing.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to higher turbo
frequency than others.
Add /proc/sys/kernel/sched_itmt_enabled so operator
can enable/disable scheduling of tasks that favor cores
with higher turbo boost frequency potential.
By default, system that is ITMT capable and single
socket has this feature turned on. It is more likely
to be lightly loaded and operates in Turbo range.
When there is a change in the ITMT scheduling operation
desired, a rebuild of the sched domain is initiated
so the scheduler can set up sched domains with appropriate
flag to enable/disable ITMT scheduling operations.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package. In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.
To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency. In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmertic
packing approach. At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.
The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function. The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.
So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.
Request the rebuild when the x86 internal update flag is set.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci-of-esdhc: Fix card detection
- dw_mmc: Fix DMA error path"
* tag 'mmc-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: dw_mmc: fix the error handling for dma operation
mmc: sdhci-of-esdhc: fixup PRESENT_STATE read
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB fixes and new device ids for 4.9-rc7.
The majority of these fixes are in the musb driver, fixing a number of
regressions that have been reported but took a while to resolve. The
other fixes are all small ones, to resolve other reported minor
issues.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: fix wrong parenthesis in ffs_func_req_match()
phy: twl4030-usb: Fix for musb session bit based PM
usb: musb: Drop pointless PM runtime code for dsps glue
usb: musb: Add missing pm_runtime_disable and drop 2430 PM timeout
usb: musb: Fix PM for hub disconnect
usb: musb: Fix sleeping function called from invalid context for hdrc glue
usb: musb: Fix broken use of static variable for multiple instances
USB: serial: cp210x: add ID for the Zone DPMX
usb: chipidea: move the lock initialization to core file
Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- DMA-on-stack fixes for a couple drivers, from Benjamin Tissoires
- small memory sanitization fix for sensor-hub driver, from Song
Hongyan
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-sensor-hub: clear memory to avoid random data
HID: rmi: make transfer buffers DMA capable
HID: magicmouse: make transfer buffers DMA capable
HID: lg: make transfer buffers DMA capable
HID: cp2112: make transfer buffers DMA capable
|
|
Split irqchip allows pic and ioapic routes to be used without them being
created, which results in NULL access. Check for NULL and avoid it.
(The setup is too racy for a nicer solutions.)
Found by syzkaller:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events irqfd_inject
task: ffff88006a06c7c0 task.stack: ffff880068638000
RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221
RSP: 0000:ffff88006863ea20 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e
RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0
R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0
Stack:
ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000
ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082
0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000
Call Trace:
[...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
[...] __raw_spin_lock include/linux/spinlock_api_smp.h:144
[...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151
[...] spin_lock include/linux/spinlock.h:302
[...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379
[...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52
[...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101
[...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60
[...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
[...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
[...] kthread+0x328/0x3e0 kernel/kthread.c:209
[...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 49df6397edfc ("KVM: x86: Split the APIC from the rest of IRQCHIP.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be
bigger that the maximal number of VCPUs, resulting in out-of-bounds
access.
Found by syzkaller:
BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...]
Write of size 1 by task a.out/27101
CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905
[...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495
[...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86
[...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360
[...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222
[...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235
[...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670
[...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668
[...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999
[...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: af1bae5497b9 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.
Found by syzkaller:
WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] panic+0x1b7/0x3a3 kernel/panic.c:179
[...] __warn+0x1c4/0x1e0 kernel/panic.c:542
[...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
[...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
[...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
[...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
[...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
[...] complete_emulated_io arch/x86/kvm/x86.c:6870
[...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
[...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
[...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff.
With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a
userspace can send an interrupt with dest_id that results in
out-of-bounds access.
Found by syzkaller:
BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750
Read of size 8 by task syz-executor/22923
CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
[...] print_address_description mm/kasan/report.c:194
[...] kasan_report_error mm/kasan/report.c:283
[...] kasan_report+0x231/0x500 mm/kasan/report.c:303
[...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329
[...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824
[...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72
[...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157
[...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74
[...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Otherwise each individual rotator char would be printed in a new line:
(...)
[ 0.642350] -
[ 0.644374] |
[ 0.646367] -
(...)
Signed-off-by: Nicolas Schichan <nicolas.schichan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When an ipv6 address has the tentative flag set, it can't be
used as source for egress traffic, while the associated route,
if any, can be looked up and even stored into some dst_cache.
In the latter scenario, the source ipv6 address selected and
stored in the cache is most probably wrong (e.g. with
link-local scope) and the entity using the dst_cache will
experience lack of ipv6 connectivity until said cache is
cleared or invalidated.
Overall this may cause lack of connectivity over most IPv6 tunnels
(comprising geneve and vxlan), if the first egress packet reaches
the tunnel before the DaD is completed for the used ipv6
address.
This patch bumps a new genid after that the IFA_F_TENTATIVE flag
is cleared, so that dst_cache will be invalidated on
next lookup and ipv6 connectivity restored.
Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd762 ("geneve: add dst caching support")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|