Age | Commit message (Collapse) | Author |
|
Pull virtio fixes from Michael Tsirkin:
"Fixes to multiple issues in virtio.
Most notably a regression fix for crashes reported by Fedora users.
Hibernate is still reportedly broken, working on it"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: prevent uninitialized variable use
virtio-balloon: use actual number of stats for stats queue buffers
virtio_balloon: init 1st buffer in stats vq
virtio_pci: fix out of bound access for msix_names
|
|
Pull KVM fixes from Paolo Bonzini:
"All x86-specific, apart from some arch-independent syzkaller fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: cleanup the page tracking SRCU instance
KVM: nVMX: fix nested EPT detection
KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
KVM: kvm_io_bus_unregister_dev() should never fail
KVM: VMX: Fix enable VPID conditions
KVM: nVMX: Fix nested VPID vmx exec control
KVM: x86: correct async page present tracepoint
kvm: vmx: Flush TLB when the APIC-access address changes
KVM: x86: use pic/ioapic destructor when destroy vm
KVM: x86: check existance before destroy
KVM: x86: clear bus pointer when destroyed
KVM: Documentation: document MCE ioctls
KVM: nVMX: don't reset kvm mmu twice
PTP: fix ptr_ret.cocci warnings
kvm: fix usage of uninit spinlock in avic_vm_destroy()
KVM: VMX: downgrade warning on unexpected exit code
|
|
The latest gcc-7.0.1 snapshot reports a new warning:
virtio/virtio_balloon.c: In function 'update_balloon_stats':
virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized]
This seems absolutely right, so we should add an extra check to
prevent copying uninitialized stack data into the statistics.
>From all I can tell, this has been broken since the statistics code
was originally added in 2.6.34.
Fixes: 9564e138b1f6 ("virtio: Add memory statistics reporting to the balloon driver (V4)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The virtio balloon driver contained a not-so-obvious invariant that
update_balloon_stats has to update exactly VIRTIO_BALLOON_S_NR counters
in order to send valid stats to the host. This commit fixes it by having
update_balloon_stats return the actual number of counters, and its
callers use it when pushing buffers to the stats virtqueue.
Note that it is still out of spec to change the number of counters
at run-time. "Driver MUST supply the same subset of statistics in all
buffers submitted to the statsq."
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.
This patch updates the stats before pushing the initial buffer.
Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
virtio implementation and violates the spec "Driver MUST supply the
same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
spec clause, plus "invalid tag" is not really defined.
Note: the spec says:
When using the legacy interface, the device SHOULD ignore all values in
the first buffer in the statsq supplied by the driver after device
initialization. Note: Historically, drivers supplied an uninitialized
buffer in the first buffer.
Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.
Cc: stable@vger.kernel.org
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Fedora has received multiple reports of crashes when running
4.11 as a guest
https://bugzilla.redhat.com/show_bug.cgi?id=1430297
https://bugzilla.redhat.com/show_bug.cgi?id=1434462
https://bugzilla.kernel.org/show_bug.cgi?id=194911
https://bugzilla.redhat.com/show_bug.cgi?id=1433899
The crashes are not always consistent but they are generally
some flavor of oops or GPF in virtio related code. Multiple people
have done bisections (Thank you Thorsten Leemhuis and
Richard W.M. Jones) and found this commit to be at fault
07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507 is the first bad commit
commit 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507
Author: Christoph Hellwig <hch@lst.de>
Date: Sun Feb 5 18:15:19 2017 +0100
virtio_pci: use shared interrupts for virtqueues
The issue seems to be an out of bounds access to the msix_names
array corrupting kernel memory.
Fixes: 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
|
|
Fix a filelayout GETDEVICEINFO call hang triggered from the LAYOUTGET
pnfs_layout_process where the GETDEVICEINFO call is waiting for a
session slot, and the LAYOUGET call is waiting for pnfs_layout_process
to complete before freeing the slot GETDEVICEINFO is waiting for..
This occurs in testing against the pynfs pNFS server where the
the on-wire reply highest_slotid and slot id are zero, and the
target high slot id is 8 (negotiated in CREATE_SESSION).
The internal fore channel slot table max_slotid, the maximum allowed
table slotid value, has been reduced via nfs41_set_max_slotid_locked
from 8 to 1. Thus there is one slot (slotid 0) available for use but
it has not been freed by LAYOUTGET proir to the GETDEVICEINFO request.
In order to ensure that layoutrecall callbacks are processed in the
correct order, nfs4_proc_layoutget processing needs to be finished
e.g. pnfs_layout_process) before giving up the slot that identifies
the layoutget (see referring_call_exists).
Move the filelayout_check_layout nfs4_find_get_device call outside of
the pnfs_layout_process call tree.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In preparation for moving the filelayout getdeviceinfo call from
filelayout_alloc_lseg called by pnfs_process_layout
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Parse/translate a set of TC pedit actions to be formed in the HW API format.
User-space provides set of keys where each one of them is made of: command (add or
set), header-type, byte offset within that header along with a 32 bit mask and value.
The mask dictates what bits in the 32 bit word that starts on the offset we should
be dealing with, but under negative polarity (unset bits are to be modified).
We do a 1st pass over the set of keys while using the header-type and offset to
fill the masks and the values into a data-structure containting all the
supported network headers.
We then do a 2nd pass over the set of fields to re-write supported by the HW,
where for each such candidate field, we use the masks filled on the 1st pass to
realize if we should offloading re-write it.
In case offloading is required, we fill a HW descriptor with the following:
(1) the header field to modify
(2) the bit offset within the field from where to modify (set command only)
(3) the value to set/add
(4) the length in bits 1...32 to modify (set command only)
Note that it's possible for a given pedit mask to dictate modifying the
same header field multiple times or to modify multiple header fields.
Currently such combinations are not supported for offloading, hence, for set
commands, the offset within the field is always zero, and the length to modify
is the field size.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
HW drivers will use the header-type and command fields from the extended
keys, and some fields (e.g mask, val, offset) from the legacy keys.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Implement the low-level commands to support packet header re-write.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
definitions
Add the definitions related to creation/deletion of a modify header
context and the modify header steering action which are used for HW
packet header modify (re-write) as part of steering. Add as well the
modify header id into two intermediate structs and set it to the FTE.
Note that as the push/pop vlan steering actions are emulated by the
ewitch management code, we're not breaking any compatibility while
changing their values to make room for the modify header action which
is not emulated and whose value is part of the FW API. The new bit
values for the emulated actions are at the end of the possible range.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Move the commands related to scheduling elements and vport qos to
a suitable location (according to the MLX5_CMD_OP enum values) in
the command string and internal error helpers.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
There are bunch of places in the code where the intermediate struct
that keeps the elements related to flow actions is initialized with
the same default values. Put that into a small DECLARE type helper.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The code for adding tc fdb flows leaves things half set when it fails
in the middle. Currently we are not leaking things (e.g eswitch
vlan reference, encap reference and HW resources) since the main
code to add flower rules does a cleanup by calling mlx5e_tc_del_flow().
This cleanup further works just b/c we're checking there if the HW rule
for the flow we are attempting to delete is valid before touching it, and
since under the current possible combinations of supported actions it's okay
to go and blidnly deref or delete all the action related resources (encap, vlan).
Instead, do things properly, namely make sure that if add flow fails we
clean all what was allocated or referenced. Now, the flow delete code can
blindly deref/deallocate both the rule and the actions related resources and
when more action combinations are introduced (such as the upcoming header
re-write) we are fine with clear and robust code.
While here, align all of nic/fdb parse actions/add flow functions to get
mlx5e_tc_flow struct param and pick the attributes or whatever else needed
from there.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add intermediate structure to store attributes parsed from TC filter
matching/actions parts which are soon to be configured into the HW.
Currently put there the flow matching spec after being parsed. More
content to be added in down-stream patch.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add structure that contains the attributes related to offloaded
NIC flows. Currently it has the actions and flow tag.
While here, do xmas tree cleanup of the TC configure function.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add esw_ prefix to the flow attributes attached to offloaded e-switch
TC flows. This is a pre-step to add attributes to offloaded NIC TC flows.
Also, save one pointer space by using gcc's zero size array, this would
be beneficial for environments where 100Ks (or Ms) of flows are offloaded.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
SRCU uses a delayed work item. Skip cleaning it up, and
the result is use-after-free in the work item callbacks.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 0eb05bf290cfe8610d9680b49abef37febd1c38a
Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
[Transactional Memory]), but that doesn't take into account that TM
might be disabled by CPU features, or disabled by the kernel being built
with CONFIG_PPC_TRANSACTIONAL_MEM=n.
So later in boot, when we have setup the CPU features, clear HSCR[TM] if
the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
Without this a KVM guest might try use TM, even if told not to, and
cause an oops in the host kernel. Typically the oops is seen in
__kvmppc_vcore_entry() and may or may not be fatal to the host, but is
always bad news.
In practice all shipping CPU revisions do support TM, and all host
kernels we are aware of build with TM support enabled, so no one should
actually be able to hit this in the wild.
Fixes: 2a3563b023e5 ("powerpc: Setup in HFSCR for POWER8")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[mpe: Rewrite change log with input from Sam, add Fixes/stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The nested_ept_enabled flag introduced in commit 7ca29de2136 was not
computed correctly. We are interested only in L1's EPT state, not the
the combined L0+L1 value.
In particular, if L0 uses EPT but L1 does not, nested_ept_enabled must
be false to make sure that PDPSTRs are loaded based on CR3 as usual,
because the special case described in 26.3.2.4 Loading Page-Directory-
Pointer-Table Entries does not apply.
Fixes: 7ca29de21362 ("KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT")
Cc: qemu-stable@nongnu.org
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when
destroy VM.
This is consistent with current vfio implementation.
Signed-off-by: herongguang <herongguang.he@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Vito Caputo reported that the sched.h split-up series
introduced a duplicate #include <linux/sched/debug.h> line
in drivers/tty/vt/keyboard.c.
Remove it.
Reported-by: Vito Caputo <vcaputo@pengaru.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The code introduced by commit 0c8893c9095d ("clockevents: Add a
clkevt-of mechanism like clksrc-of") refer to __clkevt_of_table
what doesn't exist in the vmlinux. As a result kernel build
failed with error: "clkevt-probe.c:63: undefined reference to
`__clkevt_of_table’"
Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The patch fix syntax errors introduced by commit 0c8893c9095d
("clockevents: Add a clkevt-of mechanism like clksrc-of").
Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Since:
cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")
all MCEs are printed even when mcelog is running. Fix the regression to
not print to dmesg when mcelog is running as it is a consumer too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.10..
Fixes: cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")
Link: http://lkml.kernel.org/r/20170327093304.10683-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While reviewing the -stable patch for commit 86ef58a4e35e "nfit,
libnvdimm: fix interleave set cookie calculation" Ben noted:
"This is returning an int, thus it's effectively doing a 32-bit
comparison and not the 64-bit comparison you say is needed."
Update the compare operation to be immune to this integer demotion problem.
Cc: <stable@vger.kernel.org>
Cc: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Fixes: 86ef58a4e35e ("nfit, libnvdimm: fix interleave set cookie calculation")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated.
Also address the "Programming Note" for optimal performance.
Here is the complete text from Oracle SPARC Architecture Specs.
6.3.4.7 DCTI Couples
"A delayed control transfer instruction (DCTI) in the delay slot of
another DCTI is referred to as a “DCTI couple”. The use of DCTI couples
is deprecated in the Oracle SPARC Architecture; no new software should
place a DCTI in the delay slot of another DCTI, because on future Oracle
SPARC Architecture implementations DCTI couples may execute either
slowly or differently than the programmer assumes it will.
SPARC V8 and SPARC V9 Compatibility Note
The SPARC V8 architecture left behavior undefined for a DCTI couple. The
SPARC V9 architecture defined behavior in that case, but as of
UltraSPARC Architecture 2005, use of DCTI couples was deprecated.
Software should not expect high performance from DCTI couples, and
performance of DCTI couples should be expected to decline further in
future processors.
Programming Note
As noted in TABLE 6-5 on page 115, an annulled branch-always
(branch-always with a = 1) instruction is not architecturally a DCTI.
However, since not all implementations make that distinction, for
optimal performance, a DCTI should not be placed in the instruction word
immediately following an annulled branch-always instruction (BA,A or
BPA,A)."
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I encountered this bug when using /proc/kcore to examine the kernel. Plus a
coworker inquired about debugging tools. We computed pa but did
not use it during the maximum physical address bits test. Instead we used
the identity mapped virtual address which will always fail this test.
I believe the defect came in here:
[bpicco@zareason linus.git]$ git describe --contains bb4e6e85daa52
v3.18-rc1~87^2~4
.
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-failsafe 27-03-2017
This series provides a fail-safe mechanism to allow safely re-configuring
mlx5e netdevice and provides a resiliency against sporadic
configuration failures.
To enable this we do some refactoring and code reorganizing to allow
breaking the drivers open/close flows to stages:
open -> activate -> deactivate -> close.
In addition we need to allow creating fresh HW ring resources
(mlx5e_channels) with their own "new" set of parameters, while keeping
the current ones running and active until the new channels are
successfully created with the new configuration, and only then we can
safly replace (switch) old channels with new ones.
For that we introduce mlx5e_channels object and an API to manage it:
- channels = open_channels(new_params):
open fresh TX/RX channels
- activate_channels(channels):
redirect traffic to them and attach them to the netdev
- deactivate_channes(channels)
stop traffic and detach from netdev
- close(channels)
Free the TX/RX HW resources of those channels
With the above strategy it is straightforward to achieve the desired
behavior of fail-safe configuration. In pseudo code:
make_new_config(new_params)
{
old_channels = current_active_channels;
new_channels = create_channels(new_params);
if (!new_channels)
return "Failed, but current channels are still active :)"
deactivate_channels(old_channels); /* Can't fail */
set_hw_new_state(); /* If needed */
activate_channels(new_channels); /* Can't fail */
close_channels(old_channels);
current_active_channels = new_channels;
return "SUCCESS";
}
At the top of this series, we change the following flows to be fail-safe:
ethtool:
- ring parameters
- coalesce parameters
- tx copy break parameters
- cqe compressing/moderation mode setting (priv flags)
ndos:
- tc setup
- set features: LRO
- change mtu
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mahesh Bandewar says:
====================
link-status fixes for mii-monitoring
The mii monitoring is divided into two phases - inspect and commit. The
inspect phase technically should not make any changes to the state and
defer it to the commit phase. However detected link state inconsistencies
on several machines and discovered that it's the result of some
inconsistent update to link states and assumption that you *always* get
rtnl-mutex. In reality when trylock() fails to acquire rtnl-mutex, the
commit phase is postponed until next mii-mon run. At the next round
because of the state change performed in the previous inspect-run, this
round does not detect any changes and would skip calling commit phase.
This would result in an inconsistent state until next link event happens
(if it ever happens).
During the the commit phase, it's always assumed that speed and duplex
fetch is always successful, but that's always not the case. However the
slave state is marked UP irrespective of speed / duplex fetch operation.
If the speed / duplex fetch operation results in insane values for either
of these two fields, then keeping internal link state UP is not going to
provide fruitful results either.
Please see into individual patches for more details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bond_miimon_commit() marks the link UP after attempting to get the speed
and duplex settings for the link. There is a possibility that
bond_update_speed_duplex() could fail. This is another place where it
could result into an inconsistent bonding link state.
With this patch the link will be marked UP only if the speed and duplex
values retrieved have sane values and processed further.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bond_update_speed_duplex() retrieves speed and duplex settings. There
is a possibility of failure in retrieving these values but caller has
to assume it's always successful. This leads to having inconsistent
slave link settings. If these (speed, duplex) values cannot be
retrieved, then keeping the link UP causes problems.
The updated bond_update_speed_duplex() returns 0 on success if it
retrieves sane values for speed and duplex. On failure it returns 1
and marks the link down.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The primary issue is that mii-inspect phase updates link-state and
expects changes to be committed during the mii-commit phase. After
the inspect phase if it fails to acquire rtnl-mutex, the commit
phase (bond_mii_commit) doesn't get to run. This partially updated
state stays and makes the internal-state inconsistent.
e.g. setup bond0 => slaves: eth1, eth2
eth1 goes DOWN -> UP
mii_monitor()
mii-inspect()
bond_set_slave_link_state(eth1, UP, DontNotify)
rtnl_trylock() <- fails!
Next mii-monitor round
eth1: No change
mii_monitor()
mii-inspect()
eth1->link == current-status (ethtool_ops->get_link)
no-change-detected
End result:
eth1:
Link = BOND_LINK_UP
Speed = 0xfffff [SpeedUnknown]
Duplex = 0xff [DuplexUnknown]
This doesn't always happen but for some unlucky machines in a large set
of machines it creates problems.
The fix for this is to avoid making changes during inspect phase and
postpone them until acquiring the rtnl-mutex / invoking commit phase.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split the function into two (a) propose (b) commit phase without
changing the semantics for the original API.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Our chosen ic_dev may be anywhere in our list of ic_devs, and we may
free it before attempting to close others. When we compare d->dev and
ic_dev->dev, we're potentially dereferencing memory returned to the
allocator. This causes KASAN to scream for each subsequent ic_dev we
check.
As there's a 1-1 mapping between ic_devs and netdevs, we can instead
compare d and ic_dev directly, which implicitly handles the !ic_dev
case, and avoids the use-after-free. The ic_dev pointer may be stale,
but we will not dereference it.
Original splat:
[ 6.487446] ==================================================================
[ 6.494693] BUG: KASAN: use-after-free in ic_close_devs+0xc4/0x154 at addr ffff800367efa708
[ 6.503013] Read of size 8 by task swapper/0/1
[ 6.507452] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc3-00002-gda42158 #8
[ 6.514993] Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27 2016
[ 6.523138] Call trace:
[ 6.525590] [<ffff200008094778>] dump_backtrace+0x0/0x570
[ 6.530976] [<ffff200008094d08>] show_stack+0x20/0x30
[ 6.536017] [<ffff200008bee928>] dump_stack+0x120/0x188
[ 6.541231] [<ffff20000856d5e4>] kasan_object_err+0x24/0xa0
[ 6.546790] [<ffff20000856d924>] kasan_report_error+0x244/0x738
[ 6.552695] [<ffff20000856dfec>] __asan_report_load8_noabort+0x54/0x80
[ 6.559204] [<ffff20000aae86ac>] ic_close_devs+0xc4/0x154
[ 6.564590] [<ffff20000aaedbac>] ip_auto_config+0x2ed4/0x2f1c
[ 6.570321] [<ffff200008084b04>] do_one_initcall+0xcc/0x370
[ 6.575882] [<ffff20000aa31de8>] kernel_init_freeable+0x5f8/0x6c4
[ 6.581959] [<ffff20000a16df00>] kernel_init+0x18/0x190
[ 6.587171] [<ffff200008084710>] ret_from_fork+0x10/0x40
[ 6.592468] Object at ffff800367efa700, in cache kmalloc-128 size: 128
[ 6.598969] Allocated:
[ 6.601324] PID = 1
[ 6.603427] save_stack_trace_tsk+0x0/0x418
[ 6.607603] save_stack_trace+0x20/0x30
[ 6.611430] kasan_kmalloc+0xd8/0x188
[ 6.615087] ip_auto_config+0x8c4/0x2f1c
[ 6.619002] do_one_initcall+0xcc/0x370
[ 6.622832] kernel_init_freeable+0x5f8/0x6c4
[ 6.627178] kernel_init+0x18/0x190
[ 6.630660] ret_from_fork+0x10/0x40
[ 6.634223] Freed:
[ 6.636233] PID = 1
[ 6.638334] save_stack_trace_tsk+0x0/0x418
[ 6.642510] save_stack_trace+0x20/0x30
[ 6.646337] kasan_slab_free+0x88/0x178
[ 6.650167] kfree+0xb8/0x478
[ 6.653131] ic_close_devs+0x130/0x154
[ 6.656875] ip_auto_config+0x2ed4/0x2f1c
[ 6.660875] do_one_initcall+0xcc/0x370
[ 6.664705] kernel_init_freeable+0x5f8/0x6c4
[ 6.669051] kernel_init+0x18/0x190
[ 6.672534] ret_from_fork+0x10/0x40
[ 6.676098] Memory state around the buggy address:
[ 6.680880] ffff800367efa600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6.688078] ffff800367efa680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 6.695276] >ffff800367efa700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 6.702469] ^
[ 6.705952] ffff800367efa780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 6.713149] ffff800367efa800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 6.720343] ==================================================================
[ 6.727536] Disabling lock debugging due to kernel taint
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are same conditions for checking whether supporting clkscaling or
not. When ufshcd is supporting clkscaling, active_reqs should be
decreased by one.
[mkp: addressed comment from Bartlomiej]
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-03-27
This series contains updates to i40e and i40evf only.
Alex updates the driver code so that we can do bulk updates of the page
reference count instead of just incrementing it by one reference at a
time. Fixed an issue where we were not resetting skb back to NULL when
we have freed it. Cleaned up the i40e_process_skb_fields() to align with
other Intel drivers. Removed FCoE code, since it is not supported in any
of the Fortville/Fortpark hardware, so there is not much point of carrying
the code around, especially if it is broken and untested.
Harshitha fixes a bug in the driver where the calculation of the RSS size
was not taking into account the number of traffic classes enabled.
Robert fixes a potential race condition during VF reset by eliminating
IOMMU DMAR Faults caused by VF hardware and when the OS initiates a VF
reset and before the reset is finished we modify the VF's settings.
Bimmy removes a delay that is no longer needed, since it was only needed
for preproduction hardware.
Colin King fixes null pointer dereference, where VSI was being
dereferenced before the VSI NULL check.
Jake fixes an issue with the recent addition of the "client code" to the
driver, where we attempt to use an uninitialized variable, so correctly
initialize the params variable by calling i40e_client_get_params().
v2: dropped patch 5 of the original series from Carolyn since we need
more documentation and reason why the added delay, so Carolyn is
taking the time to update the patch before we re-submit it for
kernel inclusion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew has been contributing a lot to PHYLIB over the past months and
his feedback on patches is more than welcome.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Probably due to some mis-merging fix a bug associated with commits
d7ce6422d6e6 ("i40e: don't check params until after checking for client
instance", 2017-02-09) and 3140aa9a78c9 ("i40e: KISS the client
interface", 2017-03-14)
The first commit tried to move the initialization of the params
structure so that we didn't bother doing this if we didn't have a client
interface. You can already see that it looks fishy because of the
indentation. The second commit refactors a bunch of the interface, and
incorrectly drops the params initialization.
I believe what occurred is that internally the two patches were
re-ordered, and the merge conflicts as a result were performed
incorrectly.
Fix the use of an uninitialized variable by correctly initializing the
params variable via i40e_client_get_params().
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
VSI is being dereferenced before the VSI null check; if VSI is
null we end up with a null pointer dereference. Fix this by
performing VSI deference after the VSI null check. Also remove
the need for using adapter by using vsi->back->cinst.
Detected by CoverityScan, CID#1419696, CID#1419697
("Dereference before null check")
Fixes: ed0e894de7c133 ("i40evf: add client interface")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Since FCoE isn't supported by the i40e products there isn't much point in
carrying around code that will always evaluate to false. This patch goes
through and strips out the code in several spots so that we don't go around
carrying variables and/or code that is always going to evaluate to false or
0.
Change-ID: I39d1d779c66c638b75525839db2b6208fdc809d7
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Looking over the code for FCoE it looks like the Rx path has been broken at
least since the last major Rx refactor almost a year ago. It seems like
FCoE isn't supported for any of the Fortville/Fortpark hardware so there
isn't much point in carrying the code around, especially if it is broken
and untested.
Change-ID: I892de8fa551cb129ce2361e738ff82ce55fa229e
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This is a minor clean-up to make the i40e/i40evf process_skb_fields
function look a little more like what we have in igb. The Rx checksum
function called out a need for skb->protocol but I can't see where it
actually needs it. I am assuming this is something that was likely
refactored out some time ago as the Rx checksum code has gone through a few
rewrites.
Change-ID: I0b4668a34d90b61b66ded7c7c26e19a3e2d06251
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Removed no longer needed delays. At preproduction stage those delays were
needed but now these delays are not needed.
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|