Age | Commit message (Collapse) | Author |
|
There's an off-by-one issue in nft_payload_fast_eval, skb_tail_pointer
and ptr + priv->len all point to the last valid address plus 1. So if
they are equal, we can still fetch the valid data. It's unnecessary to
fall back to nft_payload_eval.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable
queuing to userspace"), we can queue packets to the user space in bridge
family. But when the user specify the queue range, packets will be only
delivered to the first queue num. Because in nfqueue_hash, we only support
ipv4 and ipv6 family. Now add support for bridge family too.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently, the user can specify the queue numbers by _QUEUE_NUM and
_QUEUE_TOTAL attributes, this is enough in most situations.
But acctually, it is not very flexible, for example:
tcp dport 80 mapped to queue0
tcp dport 81 mapped to queue1
tcp dport 82 mapped to queue2
In order to do this thing, we must add 3 nft rules, and more
mapping meant more rules ...
So take one register to select the queue number, then we can add one
simple rule to mapping queues, maybe like this:
queue num tcp dport map { 80:0, 81:1, 82:2 ... }
Florian Westphal also proposed wider usage scenarios:
queue num jhash ip saddr . ip daddr mod ...
queue num meta cpu ...
queue num meta mark ...
The last point is how to load a queue number from sreg, although we can
use *(u16*)®s->data[reg] to load the queue number, just like nat expr
to load its l4port do.
But we will cooperate with hash expr, meta cpu, meta mark expr and so on.
They all store the result to u32 type, so cast it to u16 pointer and
dereference it will generate wrong result in the big endian system.
So just keep it simple, we treat queue number as u32 type, although u16
type is already enough.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.
This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").
Fixes: 96518518cc41 ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Enabling SRIOV while the ixgbevf driver is loaded will result in all
mailbox requests from ixgbevf_open() being rejected by ixgbe because
adapter->clear_to_send is set to false on reset.
Call ixgbe_sriov_reinit() before pci_enable_sriov() to make sure that
mailbox requests are handled from the time ixgbevf is loaded.
Reported-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Instead of limiting the VFs if we don't use 4 queues for RSS in the PF we
can instead just limit the RSS queues used to a power of 2. By doing this
we can support use cases where VFs are using more queues than the PF is
currently using and can support RSS if so desired.
The only limitation on this is that we cannot support 3 queues of RSS in
the PF or VF. In either of these cases we should fall back to 2 queues in
order to be able to use the power of 2 masking provided by the psrtype
register.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The hardware redirection table can support more queues then the PF
currently has when SR-IOV is enabled. In order to account for this use the
RSS mask to trim of the bits that are not used.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The maximum queue count reported was 1, however support for multiple queues
with SR-IOV was added some time ago so we should report support for it to
the user so that they can select multiple queues if they so desire.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The value MDIO_PRTAD_NONE should be used to indicate no PHY address.
Not 0, not 0xFFFF. Use the MDIO_PRTAD_NONE value consistently.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Protect set_rlpml with mailbox lock to make sure the MTU configuration
is handled properly.
This change resolves an issue where set_rlpml can fail when the VF
interface is brought up:
ixgbevf 0000:03:1d.6: Failed to set MTU at 1500
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
All the MACs supported by ixgbe support pause frames, so indicate
that support in ethtool. Also set advertising according to requested
mode.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Set the read_reg_mdi and write_reg_mdi method pointers for
X550EM_A_10G_T devices to resolve jumping to NULL.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
These functions are only used in ixgbe_x550.c.
Fixes a warning when compiling with -Wmissing-prototypes
Reported-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Simplify the logic for setting VLNCTRL.VFE by checking the VMDQ flag
and 82598 MAC instead of having to maintain a list of MAC types.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The DragonFly quirk added in 42e3121d90f4 ("ALSA: usb-audio: Add a more
accurate volume quirk for AudioQuest DragonFly") applies a custom dB map
on the volume control when its range is reported as 0..50 (0 .. 0.2dB).
However, there exists at least one other variant (hw v1.0c, as opposed
to the tested v1.2) which reports a different non-sensical volume range
(0..53) and the custom map is therefore not applied for that device.
This results in all of the volume change appearing close to 100% on
mixer UIs that utilize the dB TLV information.
Add a fallback case where no dB TLV is reported at all if the control
range is not 0..50 but still 0..N where N <= 1000 (3.9 dB). Also
restrict the quirk to only apply to the volume control as there is also
a mute control which would match the check otherwise.
Fixes: 42e3121d90f4 ("ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly")
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Reported-by: David W <regulars@d-dub.org.uk>
Tested-by: David W <regulars@d-dub.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The i40e_shutdown_adminq function never returns failure. There is no need to
check the non-0 return value. Clean up the unnecessary error checking and
warning against it.
Change-ID: Ibb616f09cfb93bd1a872ebf3241a15fb8354b31b
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The i40e driver was incorrectly assuming that we would always be pulling
no more than 1 descriptor from each fragment. It is in fact possible for
us to end up with the case where 2 descriptors worth of data may be pulled
when a frame is larger than one of the pieces generated when aligning the
payload to either 4K or pieces smaller than 16K.
To adjust for this we just need to make certain to test all the way to the
end of the fragments as it is possible for us to span 2 descriptors in the
block before us so we need to guarantee that even the last 6 descriptors
have enough data to fill a full frame.
Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Function i40evf_up_complete() always returns success. Changed this to a
void type and removed the code that checks the return status and prints
an error message.
Change-ID: I8c400f174786b9c855f679e470f35af292fb50ad
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently disabling the link state from PF via
ip link set enp5s0f0 vf 0 state disable
doesn't disable the CARRIER on the VF.
This patch updates the carrier and starts/stops the tx queues based on the
link state notification from PF.
PF: enp5s0f0, VF: enp5s2
#modprobe i40e
#echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
#ip link set enp5s2 up
#ip -d link show enp5s2
175: enp5s2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
#ip link set enp5s0f0 vf 0 state disable
#ip -d link show enp5s0f0
171: enp5s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 72 numrxqueues 72 portid 6805ca2e7268
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
#ip -d link show enp5s2
175: enp5s2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 16 numrxqueues 16
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
There is a sanitcy check for desc being null in the first line of
function i40evf_debug_aq. However, before that, aq_desc is cast from
desc, and aq_desc is being dereferenced on the assignment of len, so
this could be a potential null pointer deference. Fix this by moving
the initialization of len to the code block where len is being used
and hence at this point we know it is OK to dereference aq_desc.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch fixes an issue where we were byte swapping the port
parameter, then byte swapping it again in function execution.
Obviously, that's unnecessary, so take it out of the function calls.
Without this patch, the udp based tunnel configuration would
not be correct.
Change-ID: I788d83c5bd5732170f1a81dbfa0b1ac3ca8ea5b7
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch fixes an issue in the virt channel code, where a return
from i40e_find_vsi_from_id was not checked for NULL when applicable.
Without this patch, there is a risk for panic and static analysis
tools complain. This patch fixes the problem by adding the check
and adding an additional input check for similar reasons.
Change-ID: I7e9be88eb7a3addb50eadc451c8336d9e06f5394
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This conditional is backward, so the driver responds back to the VF with
the wrong opcode. Do the old switcheroo to fix this.
Change-ID: I384035b0fef8a3881c176de4b4672009b3400b25
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
When using the debugfs to issue the "dump port" command
with NPAR enabled, the firmware reports back with invalid argument.
The issue occurs because the pf->mac_seid was used to perform the query.
This is fine when NPAR is disabled because the switch ID == pf->mac_seid,
however this is not the case when NPAR is enabled. This fix instead
goes through the VSI to determine the correct ID to use in either case.
Change-ID: I0cd67913a7f2c4a2962e06d39e32e7447cc55b6a
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Previously, when using ethtool to change the RSS hash key, ethtool would
report back saying the old key was still being used and no error was
reported. It was unclear whether it was being reported incorrectly or
being set incorrectly. Debugging revealed 'i40e_set_rxfh()' returned
zero immediately instead of setting the key because a user defined
indirection table is not supplied when changing the hash key.
This fix instead changes it such that if an indirection table is not
supplied, then a default one is created and the hash key is now
correctly set.
Change-ID: Iddb621897ecf208650272b7ee46702cad7b69a71
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Fix the indefinitiley -> indefinitely typo in Kconfig.debug.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
objtool reports the following new warning:
kernel/exit.o: warning: objtool: do_exit() falls through to next function complete_and_exit()
The warning is caused by do_exit()'s new call to do_task_dead(), which
is a new "noreturn" function which objtool doesn't know about yet,
introduced by:
9af6528ee9b6 ("sched/core: Optimize __schedule()")
( objtool has to know all the global noreturn functions so it can follow
the control flow of any functions which call them. Unfortunately they
need to be hard-coded because there's no automated way to detect them. )
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Link: http://lkml.kernel.org/r/20160922212125.zbuewckqll4yur25@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements from Arnaldo Carvalho de Melo:
New features:
- Add support for interacting with Coresight PMU ETMs/PTMs, that are IP blocks
to perform hardware assisted tracing on a ARM CPU core (Mathieu Poirier)
Infrastructure changes:
- Histogram prep work for the upcoming c2c tool (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly
try to stop/free rdma qps/cm_ids that are already freed.
Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag")
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Move mntget from the very beginning of __ns_get_path to
the success path of __ns_get_path, and remove the mntget
calls.
This removes the possibility that there will be a mntget/mntput
pair of __ns_get_path has to retry, and generally simplifies the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
From: Andrey Vagin <avagin@openvz.org>
Each namespace has an owning user namespace and now there is not way
to discover these relationships.
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships too.
Why we may want to know relationships between namespaces?
One use would be visualization, in order to understand the running
system. Another would be to answer the question: what capability does
process X have to perform operations on a resource governed by namespace
Y?
One more use-case (which usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.
There [1] was a discussion about which interface to choose to determing
relationships between namespaces.
Eric suggested to add two ioctl-s [2]:
> Grumble, Grumble. I think this may actually a case for creating ioctls
> for these two cases. Now that random nsfs file descriptors are bind
> mountable the original reason for using proc files is not as pressing.
>
> One ioctl for the user namespace that owns a file descriptor.
> One ioctl for the parent namespace of a namespace file descriptor.
Here is an implementaions of these ioctl-s.
$ man man7/namespaces.7
...
Since Linux 4.X, the following ioctl(2) calls are supported for
namespace file descriptors. The correct syntax is:
fd = ioctl(ns_fd, ioctl_type);
where ioctl_type is one of the following:
NS_GET_USERNS
Returns a file descriptor that refers to an owning user names‐
pace.
NS_GET_PARENT
Returns a file descriptor that refers to a parent namespace.
This ioctl(2) can be used for pid and user namespaces. For
user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
meaning.
In addition to generic ioctl(2) errors, the following specific ones
can occur:
EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
EPERM The requested namespace is outside of the current namespace
scope.
[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101
Changes for v2:
* don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
outside of the init namespace, so we can return EPERM in this case too.
> The fewer special cases the easier the code is to get
> correct, and the easier it is to read. // Eric
Changes for v3:
* rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: "W. Trevor King" <wking@tremily.us>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
|
|
There are two new ioctl-s:
One ioctl for the user namespace that owns a file descriptor.
One ioctl for the parent namespace of a namespace file descriptor.
The test checks that these ioctl-s works and that they handle a case
when a target namespace is outside of the current process namespace.
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships.
In a future we will use this interface to dump and restore nested
namespaces.
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Each namespace has an owning user namespace and now there is not way
to discover these relationships.
Understending namespaces relationships allows to answer the question:
what capability does process X have to perform operations on a resource
governed by namespace Y?
After a long discussion, Eric W. Biederman proposed to use ioctl-s for
this purpose.
The NS_GET_USERNS ioctl returns a file descriptor to an owning user
namespace.
It returns EPERM if a target namespace is outside of a current user
namespace.
v2: rename parent to relative
v3: Add a missing mntput when returning -EAGAIN --EWB
Acked-by: Serge Hallyn <serge@hallyn.com>
Link: https://lkml.org/lkml/2016/7/6/158
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Return -EPERM if an owning user namespace is outside of a process
current user namespace.
v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
This special cases was removed from this version. There is nothing
outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Run kvm-unit-tests/eventinj.flat in L1:
Sending NMI to self
After NMI to self
FAIL: NMI
This test scenario is to test whether VMM can handle NMI IDT-vectoring info correctly.
At the beginning, L2 writes LAPIC to send a self NMI, the EPT page tables on both L1
and L0 are empty so:
- The L2 accesses memory can generate EPT violation which can be intercepted by L0.
The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is
recorded in vmcs02's IDT-vectoring info.
- L0 walks L1's EPT12 and L0 sees the mapping is invalid, it injects the EPT violation into L1.
The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since
it is a nested vmexit.
- L1 receives the EPT violation, then fixes its EPT12.
- L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exits to L0.
- L0 emulates VMRESUME which is called from L1, then return to L2.
L0 merges the requirement of vmcs12's IDT-vectoring info and injects it to L2 through
vmcs02.
- The L2 re-executes the fault instruction and cause EPT violation again.
- Since the L1's EPT12 is valid, L0 can fix its EPT02
- L0 resume L2
The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info
is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry
event injection since it is caused by EPT02's EPT violation.
However, vmx_inject_nmi() refuses to inject NMI from IDT-vectoring info if vCPU is in
guest mode, this patch fix it by permitting to inject NMI from IDT-vectoring if it is
the L0's responsibility to inject NMI from IDT-vectoring info to L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used
to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case
on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f9542
x86, apicv: add virtual x2apic support) disable virtual x2apic mode
completely if w/o APICv, and the author also told me that windows guest
can't enter into x2apic mode when he developed the APICv feature several
years ago. However, it is not truth currently, Interrupt Remapping and
vIOMMU is added to qemu and the developers from Intel test windows 8 can
work in x2apic mode w/ Interrupt Remapping enabled recently.
This patch enables TPR shadow for virtual x2apic mode to boost
windows guest in x2apic mode even if w/o APICv.
Can pass the kvm-unit-test.
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_fmt+0x4f/0x60
? prepare_to_swait+0x39/0xa0
? prepare_to_swait+0x39/0xa0
__might_sleep+0x7e/0x80
__gfn_to_pfn_memslot+0x156/0x480 [kvm]
gfn_to_pfn+0x2a/0x30 [kvm]
gfn_to_page+0xe/0x20 [kvm]
kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
? _raw_spin_unlock_irqrestore+0x36/0x80
vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
kvm_vcpu_check_block+0x12/0x60 [kvm]
kvm_vcpu_block+0x94/0x4c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/4230:
#0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]
stack backtrace:
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
dump_stack+0x99/0xd0
lockdep_rcu_suspicious+0xe7/0x120
gfn_to_memslot+0x12a/0x140 [kvm]
gfn_to_pfn+0x12/0x30 [kvm]
gfn_to_page+0xe/0x20 [kvm]
kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
? _raw_spin_unlock_irqrestore+0x36/0x80
vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
kvm_vcpu_check_block+0x12/0x60 [kvm]
kvm_vcpu_block+0x94/0x4c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0xfd/0x210
? __lock_is_held+0x54/0x70
do_vfs_ioctl+0x96/0x6a0
? __fget+0x11c/0x210
? __fget+0x5/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x81/0x220
entry_SYSCALL64_slow_path+0x25/0x25
These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat
The nested preemption timer is based on hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the new check_nested_events
hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE)
if need to yield pCPU and w/o holding srcu read lock when accesses memslots,
both can be in nested preemption timer evaluation path which results in
the warning above.
This patch fix it by leveraging request bit to async reload APIC access
page before vmentry in order to avoid to reload directly during the nested
preemption timer evaluation, it is safe since the vmcs01 is loaded and
current is nested vmexit.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
virtio-gpu is used for VMs, so add it to the kvm config.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvm@vger.kernel.org
[expanded "frag" to "fragment" in summary]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
kvm_guest.config is useful for KVM guests on other arches, and nothing
in it appears to be x86 specific, so just move the whole file. Kbuild
will find it in either location.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvm@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Original commit, which added support for Armada CP110 system controller
used global variables for storing all clock information. It worked
fine for Armada 7k SoC, with single CP110 block. After dual-CP110 Armada 8k
was introduced, the data got overwritten and corrupted.
This patch fixes the issue by allocating resources dynamically in the
driver probe and storing it as platform drvdata.
Fixes: d3da3eaef7f4 ("clk: mvebu: new driver for Armada CP110 system ...")
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
Armada CP110 system controller comprises its own routine responsble
for registering gate clocks. Among others 'flags' field in
struct clk_init_data was not set, using a random values, which
may cause an unpredicted behavior.
This patch fixes the problem by resetting all fields of clk_init_data
before assigning values for all gated clocks of Armada 7k/8k SoCs family.
Fixes: d3da3eaef7f4 ("clk: mvebu: new driver for Armada CP110 system ...")
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|
into irq/core
Pull irqchip core changes for v4.9 from Jason Cooper
- jcore: Add AIC driver
- mips-gic: Use for_each_set_bit
- mvebu: Add PIC driver
|
|
The upper bound IWL_RATE_COUNT_LEGACY should be used
with a >= check, rejecting the value itself; fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
We were assigning the return value of iwl_mvm_ctdp_command() to a
variable, but never checking it. If this command fails, we should not
allow the interface up process to proceed, since it is potentially
dangerous to ignore thermal management requirements.
Fixes: commit 5c89e7bc557e ("iwlwifi: mvm: add registration to cooling device")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The various TFD/TB helpers have two code paths depending on the
type of TFD supported, with variable shadowing due to the new if
branches. Move the fall-through code into else branches to avoid
variable shadowing. While doing so, rename some of the variables
and do some other cleanups (like removing void * casts of void *
pointers.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Add two new PCI IDs for the 9560 series.
Signed-off-by: Oren Givon <oren.givon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Move the init_dbg check to earlier in the function to simplify the
code.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Running with KASAN produces some messages:
BUG: KASAN: use-after-free in ti_st_send_frame+0x9c/0x16c at addr
ffffffc064868fe8
Read of size 1 by task kworker/u17:1/1266
<KASAN output trimmed>
Hardware name: HiKey Development Board (DT)
Workqueue: hci0 hci_cmd_work
Call trace:
[<ffffffc00008c00c>] dump_backtrace+0x0/0x178
[<ffffffc00008c1a4>] show_stack+0x20/0x28
[<ffffffc00067da38>] dump_stack+0xa8/0xe0
[<ffffffc000288430>] print_trailer+0x110/0x174
[<ffffffc00028aedc>] object_err+0x4c/0x5c
[<ffffffc00028f714>] kasan_report_error+0x254/0x54c
[<ffffffc00028fa70>] kasan_report+0x64/0x70
[<ffffffc00028eb8c>] __asan_load1+0x4c/0x54
[<ffffffc000b59b24>] ti_st_send_frame+0x9c/0x16c
[<ffffffc000ee8dcc>] hci_send_frame+0xb4/0x118
[<ffffffc000ee8efc>] hci_cmd_work+0xcc/0x154
[<ffffffc0000f6c48>] process_one_work+0x26c/0x7a4
[<ffffffc0000f721c>] worker_thread+0x9c/0x73c
[<ffffffc000100250>] kthread+0x138/0x154
[<ffffffc000085c50>] ret_from_fork+0x10/0x40
The packet is being accessed for statistics after it has been freed.
Save the packet type before sending for statistics afterwards.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|