summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-23netfilter: nf_tables: improve nft payload fast evalLiping Zhang
There's an off-by-one issue in nft_payload_fast_eval, skb_tail_pointer and ptr + priv->len all point to the last valid address plus 1. So if they are equal, we can still fetch the valid data. It's unnecessary to fall back to nft_payload_eval. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nf_queue: improve queue range support for bridge familyLiping Zhang
After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable queuing to userspace"), we can queue packets to the user space in bridge family. But when the user specify the queue range, packets will be only delivered to the first queue num. Because in nfqueue_hash, we only support ipv4 and ipv6 family. Now add support for bridge family too. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nft_queue: add _SREG_QNUM attr to select the queue numberLiping Zhang
Currently, the user can specify the queue numbers by _QUEUE_NUM and _QUEUE_TOTAL attributes, this is enough in most situations. But acctually, it is not very flexible, for example: tcp dport 80 mapped to queue0 tcp dport 81 mapped to queue1 tcp dport 82 mapped to queue2 In order to do this thing, we must add 3 nft rules, and more mapping meant more rules ... So take one register to select the queue number, then we can add one simple rule to mapping queues, maybe like this: queue num tcp dport map { 80:0, 81:1, 82:2 ... } Florian Westphal also proposed wider usage scenarios: queue num jhash ip saddr . ip daddr mod ... queue num meta cpu ... queue num meta mark ... The last point is how to load a queue number from sreg, although we can use *(u16*)&regs->data[reg] to load the queue number, just like nat expr to load its l4port do. But we will cooperate with hash expr, meta cpu, meta mark expr and so on. They all store the result to u32 type, so cast it to u16 pointer and dereference it will generate wrong result in the big endian system. So just keep it simple, we treat queue number as u32 type, although u16 type is already enough. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nf_tables: validate maximum value of u32 netlink attributesLaura Garcia Liebana
Fetch value and validate u32 netlink attribute. This validation is usually required when the u32 netlink attributes are being stored in a field whose size is smaller. This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes"). Fixes: 96518518cc41 ("netfilter: add nftables") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23ixgbe: reset before SRIOV init to avoid mailbox issuesEmil Tantilov
Enabling SRIOV while the ixgbevf driver is loaded will result in all mailbox requests from ixgbevf_open() being rejected by ixgbe because adapter->clear_to_send is set to false on reset. Call ixgbe_sriov_reinit() before pci_enable_sriov() to make sure that mailbox requests are handled from the time ixgbevf is loaded. Reported-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: Support 4 queue RSS on VFs with 1 or 2 queue RSS on PFAlexander Duyck
Instead of limiting the VFs if we don't use 4 queues for RSS in the PF we can instead just limit the RSS queues used to a power of 2. By doing this we can support use cases where VFs are using more queues than the PF is currently using and can support RSS if so desired. The only limitation on this is that we cannot support 3 queues of RSS in the PF or VF. In either of these cases we should fall back to 2 queues in order to be able to use the power of 2 masking provided by the psrtype register. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: Limit reporting of redirection table if SR-IOV is enabledAlexander Duyck
The hardware redirection table can support more queues then the PF currently has when SR-IOV is enabled. In order to account for this use the RSS mask to trim of the bits that are not used. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: Allow setting multiple queues when SR-IOV is enabledAlexander Duyck
The maximum queue count reported was 1, however support for multiple queues with SR-IOV was added some time ago so we should report support for it to the user so that they can select multiple queues if they so desire. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: Use MDIO_PRTAD_NONE consistentlyMark Rustad
The value MDIO_PRTAD_NONE should be used to indicate no PHY address. Not 0, not 0xFFFF. Use the MDIO_PRTAD_NONE value consistently. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbevf: add spinlocks for MTU change callsEmil Tantilov
Protect set_rlpml with mailbox lock to make sure the MTU configuration is handled properly. This change resolves an issue where set_rlpml can fail when the VF interface is brought up: ixgbevf 0000:03:1d.6: Failed to set MTU at 1500 Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: Indicate support for pause frames in all casesMark Rustad
All the MACs supported by ixgbe support pause frames, so indicate that support in ethtool. Also set advertising according to requested mode. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: Resolve NULL reference by setting {read, write}_reg_mdiMark Rustad
Set the read_reg_mdi and write_reg_mdi method pointers for X550EM_A_10G_T devices to resolve jumping to NULL. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: make ixgbe_led_on/off_t_x550em staticEmil Tantilov
These functions are only used in ixgbe_x550.c. Fixes a warning when compiling with -Wmissing-prototypes Reported-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ixgbe: simplify the logic for setting VLAN filteringEmil Tantilov
Simplify the logic for setting VLNCTRL.VFE by checking the VMDQ flag and 82598 MAC instead of having to maintain a list of MAC types. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23ALSA: usb-audio: Extend DragonFly dB scale quirk to cover other variantsAnssi Hannula
The DragonFly quirk added in 42e3121d90f4 ("ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly") applies a custom dB map on the volume control when its range is reported as 0..50 (0 .. 0.2dB). However, there exists at least one other variant (hw v1.0c, as opposed to the tested v1.2) which reports a different non-sensical volume range (0..53) and the custom map is therefore not applied for that device. This results in all of the volume change appearing close to 100% on mixer UIs that utilize the dB TLV information. Add a fallback case where no dB TLV is reported at all if the control range is not 0..50 but still 0..N where N <= 1000 (3.9 dB). Also restrict the quirk to only apply to the volume control as there is also a mute control which would match the check otherwise. Fixes: 42e3121d90f4 ("ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly") Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Reported-by: David W <regulars@d-dub.org.uk> Tested-by: David W <regulars@d-dub.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-09-22i40evf: remove unnecessary error checking against i40e_shutdown_adminqLihong Yang
The i40e_shutdown_adminq function never returns failure. There is no need to check the non-0 return value. Clean up the unnecessary error checking and warning against it. Change-ID: Ibb616f09cfb93bd1a872ebf3241a15fb8354b31b Signed-off-by: Lihong Yang <lihong.yang@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: Limit TX descriptor count in cases where frag size is greater than 16KAlexander Duyck
The i40e driver was incorrectly assuming that we would always be pulling no more than 1 descriptor from each fragment. It is in fact possible for us to end up with the case where 2 descriptors worth of data may be pulled when a frame is larger than one of the pieces generated when aligning the payload to either 4K or pieces smaller than 16K. To adjust for this we just need to make certain to test all the way to the end of the fragments as it is possible for us to span 2 descriptors in the block before us so we need to guarantee that even the last 6 descriptors have enough data to fill a full frame. Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40evf: remove unnecessary error checking against i40evf_up_completeBimmy Pujari
Function i40evf_up_complete() always returns success. Changed this to a void type and removed the code that checks the return status and prints an error message. Change-ID: I8c400f174786b9c855f679e470f35af292fb50ad Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40evf: Fix link state event handlingSridhar Samudrala
Currently disabling the link state from PF via ip link set enp5s0f0 vf 0 state disable doesn't disable the CARRIER on the VF. This patch updates the carrier and starts/stops the tx queues based on the link state notification from PF. PF: enp5s0f0, VF: enp5s2 #modprobe i40e #echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs #ip link set enp5s2 up #ip -d link show enp5s2 175: enp5s2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 #ip link set enp5s0f0 vf 0 state disable #ip -d link show enp5s0f0 171: enp5s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 72 numrxqueues 72 portid 6805ca2e7268 vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off #ip -d link show enp5s2 175: enp5s2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 16 numrxqueues 16 Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: avoid potential null pointer dereference when assigning lenColin Ian King
There is a sanitcy check for desc being null in the first line of function i40evf_debug_aq. However, before that, aq_desc is cast from desc, and aq_desc is being dereferenced on the assignment of len, so this could be a potential null pointer deference. Fix this by moving the initialization of len to the code block where len is being used and hence at this point we know it is OK to dereference aq_desc. Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: Fix for extra byte swap in tunnel setupCarolyn Wyborny
This patch fixes an issue where we were byte swapping the port parameter, then byte swapping it again in function execution. Obviously, that's unnecessary, so take it out of the function calls. Without this patch, the udp based tunnel configuration would not be correct. Change-ID: I788d83c5bd5732170f1a81dbfa0b1ac3ca8ea5b7 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: Fix to check for NULLCarolyn Wyborny
This patch fixes an issue in the virt channel code, where a return from i40e_find_vsi_from_id was not checked for NULL when applicable. Without this patch, there is a risk for panic and static analysis tools complain. This patch fixes the problem by adding the check and adding an additional input check for similar reasons. Change-ID: I7e9be88eb7a3addb50eadc451c8336d9e06f5394 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: return correct opcode to VFMitch Williams
This conditional is backward, so the driver responds back to the VF with the wrong opcode. Do the old switcheroo to fix this. Change-ID: I384035b0fef8a3881c176de4b4672009b3400b25 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: fix "dump port" command when NPAR enabledAlan Brady
When using the debugfs to issue the "dump port" command with NPAR enabled, the firmware reports back with invalid argument. The issue occurs because the pf->mac_seid was used to perform the query. This is fine when NPAR is disabled because the switch ID == pf->mac_seid, however this is not the case when NPAR is enabled. This fix instead goes through the VSI to determine the correct ID to use in either case. Change-ID: I0cd67913a7f2c4a2962e06d39e32e7447cc55b6a Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22i40e: fix setting user defined RSS hash keyAlan Brady
Previously, when using ethtool to change the RSS hash key, ethtool would report back saying the old key was still being used and no error was reported. It was unclear whether it was being reported incorrectly or being set incorrectly. Debugging revealed 'i40e_set_rxfh()' returned zero immediately instead of setting the key because a user defined indirection table is not supplied when changing the hash key. This fix instead changes it such that if an indirection table is not supplied, then a default one is created and the hash key is now correctly set. Change-ID: Iddb621897ecf208650272b7ee46702cad7b69a71 Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-23locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help textVivien Didelot
Fix the indefinitiley -> indefinitely typo in Kconfig.debug. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-23objtool: Add do_task_dead() to global noreturn listJosh Poimboeuf
objtool reports the following new warning: kernel/exit.o: warning: objtool: do_exit() falls through to next function complete_and_exit() The warning is caused by do_exit()'s new call to do_task_dead(), which is a new "noreturn" function which objtool doesn't know about yet, introduced by: 9af6528ee9b6 ("sched/core: Optimize __schedule()") ( objtool has to know all the global noreturn functions so it can follow the control flow of any functions which call them. Unfortunately they need to be hard-coded because there's no automated way to detect them. ) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Link: http://lkml.kernel.org/r/20160922212125.zbuewckqll4yur25@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-23Merge tag 'perf-core-for-mingo-20160922' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements from Arnaldo Carvalho de Melo: New features: - Add support for interacting with Coresight PMU ETMs/PTMs, that are IP blocks to perform hardware assisted tracing on a ARM CPU core (Mathieu Poirier) Infrastructure changes: - Histogram prep work for the upcoming c2c tool (Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-23Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22nvme-rdma: only clear queue flags after successful connectSagi Grimberg
Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed. Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag") Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-22nsfs: Simplify __ns_get_pathEric W. Biederman
Move mntget from the very beginning of __ns_get_path to the success path of __ns_get_path, and remove the mntget calls. This removes the possibility that there will be a mntget/mntput pair of __ns_get_path has to retry, and generally simplifies the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-09-22Merge branch 'nsfs-ioctls' into HEADEric W. Biederman
From: Andrey Vagin <avagin@openvz.org> Each namespace has an owning user namespace and now there is not way to discover these relationships. Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships too. Why we may want to know relationships between namespaces? One use would be visualization, in order to understand the running system. Another would be to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? One more use-case (which usually called abnormal) is checkpoint/restart. In CRIU we are going to dump and restore nested namespaces. There [1] was a discussion about which interface to choose to determing relationships between namespaces. Eric suggested to add two ioctl-s [2]: > Grumble, Grumble. I think this may actually a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. Here is an implementaions of these ioctl-s. $ man man7/namespaces.7 ... Since Linux 4.X, the following ioctl(2) calls are supported for namespace file descriptors. The correct syntax is: fd = ioctl(ns_fd, ioctl_type); where ioctl_type is one of the following: NS_GET_USERNS Returns a file descriptor that refers to an owning user names‐ pace. NS_GET_PARENT Returns a file descriptor that refers to a parent namespace. This ioctl(2) can be used for pid and user namespaces. For user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning. In addition to generic ioctl(2) errors, the following specific ones can occur: EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. EPERM The requested namespace is outside of the current namespace scope. [1] https://lkml.org/lkml/2016/7/6/158 [2] https://lkml.org/lkml/2016/7/9/101 Changes for v2: * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing outside of the init namespace, so we can return EPERM in this case too. > The fewer special cases the easier the code is to get > correct, and the easier it is to read. // Eric Changes for v3: * rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: "W. Trevor King" <wking@tremily.us> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@canonical.com>
2016-09-22tools/testing: add a test to check nsfs ioctl-sAndrey Vagin
There are two new ioctl-s: One ioctl for the user namespace that owns a file descriptor. One ioctl for the parent namespace of a namespace file descriptor. The test checks that these ioctl-s works and that they handle a case when a target namespace is outside of the current process namespace. Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22nsfs: add ioctl to get a parent namespaceAndrey Vagin
Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships. In a future we will use this interface to dump and restore nested namespaces. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22nsfs: add ioctl to get an owning user namespace for ns file descriptorAndrey Vagin
Each namespace has an owning user namespace and now there is not way to discover these relationships. Understending namespaces relationships allows to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? After a long discussion, Eric W. Biederman proposed to use ioctl-s for this purpose. The NS_GET_USERNS ioctl returns a file descriptor to an owning user namespace. It returns EPERM if a target namespace is outside of a current user namespace. v2: rename parent to relative v3: Add a missing mntput when returning -EAGAIN --EWB Acked-by: Serge Hallyn <serge@hallyn.com> Link: https://lkml.org/lkml/2016/7/6/158 Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22kernel: add a helper to get an owning user namespace for a namespaceAndrey Vagin
Return -EPERM if an owning user namespace is outside of a process current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special cases was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-23KVM: nVMX: Fix the NMI IDT-vectoring handlingWanpeng Li
Run kvm-unit-tests/eventinj.flat in L1: Sending NMI to self After NMI to self FAIL: NMI This test scenario is to test whether VMM can handle NMI IDT-vectoring info correctly. At the beginning, L2 writes LAPIC to send a self NMI, the EPT page tables on both L1 and L0 are empty so: - The L2 accesses memory can generate EPT violation which can be intercepted by L0. The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is recorded in vmcs02's IDT-vectoring info. - L0 walks L1's EPT12 and L0 sees the mapping is invalid, it injects the EPT violation into L1. The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since it is a nested vmexit. - L1 receives the EPT violation, then fixes its EPT12. - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exits to L0. - L0 emulates VMRESUME which is called from L1, then return to L2. L0 merges the requirement of vmcs12's IDT-vectoring info and injects it to L2 through vmcs02. - The L2 re-executes the fault instruction and cause EPT violation again. - Since the L1's EPT12 is valid, L0 can fix its EPT02 - L0 resume L2 The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry event injection since it is caused by EPT02's EPT violation. However, vmx_inject_nmi() refuses to inject NMI from IDT-vectoring info if vCPU is in guest mode, this patch fix it by permitting to inject NMI from IDT-vectoring if it is the L0's responsibility to inject NMI from IDT-vectoring info to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Bandan Das <bsd@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactiveWanpeng Li
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f9542 x86, apicv: add virtual x2apic support) disable virtual x2apic mode completely if w/o APICv, and the author also told me that windows guest can't enter into x2apic mode when he developed the APICv feature several years ago. However, it is not truth currently, Interrupt Remapping and vIOMMU is added to qemu and the developers from Intel test windows 8 can work in x2apic mode w/ Interrupt Remapping enabled recently. This patch enables TPR shadow for virtual x2apic mode to boost windows guest in x2apic mode even if w/o APICv. Can pass the kvm-unit-test. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Suggested-by: Wincy Van <fanwenyi0529@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wincy Van <fanwenyi0529@gmail.com> Cc: Yang Zhang <yang.zhang.wz@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23KVM: nVMX: Fix reload apic access page warningWanpeng Li
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0 CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_fmt+0x4f/0x60 ? prepare_to_swait+0x39/0xa0 ? prepare_to_swait+0x39/0xa0 __might_sleep+0x7e/0x80 __gfn_to_pfn_memslot+0x156/0x480 [kvm] gfn_to_pfn+0x2a/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] =============================== [ INFO: suspicious RCU usage. ] 4.8.0-rc5+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/4230: #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm] stack backtrace: CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 lockdep_rcu_suspicious+0xe7/0x120 gfn_to_memslot+0x12a/0x140 [kvm] gfn_to_pfn+0x12/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] ? __fget+0xfd/0x210 ? __lock_is_held+0x54/0x70 do_vfs_ioctl+0x96/0x6a0 ? __fget+0x11c/0x210 ? __fget+0x5/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat The nested preemption timer is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE) if need to yield pCPU and w/o holding srcu read lock when accesses memslots, both can be in nested preemption timer evaluation path which results in the warning above. This patch fix it by leveraging request bit to async reload APIC access page before vmentry in order to avoid to reload directly during the nested preemption timer evaluation, it is safe since the vmcs01 is loaded and current is nested vmexit. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23kvmconfig: add virtio-gpu to config fragmentRob Herring
virtio-gpu is used for VMs, so add it to the kvm config. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvmarm@lists.cs.columbia.edu Cc: kvm@vger.kernel.org [expanded "frag" to "fragment" in summary] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23config: move x86 kvm_guest.config to a common locationRob Herring
kvm_guest.config is useful for KVM guests on other arches, and nothing in it appears to be x86 specific, so just move the whole file. Kbuild will find it in either location. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvmarm@lists.cs.columbia.edu Cc: kvm@vger.kernel.org Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-22clk: mvebu: dynamically allocate resources in Armada CP110 system controllerMarcin Wojtas
Original commit, which added support for Armada CP110 system controller used global variables for storing all clock information. It worked fine for Armada 7k SoC, with single CP110 block. After dual-CP110 Armada 8k was introduced, the data got overwritten and corrupted. This patch fixes the issue by allocating resources dynamically in the driver probe and storing it as platform drvdata. Fixes: d3da3eaef7f4 ("clk: mvebu: new driver for Armada CP110 system ...") Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: <stable@vger.kernel.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-09-22clk: mvebu: fix setting unwanted flags in CP110 gate clockMarcin Wojtas
Armada CP110 system controller comprises its own routine responsble for registering gate clocks. Among others 'flags' field in struct clk_init_data was not set, using a random values, which may cause an unpredicted behavior. This patch fixes the problem by resetting all fields of clk_init_data before assigning values for all gated clocks of Armada 7k/8k SoCs family. Fixes: d3da3eaef7f4 ("clk: mvebu: new driver for Armada CP110 system ...") Signed-off-by: Marcin Wojtas <mw@semihalf.com> CC: <stable@vger.kernel.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-09-22Merge tag 'irqchip-core-4.9' of git://git.infradead.org/users/jcooper/linux ↵Thomas Gleixner
into irq/core Pull irqchip core changes for v4.9 from Jason Cooper - jcore: Add AIC driver - mips-gic: Use for_each_set_bit - mvebu: Add PIC driver
2016-09-22iwlwifi: mvm: correct rate_idx bounds-checkJohannes Berg
The upper bound IWL_RATE_COUNT_LEGACY should be used with a >= check, rejecting the value itself; fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2016-09-22iwlwifi: mvm: bail out if CTDP start operation failsLuca Coelho
We were assigning the return value of iwl_mvm_ctdp_command() to a variable, but never checking it. If this command fails, we should not allow the interface up process to proceed, since it is potentially dangerous to ignore thermal management requirements. Fixes: commit 5c89e7bc557e ("iwlwifi: mvm: add registration to cooling device") Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2016-09-22iwlwifi: pcie: avoid variable shadowing in TFD helpersJohannes Berg
The various TFD/TB helpers have two code paths depending on the type of TFD supported, with variable shadowing due to the new if branches. Move the fall-through code into else branches to avoid variable shadowing. While doing so, rename some of the variables and do some other cleanups (like removing void * casts of void * pointers.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2016-09-22iwlwifi: add two new 9560 series PCI IDsOren Givon
Add two new PCI IDs for the 9560 series. Signed-off-by: Oren Givon <oren.givon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2016-09-22iwlwifi: mvm: cleanup usage of init_dbg parameterSara Sharon
Move the init_dbg check to earlier in the function to simplify the code. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2016-09-22Bluetooth: btwilink: Save the packet type before sendingLaura Abbott
Running with KASAN produces some messages: BUG: KASAN: use-after-free in ti_st_send_frame+0x9c/0x16c at addr ffffffc064868fe8 Read of size 1 by task kworker/u17:1/1266 <KASAN output trimmed> Hardware name: HiKey Development Board (DT) Workqueue: hci0 hci_cmd_work Call trace: [<ffffffc00008c00c>] dump_backtrace+0x0/0x178 [<ffffffc00008c1a4>] show_stack+0x20/0x28 [<ffffffc00067da38>] dump_stack+0xa8/0xe0 [<ffffffc000288430>] print_trailer+0x110/0x174 [<ffffffc00028aedc>] object_err+0x4c/0x5c [<ffffffc00028f714>] kasan_report_error+0x254/0x54c [<ffffffc00028fa70>] kasan_report+0x64/0x70 [<ffffffc00028eb8c>] __asan_load1+0x4c/0x54 [<ffffffc000b59b24>] ti_st_send_frame+0x9c/0x16c [<ffffffc000ee8dcc>] hci_send_frame+0xb4/0x118 [<ffffffc000ee8efc>] hci_cmd_work+0xcc/0x154 [<ffffffc0000f6c48>] process_one_work+0x26c/0x7a4 [<ffffffc0000f721c>] worker_thread+0x9c/0x73c [<ffffffc000100250>] kthread+0x138/0x154 [<ffffffc000085c50>] ret_from_fork+0x10/0x40 The packet is being accessed for statistics after it has been freed. Save the packet type before sending for statistics afterwards. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>