summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-11-06KVM: arm/arm64: Get rid of kvm_timer_flush_hwstateChristoffer Dall
Now when both the vtimer and the ptimer when using both the in-kernel vgic emulation and a userspace IRQ chip are driven by the timer signals and at the vcpu load/put boundaries, instead of recomputing the timer state at every entry/exit to/from the guest, we can get entirely rid of the flush hwstate function. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-06KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exitChristoffer Dall
We don't need to save and restore the hardware timer state and examine if it generates interrupts on on every entry/exit to the guest. The timer hardware is perfectly capable of telling us when it has expired by signaling interrupts. When taking a vtimer interrupt in the host, we don't want to mess with the timer configuration, we just want to forward the physical interrupt to the guest as a virtual interrupt. We can use the split priority drop and deactivate feature of the GIC to do this, which leaves an EOI'ed interrupt active on the physical distributor, making sure we don't keep taking timer interrupts which would prevent the guest from running. We can then forward the physical interrupt to the VM using the HW bit in the LR of the GIC, like we do already, which lets the guest directly deactivate both the physical and virtual timer simultaneously, allowing the timer hardware to exit the VM and generate a new physical interrupt when the timer output is again asserted later on. We do need to capture this state when migrating VCPUs between physical CPUs, however, which we use the vcpu put/load functions for, which are called through preempt notifiers whenever the thread is scheduled away from the CPU or called directly if we return from the ioctl to userspace. One caveat is that we have to save and restore the timer state in both kvm_timer_vcpu_[put/load] and kvm_timer_[schedule/unschedule], because we can have the following flows: 1. kvm_vcpu_block 2. kvm_timer_schedule 3. schedule 4. kvm_timer_vcpu_put (preempt notifier) 5. schedule (vcpu thread gets scheduled back) 6. kvm_timer_vcpu_load (preempt notifier) 7. kvm_timer_unschedule And a version where we don't actually call schedule: 1. kvm_vcpu_block 2. kvm_timer_schedule 7. kvm_timer_unschedule Since kvm_timer_[schedule/unschedule] may not be followed by put/load, but put/load also may be called independently, we call the timer save/restore functions from both paths. Since they rely on the loaded flag to never save/restore when unnecessary, this doesn't cause any harm, and we ensure that all invokations of either set of functions work as intended. An added benefit beyond not having to read and write the timer sysregs on every entry and exit is that we no longer have to actively write the active state to the physical distributor, because we configured the irq for the vtimer to only get a priority drop when handling the interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and the interrupt stays active after firing on the host. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-11-06KVM: arm/arm64: Use separate timer for phys timer emulationChristoffer Dall
We were using the same hrtimer for emulating the physical timer and for making sure a blocking VCPU thread would be eventually woken up. That worked fine in the previous arch timer design, but as we are about to actually use the soft timer expire function for the physical timer emulation, change the logic to use a dedicated hrtimer. This has the added benefit of not having to cancel any work in the sync path, which in turn allows us to run the flush and sync with IRQs disabled. Note that the hrtimer used to program the host kernel's timer to generate an exit from the guest when the emulated physical timer fires never has to inject any work, and to share the soft_timer_cancel() function with the bg_timer, we change the function to only cancel any pending work if the pointer to the work struct is not null. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-11-06KVM: arm/arm64: Rename soft timer to bg_timerChristoffer Dall
As we are about to introduce a separate hrtimer for the physical timer, call this timer bg_timer, because we refer to this timer as the background timer in the code and comments elsewhere. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-06KVM: arm/arm64: Make timer_arm and timer_disarm helpers more genericChristoffer Dall
We are about to add an additional soft timer to the arch timer state for a VCPU and would like to be able to reuse the functions to program and cancel a timer, so we make them slightly more generic and rename to make it more clear that these functions work on soft timers and not the hardware resource that this code is managing. The armed flag on the timer state is only used to assert a condition, and we don't rely on this assertion in any meaningful way, so we can simply get rid of this flack and slightly reduce complexity. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-11-06ACPI / PM: Take SMART_SUSPEND driver flag into accountRafael J. Wysocki
Make the ACPI PM domain take DPM_FLAG_SMART_SUSPEND into account in its system suspend callbacks. [Note that the pm_runtime_suspended() check in acpi_dev_needs_resume() is an optimization, because if is not passed, all of the subsequent checks may be skipped and some of them are much more overhead in general.] Also use the observation that if the device is in runtime suspend at the beginning of the "late" phase of a system-wide suspend-like transition, its state cannot change going forward (runtime PM is disabled for it at that time) until the transition is over and the subsequent system-wide PM callbacks should be skipped for it (as they generally assume the device to not be suspended), so add checks for that in acpi_subsys_suspend_late/noirq() and acpi_subsys_freeze_late/noirq(). Moreover, if acpi_subsys_resume_noirq() is called during the subsequent system-wide resume transition and if the device was left in runtime suspend previously, its runtime PM status needs to be changed to "active" as it is going to be put into the full-power state going forward, so add a check for that too in there. In turn, if acpi_subsys_thaw_noirq() runs after the device has been left in runtime suspend, the subsequent "thaw" callbacks need to be skipped for it (as they may not work correctly with a suspended device), so set the power.direct_complete flag for the device then to make the PM core skip those callbacks. On top of the above, make the analogous changes in the acpi_lpss driver that uses the ACPI PM domain callbacks. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-06PCI / PM: Take SMART_SUSPEND driver flag into accountRafael J. Wysocki
Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its system-wide PM callbacks and make sure that all code that should not run in parallel with pci_pm_runtime_resume() is executed in the "late" phases of system suspend, freeze and poweroff transitions. [Note that the pm_runtime_suspended() check in pci_dev_keep_suspended() is an optimization, because if is not passed, all of the subsequent checks may be skipped and some of them are much more overhead in general.] Also use the observation that if the device is in runtime suspend at the beginning of the "late" phase of a system-wide suspend-like transition, its state cannot change going forward (runtime PM is disabled for it at that time) until the transition is over and the subsequent system-wide PM callbacks should be skipped for it (as they generally assume the device to not be suspended), so add checks for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq(). Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is called during the subsequent system-wide resume transition and if the device was left in runtime suspend previously, its runtime PM status needs to be changed to "active" as it is going to be put into the full-power state, so add checks for that too to these functions. In turn, if pci_pm_thaw_noirq() runs after the device has been left in runtime suspend, the subsequent "thaw" callbacks need to be skipped for it (as they may not work correctly with a suspended device), so set the power.direct_complete flag for the device then to make the PM core skip those callbacks. In addition to the above add a core helper for checking if DPM_FLAG_SMART_SUSPEND is set and the device runtime PM status is "suspended" at the same time, which is done quite often in the new code (and will be done elsewhere going forward too). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2017-11-06PM / core: Add SMART_SUSPEND driver flagRafael J. Wysocki
Define and document a SMART_SUSPEND flag to instruct bus types and PM domains that the system suspend callbacks provided by the driver can cope with runtime-suspended devices, so from the driver's perspective it should be safe to leave devices in runtime suspend during system suspend. Setting that flag may also cause middle-layer code (bus types, PM domains etc.) to skip invocations of the ->suspend_late and ->suspend_noirq callbacks provided by the driver if the device is in runtime suspend at the beginning of the "late" phase of the system-wide suspend transition, in which case the driver's system-wide resume callbacks may be invoked back-to-back with its ->runtime_suspend callback, so the driver has to be able to cope with that too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-11-06PCI / PM: Use the NEVER_SKIP driver flagRafael J. Wysocki
Replace the PCI-specific flag PCI_DEV_FLAGS_NEEDS_RESUME with the PM core's DPM_FLAG_NEVER_SKIP one everywhere and drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-11-06PM / core: Add NEVER_SKIP and SMART_PREPARE driver flagsRafael J. Wysocki
The motivation for this change is to provide a way to work around a problem with the direct-complete mechanism used for avoiding system suspend/resume handling for devices in runtime suspend. The problem is that some middle layer code (the PCI bus type and the ACPI PM domain in particular) returns positive values from its system suspend ->prepare callbacks regardless of whether the driver's ->prepare returns a positive value or 0, which effectively prevents drivers from being able to control the direct-complete feature. Some drivers need that control, however, and the PCI bus type has grown its own flag to deal with this issue, but since it is not limited to PCI, it is better to address it by adding driver flags at the core level. To that end, add a driver_flags field to struct dev_pm_info for flags that can be set by device drivers at the probe time to inform the PM core and/or bus types, PM domains and so on on the capabilities and/or preferences of device drivers. Also add two static inline helpers for setting that field and testing it against a given set of flags and make the driver core clear it automatically on driver remove and probe failures. Define and document two PM driver flags related to the direct- complete feature: NEVER_SKIP and SMART_PREPARE that can be used, respectively, to indicate to the PM core that the direct-complete mechanism should never be used for the device and to inform the middle layer code (bus types, PM domains etc) that it can only request the PM core to use the direct-complete mechanism for the device (by returning a positive value from its ->prepare callback) if it also has been requested by the driver. While at it, make the core check pm_runtime_suspended() when setting power.direct_complete so that it doesn't need to be checked by ->prepare callbacks. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-11-06Merge branch 'acpi-pm' into pm-coreRafael J. Wysocki
2017-11-06Merge remote-tracking branches 'regmap/topic/const' and ↵Mark Brown
'regmap/topic/hwspinlock' into regmap-next
2017-11-06Merge remote-tracking branch 'regmap/topic/core' into regmap-nextMark Brown
2017-11-06ALSA: timer: Limit max instances per timerTakashi Iwai
Currently we allow unlimited number of timer instances, and it may bring the system hogging way too much CPU when too many timer instances are opened and processed concurrently. This may end up with a soft-lockup report as triggered by syzkaller, especially when hrtimer backend is deployed. Since such insane number of instances aren't demanded by the normal use case of ALSA sequencer and it merely opens a risk only for abuse, this patch introduces the upper limit for the number of instances per timer backend. As default, it's set to 1000, but for the fine-grained timer like hrtimer, it's set to 100. Reported-by: syzbot Tested-by: Jérôme Glisse <jglisse@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-06Merge branch 'x86/mm' into x86/asm, to pick up pending changesIngo Molnar
Concentrate x86 MM and asm related changes into a single super-topic, in preparation for larger changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-06Merge branch 'x86/fpu' into x86/asm, to pick up fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-05f2fs: support quota sys filesJaegeuk Kim
This patch supports hidden quota files in the system, which will be used for Android. It requires up-to-date f2fs-tools later than v1.9.0. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-05f2fs: add quota_ino feature infraJaegeuk Kim
This patch adds quota_ino feature infra to be used for quota files. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-05f2fs: support flexible inline xattr sizeChao Yu
Now, in product, more and more features based on file encryption were introduced, their demand of xattr space is increasing, however, inline xattr has fixed-size of 200 bytes, once inline xattr space is full, new increased xattr data would occupy additional xattr block which may bring us more space usage and performance regression during persisting. In order to resolve above issue, it's better to expand inline xattr size flexibly according to user's requirement. So this patch introduces new filesystem feature 'flexible inline xattr', and new mount option 'inline_xattr_size=%u', once mkfs enables the feature, we can use the option to make f2fs supporting flexible inline xattr size. To support this feature, we add extra attribute i_inline_xattr_size in inode layout, indicating that how many space inline xattr borrows from block address mapping space in inode layout, by this, we can easily locate and store flexible-sized inline xattr data in inode. Inode disk layout: +----------------------+ | .i_mode | | ... | | .i_ext | +----------------------+ | .i_extra_isize | | .i_inline_xattr_size |-----------+ | ... | | +----------------------+ | | .i_addr | | | - block address or | | | - inline data | | +----------------------+<---+ v | inline xattr | +---inline xattr range +----------------------+<---+ | .i_nid | +----------------------+ | node_footer | | (nid, ino, offset) | +----------------------+ Note that, we have to cnosider backward compatibility which reserved inline_data space, 200 bytes, all the time, reported by Sheng Yong. Previous inline data or directory always reserved 200 bytes in inode layout, even if inline_xattr is disabled. In order to keep inline_dentry's structure for backward compatibility, we get the space back only from inline_data. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-05include/linux/fs.h: fix comment about struct address_spaceMike Rapoport
Before commit 9c5d760b8d22 ("mm: split gfp_mask and mapping flags into separate fields") the private_* fields of struct adrress_space were grouped together and using "ditto" in comments describing the last fields was correct. With introduction of gpf_mask between private_lock and private_list "ditto" references the wrong description. Fix it by using the elaborate description. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes: - synchronize kernel and tooling headers - cgroup support fix - two tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/headers: Synchronize kernel ABI headers perf/cgroup: Fix perf cgroup hierarchy support perf tools: Unwind properly location after REJECT perf symbols: Fix memory corruption because of zero length symbols
2017-11-05Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: - workaround for gcc asm handling - futex race fixes - objtool build warning fix - two watchdog fixes: a crash fix (revert) and a bug fix for /proc/sys/kernel/watchdog_thresh handling. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Prevent GCC from merging annotate_unreachable(), take 2 objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version watchdog/hardlockup/perf: Use atomics to track in-use cpu counter watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") futex: Fix more put_pi_state() vs. exit_pi_state_list() races
2017-11-05bpf, cgroup: implement eBPF-based device controller for cgroup v2Roman Gushchin
Cgroup v2 lacks the device controller, provided by cgroup v1. This patch adds a new eBPF program type, which in combination of previously added ability to attach multiple eBPF programs to a cgroup, will provide a similar functionality, but with some additional flexibility. This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05device_cgroup: prepare code for bpf-based device controllerRoman Gushchin
This is non-functional change to prepare the device cgroup code for adding eBPF-based controller for cgroups v2. The patch performs the following changes: 1) __devcgroup_inode_permission() and devcgroup_inode_mknod() are moving to the device-cgroup.h and converting into static inline. 2) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05Merge tag 'mlx5-updates-2017-11-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-11-04 This series includes: From Huy: dscp to priority mapping for Ethernet packet. =================================================== First six patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. =================================================== Plus to the dscp series, some small misc changes are include as well: From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic From Or Gerlitz, Enlarge the NIC TC offload table size From Rabie, Initialize destination_flow struct to 0 From Feras, Add inner TTC table to IPoIB flow steering From Tal, Enable CQE based moderation on TX CQ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: higher throughput under reordering with adaptive RACK reordering wndPriyaranjan Jha
Currently TCP RACK loss detection does not work well if packets are being reordered beyond its static reordering window (min_rtt/4).Under such reordering it may falsely trigger loss recoveries and reduce TCP throughput significantly. This patch improves that by increasing and reducing the reordering window based on DSACK, which is now supported in major TCP implementations. It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries. - If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded by srtt), since there is possibility that spurious retransmission was due to reordering delay longer than reo_wnd. - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16) no. of successful recoveries (accounts for full DSACK-based loss recovery undo). After that, reset it to default (min_rtt/4). - At max, reo_wnd is incremented only once per rtt. So that the new DSACK on which we are reacting, is due to the spurious retx (approx) after the reo_wnd has been updated last time. - reo_wnd is tracked in terms of steps (of min_rtt/4), rather than absolute value to account for change in rtt. In our internal testing, we observed significant increase in throughput, in scenarios where reordering exceeds min_rtt/4 (previous static value). Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: make tree index unsignedVivien Didelot
Similarly to a DSA switch and port, rename the tree index from "tree" to "index" and make it an unsigned int because it isn't supposed to be less than 0. u32 is an OF specific data used to retrieve the value and has no need to be propagated up to the tree index. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: make switch index unsignedVivien Didelot
Define the DSA switch index as an unsigned int, because it will never be less than 0. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: remove old offload/analyzerJakub Kicinski
Thanks to the ability to load a program for a specific device, running verifier twice is no longer needed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05xdp: allow attaching programs loaded for specific deviceJakub Kicinski
Pass the netdev pointer to bpf_prog_get_type(). This way BPF code can decide whether the device matches what the code was loaded/translated for. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: report offload info to user spaceJakub Kicinski
Extend struct bpf_prog_info to contain information about program being bound to a device. Since the netdev may get destroyed while program still exists we need a flag to indicate the program is loaded for a device, even if the device is gone. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: offload: add infrastructure for loading programs for a specific netdevJakub Kicinski
The fact that we don't know which device the program is going to be used on is quite limiting in current eBPF infrastructure. We have to reverse or limit the changes which kernel makes to the loaded bytecode if we want it to be offloaded to a networking device. We also have to invent new APIs for debugging and troubleshooting support. Make it possible to load programs for a specific netdev. This helps us to bring the debug information closer to the core eBPF infrastructure (e.g. we will be able to reuse the verifer log in device JIT). It allows device JITs to perform translation on the original bytecode. __bpf_prog_get() when called to get a reference for an attachment point will now refuse to give it if program has a device assigned. Following patches will add a version of that function which passes the expected netdev in. @type argument in __bpf_prog_get() is renamed to attach_type to make it clearer that it's only set on attachment. All calls to ndo_bpf are protected by rtnl, only verifier callbacks are not. We need a wait queue to make sure netdev doesn't get destroyed while verifier is still running and calling its driver. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: bpf: rename ndo_xdp to ndo_bpfJakub Kicinski
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05rtnetlink: use netnsid to query interfaceJiri Benc
Currently, when an application gets netnsid from the kernel (for example as the result of RTM_GETLINK call on one end of the veth pair), it's not much useful. There's no reliable way to get to the netns fd from the netnsid, nor does any kernel API accept netnsid. Extend the RTM_GETLINK call to also accept netnsid. It will operate on the netns with the given netnsid in such case. Of course, the calling process needs to have enough capabilities in the target name space; for now, require CAP_NET_ADMIN. This can be relaxed in the future. To signal to the calling process that the kernel understood the new IFLA_IF_NETNSID attribute in the query, it will include it in the response. This is needed to detect older kernels, as they will just ignore IFLA_IF_NETNSID and query in the current name space. This patch implemetns IFLA_IF_NETNSID only for get and dump. For set operations, this can be extended later. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05openvswitch: reliable interface indentification in port dumpsJiri Benc
This patch allows reliable identification of netdevice interfaces connected to openvswitch bridges. In particular, user space queries the netdev interfaces belonging to the ports for statistics, up/down state, etc. Datapath dump needs to provide enough information for the user space to be able to do that. Currently, only interface names are returned. This is not sufficient, as openvswitch allows its ports to be in different name spaces and the interface name is valid only in its name space. What is needed and generally used in other netlink APIs, is the pair ifindex+netnsid. The solution is addition of the ifindex+netnsid pair (or only ifindex if in the same name space) to vport get/dump operation. On request side, ideally the ifindex+netnsid pair could be used to get/set/del the corresponding vport. This is not implemented by this patch and can be added later if needed. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05i2c: nuc900: remove platform_data, tooWolfram Sang
Commit 7da62cb1853025 ("i2c: nuc900: remove driver") removed the driver, we should remove the platform_data as well. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-11-04net/mlx5: QPTS and QPDPM register firmware command supportHuy Nguyen
The QPTS register allows changing the priority trust state between pcp and dscp. Add support to get/set trust state from device. When the port is in pcp/dscp trust state, packet is routed by hardware to matching priority based on its pcp/dscp value respectively. The QPDPM register allow channing the dscp to priority mapping. Add support to get/set dscp to priority mapping from device. Note that to change a dscp mapping, the "e" bit of this dscp structure must be set in the QPDPM firmware command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: Add MLX5_SET16 and MLX5_GET16Huy Nguyen
Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: QCAM register firmware command supportHuy Nguyen
The QCAM register provides capability bit for all the QoS registers using ACCESS_REG command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/dcb: Add dscp to priority selector typeHuy Nguyen
IEEE specification P802.1Qcd/D2.1 defines priority selector 5. This APP TLV selector defines DSCP to priority map. This patch defines such DSCP selector. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05ipv6: remove IN6_ADDR_HSIZE from addrconf.hEric Dumazet
IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04target: add sense code INSUFFICIENT REGISTRATION RESOURCEStangwenji
If a PERSISTENT RESERVE OUT command with a REGISTER service action or a REGISTER AND IGNORE EXISTING KEY service action or REGISTER AND MOVE service action is attempted, but there are insufficient device server resources to complete the operation, then the command shall be terminated with CHECK CONDITION status, with the sense key set to ILLEGAL REQUEST,and the additonal sense code set to INSUFFICIENT REGISTRATION RESOURCES. Signed-off-by: tangwenji <tang.wenji@zte.com.cn> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04blk-mq: don't handle failure in .get_budgetMing Lei
It is enough to just check if we can get the budget via .get_budget(). And we don't need to deal with device state change in .get_budget(). For SCSI, one issue to be fixed is that we have to call scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails to handle the request. And it isn't enough to simply call blk_mq_end_request() to do that if this request is marked as RQF_DONTPREP. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-04objtool: Prevent GCC from merging annotate_unreachable(), take 2Josh Poimboeuf
This fixes the following warning with GCC 4.6: mm/migrate.o: warning: objtool: migrate_misplaced_transhuge_page()+0x71: unreachable instruction The problem is that the compiler merged identical annotate_unreachable() inline asm blocks, resulting in a missing 'unreachable' annotation. This problem happened before, and was partially fixed with: 3d1e236022cc ("objtool: Prevent GCC from merging annotate_unreachable()") That commit tried to ensure that each instance of the annotate_unreachable() inline asm statement has a unique label. It used the __LINE__ macro to generate the label number. However, even the line number isn't necessarily unique when used in an inline function with multiple callers (in this case, __alloc_pages_node()'s use of VM_BUG_ON). Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Fixes: 3d1e236022cc ("objtool: Prevent GCC from merging annotate_unreachable()") Link: http://lkml.kernel.org/r/20171103221941.cajpwszir7ujxyc4@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-04netfilter/ipvs: clear ipvs_property flag when SKB net namespace changedYe Yin
When run ipvs in two different network namespace at the same host, and one ipvs transport network traffic to the other network namespace ipvs. 'ipvs_property' flag will make the second ipvs take no effect. So we should clear 'ipvs_property' when SKB network namespace changed. Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()") Signed-off-by: Ye Yin <hustcat@gmail.com> Signed-off-by: Wei Zhou <chouryzhou@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04tty: cyclades: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Moves timer structures from global to attached to struct cyclades_port. Cc: Jiri Slaby <jslaby@suse.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-04USB: add SPDX identifiers to all remaining files in drivers/usb/Greg Kroah-Hartman
It's good to have SPDX identifiers in all files to make it easier to audit the kernel tree for correct licenses. Update the drivers/usb/ and include/linux/usb* files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Thomas Gleixner, Philippe Ombredanne, and Kate Stewart. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Acked-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-04tools/headers: Synchronize kernel ABI headersIngo Molnar
After the SPDX license tags were added a number of tooling headers got out of sync with their kernel variants, generating lots of build warnings. Sync them: - tools/arch/x86/include/asm/disabled-features.h, tools/arch/x86/include/asm/required-features.h, tools/include/linux/hash.h: Remove the SPDX tag where the kernel version does not have it. - tools/include/asm-generic/bitops/__fls.h, tools/include/asm-generic/bitops/arch_hweight.h, tools/include/asm-generic/bitops/const_hweight.h, tools/include/asm-generic/bitops/fls.h, tools/include/asm-generic/bitops/fls64.h, tools/include/uapi/asm-generic/ioctls.h, tools/include/uapi/asm-generic/mman-common.h, tools/include/uapi/sound/asound.h, tools/include/uapi/linux/kvm.h, tools/include/uapi/linux/perf_event.h, tools/include/uapi/linux/sched.h, tools/include/uapi/linux/vhost.h, tools/include/uapi/sound/asound.h: Add the SPDX tag of the respective kernel header. - tools/include/uapi/linux/bpf_common.h, tools/include/uapi/linux/fcntl.h, tools/include/uapi/linux/hw_breakpoint.h, tools/include/uapi/linux/mman.h, tools/include/uapi/linux/stat.h, Change the tag to the kernel header version: -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ Also sync other header details: - include/uapi/sound/asound.h: Fix pointless end of line whitespace noise the header grew in this cycle. - tools/arch/x86/lib/memcpy_64.S: Sync the code and add tools/include/asm/export.h with dummy wrappers to support building the kernel side code in a tooling header environment. - tools/include/uapi/asm-generic/mman.h, tools/include/uapi/linux/bpf.h: Sync other details that don't impact tooling's use of the ABIs. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03platform/x86: dell-smbios-wmi: introduce userspace interfaceMario Limonciello
It's important for the driver to provide a R/W ioctl to ensure that two competing userspace processes don't race to provide or read each others data. This userspace character device will be used to perform SMBIOS calls from any applications. It provides an ioctl that will allow passing the WMI calling interface buffer between userspace and kernel space. This character device is intended to deprecate the dcdbas kernel module and the interface that it provides to userspace. To perform an SMBIOS IOCTL call using the character device userspace will perform a read() on the the character device. The WMI bus will provide a u64 variable containing the necessary size of the IOCTL buffer. The API for interacting with this interface is defined in documentation as well as the WMI uapi header provides the format of the structures. Not all userspace requests will be accepted. The dell-smbios filtering functionality will be used to prevent access to certain tokens and calls. All whitelisted commands and tokens are now shared out to userspace so applications don't need to define them in their own headers. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Reviewed-by: Edward O'Callaghan <quasisec@google.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>