summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-01-23rcu: Check cond_resched_rcu_qs() state less often to reduce GP overheadPaul E. McKenney
Commit 4a81e8328d37 ("rcu: Reduce overhead of cond_resched() checks for RCU") moved quiescent-state generation out of cond_resched() and commit bde6c3aa9930 ("rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops") introduced cond_resched_rcu_qs(), and commit 5cd37193ce85 ("rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently polled by the RCU core state machine. This frequent polling can increase grace-period rate, which in turn increases grace-period overhead, which is visible in some benchmarks (for example, the "open1" benchmark in Anton Blanchard's "will it scale" suite). This commit therefore reduces the rate at which rcu_qs_ctr is polled by moving that polling into the force-quiescent-state (FQS) machinery, and by further polling it only after the grace period has been in effect for at least jiffies_till_sched_qs jiffies. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Abstract extended quiescent state determinationPaul E. McKenney
This commit is the fourth step towards full abstraction of all accesses to the ->dynticks counter, implementing previously open-coded checks and comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since() functions. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23llist: Clarify comments about when locking is neededJoel Fernandes
llist.h comments are confusing about when locking is needed versus when it isn't. Clarify these comments by being more descriptive about why locking is needed for llist_del_first. Cc: Ingo Molnar <mingo@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Huang Ying <ying.huang@intel.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23irqdomain: irq_domain_check_msi_remapEric Auger
This new function checks whether all MSI irq domains implement IRQ remapping. This is useful to understand whether VFIO passthrough is safe with respect to interrupts. On ARM typically an MSI controller can sit downstream to the IOMMU without preventing VFIO passthrough. As such any assigned device can write into the MSI doorbell. In case the MSI controller implements IRQ remapping, assigned devices will not be able to trigger interrupts towards the host. On the contrary, the assignment must be emphasized as unsafe with respect to interrupts. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqdomain: Add irq domain MSI and MSI_REMAP flagsEric Auger
We introduce two new enum values for the irq domain flag: - IRQ_DOMAIN_FLAG_MSI indicates the irq domain corresponds to an MSI domain - IRQ_DOMAIN_FLAG_MSI_REMAP indicates the irq domain has MSI remapping capabilities. Those values will be useful to check all MSI irq domains have MSI remapping support when assessing the safety of IRQ assignment to a guest. irq_domain_hierarchical_is_msi_remap() allows to check if an irq domain or any parent implements MSI remapping. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: iommu_get_group_resv_regionsEric Auger
Introduce iommu_get_group_resv_regions whose role consists in enumerating all devices from the group and collecting their reserved regions. The list is sorted and overlaps between regions of the same type are handled by merging the regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: iommu_alloc_resv_regionEric Auger
Introduce a new helper serving the purpose to allocate a reserved region. This will be used in iommu driver implementing reserved region callbacks. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Add a new type field in iommu_resv_regionEric Auger
We introduce a new field to differentiate the reserved region types and specialize the apply_resv_region implementation. Legacy direct mapped regions have IOMMU_RESV_DIRECT type. We introduce 2 new reserved memory types: - IOMMU_RESV_MSI will characterize MSI regions that are mapped - IOMMU_RESV_RESERVED characterize regions that cannot by mapped. Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Rename iommu_dm_regions into iommu_resv_regionsEric Auger
We want to extend the callbacks used for dm regions and use them for reserved regions. Reserved regions can be - directly mapped regions - regions that cannot be iommu mapped (PCI host bridge windows, ...) - MSI regions (because they belong to another address space or because they are not translated by the IOMMU and need special handling) So let's rename the struct and also the callbacks. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/dma: Allow MSI-only cookiesRobin Murphy
IOMMU domain users such as VFIO face a similar problem to DMA API ops with regard to mapping MSI messages in systems where the MSI write is subject to IOMMU translation. With the relevant infrastructure now in place for managed DMA domains, it's actually really simple for other users to piggyback off that and reap the benefits without giving up their own IOVA management, and without having to reinvent their own wheel in the MSI layer. Allow such users to opt into automatic MSI remapping by dedicating a region of their IOVA space to a managed cookie, and extend the mapping routine to implement a trivial linear allocator in such cases, to avoid the needless overhead of a full-blown IOVA domain. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-22net: phy: bcm7xxx: Add entry for BCM7278Florian Fainelli
Add support for the BCM7278 28nm process Gigabit Ethernet PHY. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-22locking/rwsem: Reinit wake_q after useWaiman Long
In __rwsem_down_write_failed_common(), the same wake_q variable name is defined twice, with the inner wake_q hiding the one in outer scope. We can either use different names for the two wake_q's. Even better, we can use the same wake_q twice, if necessary. To enable the latter change, we need to define a new helper function wake_q_init() to enable reinitalization of wake_q after use. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485052415-9611-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-20delay: Add explanation of udelay() inaccuracyRussell King
There seems to be some misunderstanding that udelay() and friends will always guarantee the specified delay. This is a false understanding. When udelay() is based on CPU cycles, it can return early for many reasons which are detailed by Linus' reply to me in a thread in 2011: http://lists.openwall.net/linux-kernel/2011/01/12/372 However, a udelay test module was created in 2014 which allows udelay() to only be 0.5% fast, which is outside of the CPU-cycles udelay() results I measured back in 2011, which were deemed to be in the "we don't care" region. test_udelay() should be fixed to reflect the real allowable tolerance on udelay(), rather than 0.5%. Cc: David Riley <davidriley@chromium.org> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-01-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails (for stable) s390: - Fix a kernel memory exposure (for stable) x86: - Fix exception injection when hypercall instruction cannot be patched" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: do not expose random data via facility bitmap KVM: x86: fix fixing of hypercalls KVM: arm/arm64: vgic: Fix deadlock on error handling KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems KVM: arm/arm64: Fix occasional warning from the timer work function
2017-01-20Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of 12 fixes including the mpt3sas one that was causing hangs on ATA passthrough. The others are a couple of zoned block device fixes, a SAS device detection bug which lead to SATA drives not being matched to bays, two qla2xxx MSI fixes, a qla2xxx req for rsp confusion caused by cut and paste, and a few other minor fixes" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: fix hang on ata passthrough commands scsi: lpfc: Set elsiocb contexts to NULL after freeing it scsi: sd: Ignore zoned field for host-managed devices scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request() scsi: ses: Fix SAS device detection in enclosure scsi: libfc: Fix variable name in fc_set_wwpn scsi: lpfc: avoid double free of resource identifiers scsi: qla2xxx: remove irq_affinity_notifier scsi: qla2xxx: fix MSI-X vector affinity scsi: qla2xxx: Fix apparent cut-n-paste error. scsi: qla2xxx: Get mutex lock before checking optrom_state
2017-01-20net: dsa: Remove hwmon supportAndrew Lunn
Only the Marvell mv88e6xxx DSA driver made use of the HWMON support in DSA. The temperature sensor registers are actually in the embedded PHYs, and the PHY driver now supports it. So remove all HWMON support from DSA and drivers. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20Merge tag 'mlx5-updates-2017-01-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 and mlx5e updates 2017-01-19 This series includes some updates for mlx5 core and mlx5e netdevice driver. From Leon, a small fix that remove an unnecessary print. From Eli Cohen, a fix to the FW version printout in case of internal error. From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per second) mlx5 infrastructure support and the 2nd adds the necessary bits for mlx5e ptp logic and structures. From Mohamad, add support for s-tagged packet receive when in promiscuous mode. Form Gal Pressman, MCAM (Management capabilities mask register) and PCAM (Ports capabilities mask register) registers infrastructure, those registers are needed in order to query the different statistics registers support in FW, in order for the driver to enable/disable query and reporting them back to user. On top of this infrastructure we've exposed new set of statistics groups: - MPCNT: Physical layer statistical counters (For symbol errors) - PPCNT: PCIe performance counters In addition to the statistics capabilities series we've moved the mlx5 HCA capabilities fields to a dedicated struct under the driver private data. At the end a small patch to update & query statistics in the most desired order. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20csum: eliminate sparse warning in remcsum_unadjust()Lance Richardson
Cast second parameter of csum_sub() from __sum16 to __wsum. Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20tipc: make replicast a user selectable optionJon Paul Maloy
If the bearer carrying multicast messages supports broadcast, those messages will be sent to all cluster nodes, irrespective of whether these nodes host any actual destinations socket or not. This is clearly wasteful if the cluster is large and there are only a few real destinations for the message being sent. In this commit we extend the eligibility of the newly introduced "replicast" transmit option. We now make it possible for a user to select which method he wants to be used, either as a mandatory setting via setsockopt(), or as a relative setting where we let the broadcast layer decide which method to use based on the ratio between cluster size and the message's actual number of destination nodes. In the latter case, a sending socket must stick to a previously selected method until it enters an idle period of at least 5 seconds. This eliminates the risk of message reordering caused by method change, i.e., when changes to cluster size or number of destinations would otherwise mandate a new method to be used. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20bpf: add bpf_probe_read_str helperGianluca Borello
Provide a simple helper with the same semantics of strncpy_from_unsafe(): int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr) This gives more flexibility to a bpf program. A typical use case is intercepting a file name during sys_open(). The current approach is: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 bpf_probe_read(buf, sizeof(buf), ctx->di); /* consume buf */ } This is suboptimal because the size of the string needs to be estimated at compile time, causing more memory to be copied than often necessary, and can become more problematic if further processing on buf is done, for example by pushing it to userspace via bpf_perf_event_output(), since the real length of the string is unknown and the entire buffer must be copied (and defining an unrolled strnlen() inside the bpf program is a very inefficient and unfeasible approach). With the new helper, the code can easily operate on the actual string length rather than the buffer size: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di); /* consume buf, for example push it to userspace via * bpf_perf_event_output(), but this time we can use * res (the string length) as event size, after checking * its boundaries. */ } Another useful use case is when parsing individual process arguments or individual environment variables navigating current->mm->arg_start and current->mm->env_start: using this helper and the return value, one can quickly iterate at the right offset of the memory area. The code changes simply leverage the already existent strncpy_from_unsafe() kernel function, which is safe to be called from a bpf program as it is used in bpf_trace_printk(). Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20device: Implement a bus agnostic dev_num_vf routinePhil Sutter
Now that pci_bus_type has num_vf callback set, dev_num_vf can be implemented in a bus type independent way and the check for whether a PCI device is being handled in rtnetlink can be dropped. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20device: bus_type: Introduce num_vf callbackPhil Sutter
This allows for bus types to implement their own method of retrieving the number of virtual functions a NIC on that type of bus supports. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20sock: use hlist_entry_safeGeliang Tang
Use hlist_entry_safe() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20soc: samsung: pmu: Provide global function to get PMU regmapMarek Szyprowski
PMU is something like a SoC wide service, so add a helper function to get PMU regmap. This will be used by other Exynos device drivers. This way it can be avoided to model this dependency in device tree (as phandles to PMU node) for almost every device in the SoC. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Tomasz Figa <tomasz.figa@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2017-01-20net: remove bh disabling around percpu_counter accessesEric Dumazet
Shaohua Li made percpu_counter irq safe in commit 098faf5805c8 ("percpu_counter: make APIs irq safe") We can safely remove BH disable/enable sections around various percpu_counter manipulations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receivingJason Wang
Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too, fixing this by adding a hint (has_data_valid) and set it only on the receiving path. Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20brcmfmac: add support for BCM43455 with modalias sdio:c00v02D0dA9BFMartin Blumenstingl
BCM43455 is a more recent revision of the BCM4345. Some of the BCM43455 got a dedicated SDIO device ID which is currently not supported by brcmfmac. Adding the new sdio_device_id to brcmfmac is enough to get the BCM43455 supported because the chip itself is already supported (due to BCM4345 support in the driver). Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Tested-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2017-01-20locking/rwsem: Remove unnecessary atomic_long_t castsWaiman Long
Since sem->count had been changed to a atomic_long_t type, it is no longer necessary to use the atomic_long_t cast anymore. So remove them. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1484836312-6656-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-20Revert "PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag"Rafael J. Wysocki
Revert commit 08b98d329165 (PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag) as it caused system suspend (in the default configuration) to fail on Dell XPS13 (9360) with the Kaby Lake processor. Fixes: 08b98d329165 (PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag) Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-20sched/clock: Fix hotplug crashPeter Zijlstra
Mike reported that he could trigger the WARN_ON_ONCE() in set_sched_clock_stable() using hotplug. This exposed a fundamental problem with the interface, we should never mark the TSC stable if we ever find it to be unstable. Therefore set_sched_clock_stable() is a broken interface. The reason it existed is that not having it is a pain, it means all relevant architecture code needs to call clear_sched_clock_stable() where appropriate. Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64 and parisc are trivial in that they never called set_sched_clock_stable(), so add an unconditional call to clear_sched_clock_stable() to them. For x86 the story is a lot more involved, and what this patch tries to do is ensure we preserve the status quo. So even is Cyrix or Transmeta have usable TSC they never called set_sched_clock_stable() so they now get an explicit mark unstable. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 9881b024b7d7 ("sched/clock: Delay switching sched_clock to stable") Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-19net/mlx5: Move cached hca caps to designated caps structGal Pressman
The caps structure consists of hca caps and port/management caps, all under one roof. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19net/mlx5: Add MPCNT register infrastructureGal Pressman
Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19net/mlx5: Add PPCNT physical layer statistical group infrastructureGal Pressman
Add the needed infrastructure for future use of PPCNT physical layer statistical group. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-01-19net/mlx5: Query and cache PCAM, MCAM registers on initializationGal Pressman
On load_one, we now cache our capabilities registers internally, similar to QUERY_HCA_CAP. Capabilities can later be queried using macros introduced in this patch. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19net/mlx5: Expose PCAM, MCAM registers infrastructureGal Pressman
PCAM: Ports capabilities mask register. MCAM: Management capabilities mask register. PCAM and MCAM registers will provide information regarding firmware support for different features, in order to avoid cases where new driver combined with old firmware results in syndromes (for ex. PCIe counters before this patchset). Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19net/mlx5: Add support to s-tag in mlx5 firmware interfaceMohamad Haj Yahia
Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry match param. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-01-19net/mlx5: Add MTPPS and MTPPSE registers infrastructureEugenia Emantayev
Implement query and set functionality for MTPPS and MTPPSE registers. MTPPS (Management Pulse Per Second) provides the device PPS capabilities, configures the PPS in and out modules and holds the PPS in time stamp. Query MTPPS is supported only when HCA_CAP.pps is set and modify is supported when HCA_CAP.pps_modify is set. MTPPSE (Management Pulse Per Second Event) configures the different event generation modes for PPS. Supported when HCA_CAP.pps is set. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19phy: increase size of MII_BUS_ID_SIZE and bus_idVolodymyr Bendiuga
Some bus names are pretty long and do not fit into 17 chars. Increase therefore MII_BUS_ID_SIZE and phy_fixup.bus_id to larger number. Now mii_bus.id can host larger name. Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com> Signed-off-by: Magnus Öberg <magnus.oberg@westermo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGEDMitchel Humpherys
The newly added DMA_ATTR_PRIVILEGED is useful for creating mappings that are only accessible to privileged DMA engines. Implement it in dma-iommu.c so that the ARM64 DMA IOMMU mapper can make use of it. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-19common: DMA-mapping: add DMA_ATTR_PRIVILEGED attributeMitchel Humpherys
This patch adds the DMA_ATTR_PRIVILEGED attribute to the DMA-mapping subsystem. Some advanced peripherals such as remote processors and GPUs perform accesses to DMA buffers in both privileged "supervisor" and unprivileged "user" modes. This attribute is used to indicate to the DMA-mapping subsystem that the buffer is fully accessible at the elevated privilege level (and ideally inaccessible or at least read-only at the lesser-privileged levels). Cc: linux-doc@vger.kernel.org Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-19iommu: add IOMMU_PRIV attributeMitchel Humpherys
Add the IOMMU_PRIV attribute, which is used to indicate privileged mappings. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-19gpio: provide lockdep keys for nested/unnested irqchipsLinus Walleij
The helper function for adding a GPIO chip compiles in a lockdep key for debugging, the same key is needed for nested chips as well. The macro construction is unreadable, replace this with two static inlines instead. The _gpiochip_irqchip_add prefixed function is not helpful, rename it with gpiochip_irqchip_add_key() that tell us what the function is actually doing. Fixes: d245b3f9bd36 ("gpio: simplify adding threaded interrupts") Cc: Roger Quadros <rogerq@ti.com> Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com> Reported-by: Roger Quadros <rogerq@ti.com> Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-01-19jump_labels: Move header guard #endif down where it belongsLuis R. Rodriguez
The ending header guard is misplaced. This has no functional change, this is just an eye-sore. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: bp@suse.de Cc: catalin.marinas@arm.com Cc: jbaron@akamai.com Cc: pbonzini@redhat.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/20170118173804.16281-1-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-19LSM: Add /sys/kernel/security/lsmCasey Schaufler
I am still tired of having to find indirect ways to determine what security modules are active on a system. I have added /sys/kernel/security/lsm, which contains a comma separated list of the active security modules. No more groping around in /proc/filesystems or other clever hacks. Unchanged from previous versions except for being updated to the latest security next branch. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: John Johansen <john.johansen@canonical.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-01-18net: Remove usage of net_device last_rx memberTobias Klauser
The network stack no longer uses the last_rx member of struct net_device since the bonding driver switched to use its own private last_rx in commit 9f242738376d ("bonding: use last_arp_rx in slave_last_rx()"). However, some drivers still (ab)use the field for their own purposes and some driver just update it without actually using it. Previously, there was an accompanying comment for the last_rx member added in commit 4dc89133f49b ("net: add a comment on netdev->last_rx") which asked drivers not to update is, unless really needed. However, this commend was removed in commit f8ff080dacec ("bonding: remove useless updating of slave->dev->last_rx"), so some drivers added later on still did update last_rx. Remove all usage of last_rx and switch three drivers (sky2, atp and smc91c92_cs) which actually read and write it to use their own private copy in netdev_priv. Compile-tested with allyesconfig and allmodconfig on x86 and arm. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18bpf: don't trigger OOM killer under pressure with map allocDaniel Borkmann
This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(), that are to be used for map allocations. Using kmalloc() for very large allocations can cause excessive work within the page allocator, so i) fall back earlier to vmalloc() when the attempt is considered costly anyway, and even more importantly ii) don't trigger OOM killer with any of the allocators. Since this is based on a user space request, for example, when creating maps with element pre-allocation, we really want such requests to fail instead of killing other user space processes. Also, don't spam the kernel log with warnings should any of the allocations fail under pressure. Given that, we can make backend selection in bpf_map_area_alloc() generic, and convert all maps over to use this API for spots with potentially large allocation requests. Note, replacing the one kmalloc_array() is fine as overflow checks happen earlier in htab_map_alloc(), since it must also protect the multiplication for vmalloc() should kmalloc_array() fail. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18lwtunnel: fix autoload of lwt modulesDavid Ahern
Trying to add an mpls encap route when the MPLS modules are not loaded hangs. For example: CONFIG_MPLS=y CONFIG_NET_MPLS_GSO=m CONFIG_MPLS_ROUTING=m CONFIG_MPLS_IPTUNNEL=m $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2 The ip command hangs: root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2 $ cat /proc/880/stack [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134 [<ffffffff81065efc>] __request_module+0x27b/0x30a [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178 [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4 [<ffffffff814ae451>] fib_table_insert+0x90/0x41f [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52 ... modprobe is trying to load rtnl-lwt-MPLS: root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS and it hangs after loading mpls_router: $ cat /proc/881/stack [<ffffffff81441537>] rtnl_lock+0x12/0x14 [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179 [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router] [<ffffffff81000471>] do_one_initcall+0x8e/0x13f [<ffffffff81119961>] do_init_module+0x5a/0x1e5 [<ffffffff810bd070>] load_module+0x13bd/0x17d6 ... The problem is that lwtunnel_build_state is called with rtnl lock held preventing mpls_init from registering. Given the potential references held by the time lwtunnel_build_state it can not drop the rtnl lock to the load module. So, extract the module loading code from lwtunnel_build_state into a new function to validate the encap type. The new function is called while converting the user request into a fib_config which is well before any table, device or fib entries are examined. Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18net: dsa: store CPU switch structure in the treeVivien Didelot
Store a dsa_switch pointer to the CPU switch in the tree instead of only its index. This avoids the need to initialize it to -1. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18net: ipv6: remove nowait arg to rt6_fill_nodeDavid Ahern
All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and simplify rt6_fill_node accordingly. rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the nowait arg from it as well. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18audit: log 32-bit socketcallsRichard Guy Briggs
32-bit socketcalls were not being logged by audit on x86_64 systems. Log them. This is basically a duplicate of the call from net/socket.c:sys_socketcall(), but it addresses the impedance mismatch between 32-bit userspace process and 64-bit kernel audit. See: https://github.com/linux-audit/audit-kernel/issues/14 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Moore <paul@paul-moore.com>