summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-08net: qrtr: Refactor packet allocationBjorn Andersson
Extract the allocation and filling in the control message header fields to a separate function in order to reuse this in subsequent patches. Cc: Courtney Cavin <ccavin@gmail.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08mISDN: remove unnecessary variable assignmentsGustavo A. R. Silva
Remove unnecessary variable assignments. Addresses-Coverity-ID: 1226917 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08tcp: add TCPMemoryPressuresChrono counterEric Dumazet
DRAM supply shortage and poor memory pressure tracking in TCP stack makes any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning limits) and tcp_mem[] quite hazardous. TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl limits being hit, but only tracking number of transitions. If TCP stack behavior under stress was perfect : 1) It would maintain memory usage close to the limit. 2) Memory pressure state would be entered for short times. We certainly prefer 100 events lasting 10ms compared to one event lasting 200 seconds. This patch adds a new SNMP counter tracking cumulative duration of memory pressure events, given in ms units. $ cat /proc/sys/net/ipv4/tcp_mem 3088 4117 6176 $ grep TCP /proc/net/sockstat TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140 $ nstat -n ; sleep 10 ; nstat |grep Pressure TcpExtTCPMemoryPressures 1700 TcpExtTCPMemoryPressuresChrono 5209 v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David instructed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08srcu: Allow use of Classic SRCU from both process and interrupt contextPaolo Bonzini
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Cc: stable@vger.kernel.org Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Allow use of Tiny/Tree SRCU from both process and interrupt contextPaolo Bonzini
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's implementation already supports mixed-context use of srcu_read_lock() and srcu_read_unlock(), at least as long as uses of srcu_read_lock() and srcu_read_unlock() in each handler are nested and paired properly. In other words, it is still illegal to (say) invoke srcu_read_lock() in an interrupt handler and to invoke the matching srcu_read_unlock() in a softirq handler. Therefore, the only change required for Tiny SRCU is to its comments. Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-08xfs: fix spurious spin_is_locked() assert failures on non-smp kernelsBrian Foster
The 0-day kernel test robot reports assertion failures on !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it turns out, spin_is_locked() is hardcoded to return zero on !CONFIG_SMP kernels and so this function cannot be relied on to verify spinlock state in this configuration. To avoid this problem, replace the associated asserts with lockdep variants that do the right thing regardless of kernel configuration. Drop the one assert that checks for an unlocked lock as there is no suitable lockdep variant for that case. This moves the spinlock checks from XFS debug code to lockdep, but generally provides the same level of protection. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-06-08net: ipv6: Release route when device is unregisteringDavid Ahern
Roopa reported attempts to delete a bond device that is referenced in a multipath route is hanging: $ ifdown bond2 # ifupdown2 command that deletes virtual devices unregister_netdevice: waiting for bond2 to become free. Usage count = 2 Steps to reproduce: echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown ip link add dev bond12 type bond ip link add dev bond13 type bond ip addr add 2001:db8:2::0/64 dev bond12 ip addr add 2001:db8:3::0/64 dev bond13 ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2 ip link del dev bond12 ip link del dev bond13 The root cause is the recent change to keep routes on a linkdown. Update the check to detect when the device is unregistering and release the route for that case. Fixes: a1a22c12060e4 ("net: ipv6: Keep nexthop of multipath route on admin down") Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: Zero ifla_vf_info in rtnl_fill_vfinfo()Mintz, Yuval
Some of the structure's fields are not initialized by the rtnetlink. If driver doesn't set those in ndo_get_vf_config(), they'd leak memory to user. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> CC: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08Merge branch 'tcp-Namespaceify-3-sysctls'David S. Miller
Eric Dumazet says: ==================== tcp: Namespaceify 3 sysctls Move tcp_sack, tcp_window_scaling and tcp_timestamps sysctls to network namespaces. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08tcp: Namespaceify sysctl_tcp_timestampsEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08tcp: Namespaceify sysctl_tcp_window_scalingEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08tcp: Namespaceify sysctl_tcp_sackEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08tcp: add a struct net parameter to tcp_parse_options()Eric Dumazet
We want to move some TCP sysctls to net namespaces in the future. tcp_window_scaling, tcp_sack and tcp_timestamps being fetched from tcp_parse_options(), we need to pass an extra parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skbMateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08Revert "decnet: dn_rtmsg: Improve input length sanitization in ↵David S. Miller
dnrmg_receive_user_skb" This reverts commit 85eac2ba35a2dbfbdd5767c7447a4af07444a5b4. There is an updated version of this fix which we should use instead. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: emac: fix and unify emac_mdio functionsChristian Lamparter
emac_mdio_read_link() was not copying the requested phy settings back into the emac driver's own phy api. This has caused a link speed mismatch issue for the AR8035 as the emac driver kept trying to connect with 10/100MBps on a 1GBit/s link. This patch also unifies shared code between emac_setup_aneg() and emac_mdio_setup_forced(). And furthermore it removes a chunk of emac_mdio_init_phy(), that was copying the same data into itself. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: emac: fix reset timeout with AR8035 phyChristian Lamparter
This patch fixes a problem where the AR8035 PHY can't be detected on an Cisco Meraki MR24, if the ethernet cable is not connected on boot. Russell Senior provided steps to reproduce the issue: |Disconnect ethernet cable, apply power, wait until device has booted, |plug in ethernet, check for interfaces, no eth0 is listed. | |This appears to be a problem during probing of the AR8035 Phy chip. |When ethernet has no link, the phy detection fails, and eth0 is not |created. Plugging ethernet later has no effect, because there is no |interface as far as the kernel is concerned. The relevant part of |the boot log looks like this: |this is the failing case: | |[ 0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode |[ 0.882532] /plb/opb/ethernet@ef600c00: reset timeout |[ 0.888546] /plb/opb/ethernet@ef600c00: can't find PHY! |and the succeeding case: | |[ 0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode |[ 0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:.. |[ 0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01) Based on the comment and the commit message of commit 23fbb5a87c56 ("emac: Fix EMAC soft reset on 460EX/GT"). This is because the AR8035 PHY doesn't provide the TX Clock, if the ethernet cable is not attached. This causes the reset to timeout and the PHY detection code in emac_init_phy() is unable to detect the AR8035 PHY. As a result, the emac driver bails out early and the user left with no ethernet. In order to stay compatible with existing configurations, the driver tries the current reset approach at first. Only if the first attempt timed out, it does perform one more retry with the clock temporarily switched to the internal source for just the duration of the reset. LEDE-Bug: #687 <https://bugs.lede-project.org/index.php?do=details&task_id=687> Cc: Chris Blake <chrisrblake93@gmail.com> Reported-by: Russell Senior <russell@personaltelco.net> Fixes: 23fbb5a87c56e98 ("emac: Fix EMAC soft reset on 460EX/GT") Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skbMateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the entire nlh->nlmsg_len field before accessing that field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08Merge tag 'kvm-s390-master-4.12-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix for master (4.12) - The newly created AIS capability enables the feature unconditionally and ignores the cpu model
2017-06-08Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linusJens Axboe
Christoph writes: "A few NVMe fixes for 4.12-rc, PCIe reset fixes and APST fixes, a RDMA reconnect fix, two FC fixes and a general controller removal fix."
2017-06-08hsi: Fix build regression due to netdev destructor fix.David S. Miller
> ../drivers/hsi/clients/ssi_protocol.c:1069:5: error: 'struct net_device' has no member named 'destructor' Reported-by: Mark Brown <broonie@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: s390: fix up for "Fix inconsistent teardown and release of private ↵Stephen Rothwell
netdev state" Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08drm/i915: fix warning for unused variableJani Nikula
drivers/gpu/drm/i915/intel_engine_cs.c: In function ‘intel_engine_is_idle’: drivers/gpu/drm/i915/intel_engine_cs.c:1103:27: error: unused variable ‘dev_priv’ [-Werror=unused-variable] struct drm_i915_private *dev_priv = engine->i915; ^~~~~~~~ Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-06-08Fix loop device flush before configure v3James Wang
While installing SLES-12 (based on v4.4), I found that the installer will stall for 60+ seconds during LVM disk scan. The root cause was determined to be the removal of a bound device check in loop_flush() by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq"). Restoring this check, examining ->lo_state as set by loop_set_fd() eliminates the bad behavior. Test method: modprobe loop max_loop=64 dd if=/dev/zero of=disk bs=512 count=200K for((i=0;i<4;i++))do losetup -f disk; done mkfs.ext4 -F /dev/loop0 for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done for f in `ls /dev/loop[0-9]*|sort`; do \ echo $f; dd if=$f of=/dev/null bs=512 count=1; \ done Test output: stock patched /dev/loop0 18.1217e-05 8.3842e-05 /dev/loop1 6.1114e-05 0.000147979 /dev/loop10 0.414701 0.000116564 /dev/loop11 0.7474 6.7942e-05 /dev/loop12 0.747986 8.9082e-05 /dev/loop13 0.746532 7.4799e-05 /dev/loop14 0.480041 9.3926e-05 /dev/loop15 1.26453 7.2522e-05 Note that from loop10 onward, the device is not mounted, yet the stock kernel consumes several orders of magnitude more wall time than it does for a mounted device. (Thanks for Mike Galbraith <efault@gmx.de>, give a changelog review.) Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: James Wang <jnwang@suse.com> Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq") Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-08net: propagate tc filter chain index down the ndo_setup_tc callJiri Pirko
We need to push the chain index down to the drivers, so they have the information to which chain the rule belongs. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08s390: update defconfigMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-08MIPS: kprobes: flush_insn_slot should flush only if probe initialisedMarcin Nowakowski
When ftrace is used with kprobes, it is possible for a kprobe to contain an invalid location (ie. only initialised to 0 and not to a specific location in the code). Trying to perform a cache flush on such location leads to a crash r4k_flush_icache_range(). Fixes: c1bf207d6ee1 ("MIPS: kprobe: Add support.") Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16296/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulationWanpeng Li
If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it potentially can be exploited the vulnerability. this will out-of-bounds read and write. Luckily, the effect is small: /* when no next entry is found, the current entry[i] is reselected */ for (j = i + 1; ; j = (j + 1) % nent) { struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j]; if (ej->function == e->function) { It reads ej->maxphyaddr, which is user controlled. However... ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; After cpuid_entries there is int maxphyaddr; struct x86_emulate_ctxt emulate_ctxt; /* 16-byte aligned */ So we have: - cpuid_entries at offset 1B50 (6992) - maxphyaddr at offset 27D0 (6992 + 3200 = 10192) - padding at 27D4...27DF - emulate_ctxt at 27E0 And it writes in the padding. Pfew, writing the ops field of emulate_ctxt would have been much worse. This patch fixes it by modding the index to avoid the out-of-bounds access. Worst case, i == j and ej->function == e->function, the loop can bail out. Reported-by: Moguofang <moguofang@huawei.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Guofang Mo <moguofang@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-08Merge tag 'kvm-arm-for-v4.12-rc5-take2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Fixes for v4.12-rc5 - Take 2 Changes include: - Fix an issue with migrating GICv2 VMs on GICv3 systems. - Squashed a bug for gicv3 when figuring out preemption levels. - Fix a potential null pointer derefence in KVM happening under memory pressure. - Maintain RES1 bits in the SCTLR_EL2 to make sure KVM works on new architecture revisions. - Allow unaligned accesses at EL2/HYP
2017-06-08MIPS: ftrace: fix init functions tracingMarcin Nowakowski
Since introduction of tracing for init functions the in_kernel_space() check is no longer correct, as it ignores the init sections. As a result, when probes are inserted (and disabled) in the init functions, a branch instruction is inserted instead of a nop, which is likely to result in random crashes during boot. Remove the MIPS-specific in_kernel_space() method and replace it with a generic core_kernel_text() that also checks for init sections during system boot stage. Fixes: 42c269c88dc1 ("ftrace: Allow for function tracing to record init functions on boot up") Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Tested-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16092/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08MIPS: mm: adjust PKMAP locationMarcin Nowakowski
Space reserved for PKMap should span from PKMAP_BASE to FIXADDR_START. For large page sizes this is not the case as eg. for 64k pages the range currently defined is from 0xfe000000 to 0x102000000(!!) which obviously isn't right. Remove the hardcoded location and set the BASE address as an offset from FIXADDR_START. Since all PKMAP ptes have to be placed in a contiguous memory, ensure that this is the case by placing them all in a single page. This is achieved by aligning the end address to pkmap pages count pages. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15950/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08MIPS: highmem: ensure that we don't use more than one page for PTEsMarcin Nowakowski
All PTEs used by PKMAP should be allocated in a contiguous memory area, but we do not currently have a mechanism to enforce that, so ensure that we don't try to allocate more entries than would fit in a single page. Current fixed value of 1024 would not work with XPA enabled when sizeof(pte_t)==8 and we need two pages to store pte tables. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15949/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08MIPS: mm: fixed mappings: correct initialisationMarcin Nowakowski
fixrange_init operates at PMD-granularity and expects the addresses to be PMD-size aligned, but currently that might not be the case for PKMAP_BASE unless it is defined properly, so ensure a correct alignment is used before passing the address to fixrange_init. fixed mappings: only align the start address that is passed to fixrange_init rather than the value before adding the size, as we may end up with uninitialised upper part of the range. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15948/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08MIPS: perf: Remove incorrect odd/even counter handling for I6400Marcin Nowakowski
All performance counters on I6400 (odd and even) are capable of counting any of the available events, so drop current logic of using the extra bit to determine which counter to use. Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Fixes: 4e88a8621301 ("MIPS: Add cases for CPU_I6400") Fixes: fd716fca10fc ("MIPS: perf: Fix I6400 event numbers") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15991/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08mac80211: manage RX BA session offload without SKB queueJohannes Berg
Instead of using the SKB queue with the fake pkt_type for the offloaded RX BA session management, also handle this with the normal aggregation state machine worker. This also makes the use of this more reliable since it gets rid of the allocation of the fake skb. Combined with the previous patch, this finally allows us to get rid of the pkt_type hack entirely, so do that as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-08Merge remote-tracking branch 'net-next/master' into mac80211-nextJohannes Berg
This brings in commit 7a7c0a6438b8 ("mac80211: fix TX aggregation start/stop callback race") to allow the follow-up cleanup. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-08net/mlx5e: Fill advertised and supported port data from Hardware infoEran Ben Elisha
Translate hardware port connector type data into link mode supported and advertised info instead of caching it in driver. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Add support for reading connector type from PTYSEran Ben Elisha
Read port connector type from the firmware instead of caching it in the driver metadata. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5: Update flow table commands layoutMaor Gottlieb
Update struct mlx5_ifc_create(modify)_flow_table_bits according to the last device specification. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Support header re-write of partial fields in TC pedit offloadOr Gerlitz
Using a per field mask field, the TC pedit action supports modifying partial fields. E.g if using the TC tool, the following example would make the kernel to only re-write two bytes of the src ip address: tc filter add dev enp1s0 protocol ip parent ffff: prio 30 flower skip_sw ip_proto udp dst_port 8001 action pedit ex munge ip src set 10.1.0.0 retain 0xffff0000 We add driver support for offload these partial re-writes, by setting the per FW action offset-in-field and length-from-offset attributes. The 1st bit set in the mask specifies both the offset and the right shift to apply on the value such that the 1st bit which needs to be set will reside in bit 0 of the FW data field. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Use modify header ID cache for offloaded TC NIC flowsOr Gerlitz
Use the modify header ID cache for the header re-write part of offloading TC NIC flows. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flowsOr Gerlitz
Use the modify header ID cache for the header re-write part of offloading TC eswitch flows. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Add cache for HW modify header IDsOr Gerlitz
Packets belonging to flows which are different by matching may still need to go through the same header re-write. Add a cache for header re-write IDs keyed by the binary chain of modify header actions. The caching is supported for both eswitch and NIC use-cases, where the actual conversion of the code to use caching comes in next patches, one per use-case. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Use short attribute form when adding/deleting offloaded TC flowsOr Gerlitz
Instead of going through flow->nic/esw_attr for each usage, assign an attr pointer per the context (nic or esw) and use that. This patch doesn't add any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08net/mlx5e: Remove limitation of single NIC offloaded TC action per ruleOr Gerlitz
Remove the limitation that offloaded NIC filters can have only one action. This allows us for example to provide flow tag as a note to upper layers / apps that that HW header re-write was applied. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-08powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by defaultMichael Ellerman
The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with other general kernel options. It's really more at home in the "Platform Support" section, so move it there. Also enable it by default, for Book3s 64. It does mostly nothing unless the device tree properties are found, and we will want it enabled eventually in distro kernels, so turn it on to start getting more testing. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08powerpc/mm/4k: Limit 4k page size config to 64TB virtual address spaceAneesh Kumar K.V
Supporting 512TB requires us to do a order 3 allocation for level 1 page table (pgd). This results in page allocation failures with certain workloads. For now limit 4k linux page size config to 64TB. Fixes: f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08cxl: Fix error path on bad ioctlFrederic Barrat
Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK ioctl. We shouldn't unlock the context status mutex as it was not locked (yet). Fixes: 0712dc7e73e5 ("cxl: Fix issues when unmapping contexts") Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08[media] cec: race fix: don't return -ENONET in cec_receive()Hans Verkuil
When calling CEC_RECEIVE do not check if the adapter is configured. Typically CEC_RECEIVE is called after a select() and if that indicates that there are messages in the receive queue, then you should always be able to dequeue a message. The race condition here is that a message has been received and is queued, so select() tells userspace that a message is available. But before the application calls CEC_RECEIVE the adapter is unconfigured (e.g. the HDMI cable is removed). Now select will always report that there is a message, but calling CEC_RECEIVE will always return -ENONET because the adapter is no longer configured and so will never actually dequeue the message. There is really no need for this check, and in fact the ENONET error code was never documented for CEC_RECEIVE. This may have been a left-over of old code that was never updated. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Cc: <stable@vger.kernel.org> # for v4.10 and up Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-06-08Revert "printk: fix double printing with earlycon"Petr Mladek
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e. The commit regression to users that define both console=ttyS1 and console=ttyS0 on the command line, see https://lkml.kernel.org/r/20170509082915.GA13236@bistromath.localdomain The kernel log messages always appeared only on one serial port. It is even documented in Documentation/admin-guide/serial-console.rst: "Note that you can only define one console per device type (serial, video)." The above mentioned commit changed the order in which the command line parameters are searched. As a result, the kernel log messages go to the last mentioned ttyS* instead of the first one. We long thought that using two console=ttyS* on the command line did not make sense. But then we realized that console= parameters were handled also by systemd, see http://0pointer.de/blog/projects/serial-console.html "By default systemd will instantiate one serial-getty@.service on the main kernel console, if it is not a virtual terminal." where "[4] If multiple kernel consoles are used simultaneously, the main console is the one listed first in /sys/class/tty/console/active, which is the last one listed on the kernel command line." This puts the original report into another light. The system is running in qemu. The first serial port is used to store the messages into a file. The second one is used to login to the system via a socket. It depends on systemd and the historic kernel behavior. By other words, systemd causes that it makes sense to define both console=ttyS1 console=ttyS0 on the command line. The kernel fix caused regression related to userspace (systemd) and need to be reverted. In addition, it went out that the fix helped only partially. The messages still were duplicated when the boot console was removed early by late_initcall(printk_late_init). Then the entire log was replayed when the same console was registered as a normal one. Link: 20170606160339.GC7604@pathway.suse.cz Cc: Aleksey Makarov <aleksey.makarov@linaro.org> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Robin Murphy <robin.murphy@arm.com>, Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Nair, Jayachandran" <Jayachandran.Nair@cavium.com> Cc: linux-serial@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reported-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>