summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-10-23mpls: flow-based multipath selectionRobert Shearman
Change the selection of a multipath route to use a flow-based hash. This more suitable for traffic sensitive to reordering within a flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution of traffic given enough flows. Selection of the path for a multipath route is done using a hash of: 1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and including entropy label, whichever is first. 2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS payload, if present. Naturally, a 5-tuple hash using L4 information in addition would be possible and be better in some scenarios, but there is a tradeoff between looking deeper into the packet to achieve good distribution, and packet forwarding performance, and I have erred on the side of the latter as the default. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23mpls: multipath route supportRoopa Prabhu
This patch adds support for MPLS multipath routes. Includes following changes to support multipath: - splits struct mpls_route into 'struct mpls_route + struct mpls_nh' - 'struct mpls_nh' represents a mpls nexthop label forwarding entry - moves mpls route and nexthop structures into internal.h - A mpls_route can point to multiple mpls_nh structs - the nexthops are maintained as a array (similar to ipv4 fib) - In the process of restructuring, this patch also consistently changes all labels to u8 - Adds support to parse/fill RTA_MULTIPATH netlink attribute for multipath routes similar to ipv4/v6 fib - In this patch, the multipath route nexthop selection algorithm simply returns the first nexthop. It is replaced by a hash based algorithm from Robert Shearman in the next patch - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update. mpls_route_update though implemented to update based on dev, it was never used that way. And the dev handling gets tricky with multiple nexthops. Cannot match against any single nexthops dev. So, this patch removes the unused 'dev' handling in mpls_route_update. - dead route/path handling will be implemented in a subsequent patch Example: $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \ nexthop as 700 via inet 10.1.1.6 dev swp2 \ nexthop as 800 via inet 40.1.1.2 dev swp3 $ip -f mpls route show 100 nexthop as to 200 via inet 10.1.1.2 dev swp1 nexthop as to 700 via inet 10.1.1.6 dev swp2 nexthop as to 800 via inet 40.1.1.2 dev swp3 Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23net: sysctl: fix a kmemleak warningLi RongQing
the returned buffer of register_sysctl() is stored into net_header variable, but net_header is not used after, and compiler maybe optimise the variable out, and lead kmemleak reported the below warning comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s) hex dump (first 32 bytes): 90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8.............. 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffc00020f134>] create_object+0x10c/0x2a0 [<ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0 [<ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8 [<ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0 [<ffffffc00028eef0>] register_sysctl+0x30/0x40 [<ffffffc00099c304>] net_sysctl_init+0x20/0x58 [<ffffffc000994dd8>] sock_init+0x10/0xb0 [<ffffffc0000842e0>] do_one_initcall+0x90/0x1b8 [<ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0 [<ffffffc00070ed6c>] kernel_init+0x1c/0xe8 [<ffffffc000083bfc>] ret_from_fork+0xc/0x50 [<ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>> Before fix, the objdump result on ARM64: 0000000000000000 <net_sysctl_init>: 0: a9be7bfd stp x29, x30, [sp,#-32]! 4: 90000001 adrp x1, 0 <net_sysctl_init> 8: 90000000 adrp x0, 0 <net_sysctl_init> c: 910003fd mov x29, sp 10: 91000021 add x1, x1, #0x0 14: 91000000 add x0, x0, #0x0 18: a90153f3 stp x19, x20, [sp,#16] 1c: 12800174 mov w20, #0xfffffff4 // #-12 20: 94000000 bl 0 <register_sysctl> 24: b4000120 cbz x0, 48 <net_sysctl_init+0x48> 28: 90000013 adrp x19, 0 <net_sysctl_init> 2c: 91000273 add x19, x19, #0x0 30: 9101a260 add x0, x19, #0x68 34: 94000000 bl 0 <register_pernet_subsys> 38: 2a0003f4 mov w20, w0 3c: 35000060 cbnz w0, 48 <net_sysctl_init+0x48> 40: aa1303e0 mov x0, x19 44: 94000000 bl 0 <register_sysctl_root> 48: 2a1403e0 mov w0, w20 4c: a94153f3 ldp x19, x20, [sp,#16] 50: a8c27bfd ldp x29, x30, [sp],#32 54: d65f03c0 ret After: 0000000000000000 <net_sysctl_init>: 0: a9bd7bfd stp x29, x30, [sp,#-48]! 4: 90000000 adrp x0, 0 <net_sysctl_init> 8: 910003fd mov x29, sp c: a90153f3 stp x19, x20, [sp,#16] 10: 90000013 adrp x19, 0 <net_sysctl_init> 14: 91000000 add x0, x0, #0x0 18: 91000273 add x19, x19, #0x0 1c: f90013f5 str x21, [sp,#32] 20: aa1303e1 mov x1, x19 24: 12800175 mov w21, #0xfffffff4 // #-12 28: 94000000 bl 0 <register_sysctl> 2c: f9002260 str x0, [x19,#64] 30: b40001a0 cbz x0, 64 <net_sysctl_init+0x64> 34: 90000014 adrp x20, 0 <net_sysctl_init> 38: 91000294 add x20, x20, #0x0 3c: 9101a280 add x0, x20, #0x68 40: 94000000 bl 0 <register_pernet_subsys> 44: 2a0003f5 mov w21, w0 48: 35000080 cbnz w0, 58 <net_sysctl_init+0x58> 4c: aa1403e0 mov x0, x20 50: 94000000 bl 0 <register_sysctl_root> 54: 14000004 b 64 <net_sysctl_init+0x64> 58: f9402260 ldr x0, [x19,#64] 5c: 94000000 bl 0 <unregister_sysctl_table> 60: f900227f str xzr, [x19,#64] 64: 2a1503e0 mov w0, w21 68: f94013f5 ldr x21, [sp,#32] 6c: a94153f3 ldp x19, x20, [sp,#16] 70: a8c37bfd ldp x29, x30, [sp],#48 74: d65f03c0 ret Add the possible error handle to free the net_header to remove the kmemleak warning Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "9 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put fault-inject: fix inverted interval/probability values in printk lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y mm: make sendfile(2) killable thp: use is_zero_pfn() only after pte_present() check mailmap: update Javier Martinez Canillas' email MAINTAINERS: add Sergey as zsmalloc reviewer mm: cma: fix incorrect type conversion for size during dma allocation kmod: don't run async usermode helper as a child of kworker thread
2015-10-23Merge branch 'mdiobus_nested_read_write'David S. Miller
Neil Armstrong says: ==================== Refactor nested mdiobus read/write functions In order to avoid locked signal false positive for nested mdiobus read/write calls, nested code was introduced in mv88e6xxx and mdio-mux. But mv88e6060 also needs such nested mdiobus read/write calls. For sake of refactoring, introduce nested variants of mdiobus read/write and make them used by mv88e6xxx and mv88e6060. In a next patch, mdio-mux should also use these variant calls. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23net: dsa: Make mv88e6060 use nested mdiobus read/writeNeil Armstrong
Like mv88e6xxx and mdio-mux, to avoid lockdep give false positives because of nested MDIO busses, switch to previously introduced nested mdiobus_read/write variants. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23net: dsa: Make mv88e6xxx use nested mdiobus read/writeNeil Armstrong
Make the mv88e6xxx driver use the previously introduced nested variants of mdiobus_read/write functions. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23net: phy: Add nested variants of mdiobus read/writeNeil Armstrong
Since nested variants of mdiobus_read/write are used in multiple drivers, add nested variants in the mdiobus core. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ixgbe, ixgbevf: Add new mbox API xcast modeHiroshi Shimamoto
The limitation of the number of multicast address for VF is not enough for the large scale server with SR-IOV feature. IPv6 requires the multicast MAC address for each IP address to handle the Neighbor Solicitation message. We couldn't assign over 30 IPv6 addresses to a single VF. This patch introduces the new mailbox API, IXGBE_VF_UPDATE_XCAST_MODE, to update multicast mode of VF. This adds 3 modes; - NONE only L2 exact match addresses or Flow Director enabled - MULTI BAM and ROMPE set - ALLMULTI BAM, ROMPE and MPE set If a guest VF user wants over 30 MAC multicast addresses, set IFF_ALLMULTI to request PF to update xcast mode to enable VF multicast promiscuous mode. On the other hand, enabling VF multicast promiscuous mode may affect security and performance in the network of the NIC. Only trusted VF can enable multicast promiscuous mode. The behavior of untrusted VF is the same as previous version. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23ixgbe: Add new ndo to trust VFHiroshi Shimamoto
Implements the new netdev op to trust VF in ixgbe. The administrator can turn on and off VF trusted by ip command which supports trust message. # ip link set dev eth0 vf 1 trust on or # ip link set dev eth0 vf 1 trust off Send a ping to reset VF on changing the status of trusting. VF driver will reconfigure its features on reset. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23drivers: net: cpsw: use module_platform_driverGrygorii Strashko
There is no reasons to probe cpsw from late_initcall level and it's not recommended. Hence, use module_platform_driver() to register and probe cpsw driver from module_init() level. Cc: Tony Lindgren <tony@atomide.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23if_link: Add control trust VFHiroshi Shimamoto
Add netlink directives and ndo entry to trust VF user. This controls the special permission of VF user. The administrator will dedicatedly trust VF user to use some features which impacts security and/or performance. The administrator never turn it on unless VF user is fully trusted. CC: Sy Jong Choi <sy.jong.choi@intel.com> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23tcp/dccp: fix hashdance race for passive sessionsEric Dumazet
Multiple cpus can process duplicates of incoming ACK messages matching a SYN_RECV request socket. This is a rare event under normal operations, but definitely can happen. Only one must win the race, otherwise corruption would occur. To fix this without adding new atomic ops, we use logic in inet_ehash_nolisten() to detect the request was present in the same ehash bucket where we try to insert the new child. If request socket was not found, we have to undo the child creation. This actually removes a spin_lock()/spin_unlock() pair in reqsk_queue_unlink() for the fast path. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23i40e: fix unconditional execution of cpu_to_le16()Jean Sacren
The commit 3092e5e4cc79 ("i40e: add little endian conversion for checksum") fixed the checksum bug on big-endian architecture. But we should not execute cpu_to_le16() unconditionally. Thus, put cpu_to_le16() under certain condition. Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: clean up local variable initializationJean Sacren
In both i40e_calc_nvm_checksum() and i40e_update_nvm_checksum(), the local variables designated by 'ret_code' are overwritten immediately. As such, they should merely be declared. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40evf: clean up local variable initializationJean Sacren
In i40evf_msix_aq(), the first two lines of rd32() are mainly to clear the registers. If we initialize 'val' at this point, it will be overwritten immediately. We shall simply discard the return value here. When we initialize 'val', we might as well include the mask in one step. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: add missing kernel-doc argumentJean Sacren
The following kernel-doc arguments for their respective functions are missing: 1) @cd_type_cmd_tso_mss for i40e_tso(); 2) @cd_type_cmd_tso_mss for i40e_tsyn(); 3) @tx_ring for i40e_tx_enable_csum(). Add them all for the kernel-doc requirement. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40evf: add missing kernel-doc argumentJean Sacren
@flush has been missing since the inception of i40evf_irq_enable(). Add it for the kernel doc. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: re-use %*ph specifier to hexdump a dataAndy Shevchenko
Instead of using a custom approach change the code to use %*ph format specifier. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e/i40evf: Bump i40e to 1.3.46 and i40evf to 1.3.33Catherine Sullivan
Bump up the version... Change-ID: Ib8d501021671ba20250115ed54330e2c182255b7 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: Disable VEB bridge mode with SR-IOV failureAkeem G Abodunrin
If a call to enable SR-IOV in the kernel failed, we need to disable I40E_FLAG_VEB_MODE_ENABLED, so that bridge mode could fall back to VEPA, which is a default. Change-ID: I12b6f776769506db85b29bea94b9c88d0b5ee65e Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: Fix an incorrect OEM version stringCarolyn Wyborny
This patch fixes a problem where the driver output of the OEM version string varied from the other tools. The mask value and the order of operations were incorrect, per the original change request. Without this patch, the version string will appear incorrect from the driver. Change-ID: Ie1ca6485284b4ce3b57e5a99b18b7641617c7ef7 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: fix inconsistent statuses after a PF resetHelin Zhang
This patch fixes a problem of possibly getting inconsistent flow control statuses after a PF reset. Requested_mode was being set with a default value during probing, but the initial HW state could be different from this mode. Change-ID: I772bf07b78616e87086418d4bd87954b66fa17cd Signed-off-by: Helin Zhang <helin.zhang@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40evf: use correct struct for list manipulationMitch Williams
Not sure how this compiles at all. Use the correct struct for manipulating the VLAN filter list. Without this, the VLAN filter list doesn't get processed correctly, and VLAN filters will not be re-enabled after any kind of reset. Change-ID: Iceff2dc089f303058fb71ecb08419eed471e0e90 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: Fix VEB/VEPA bridge mode mismatch issueAkeem G Abodunrin
Fix i40e_is_vsi_uplink_mode_veb to check if bridge is actually in VEB mode before allowing LB in the add VSI routine, instead of unconditionally returning VEB bridge mode. Change-ID: I162397b1bdd02367735fe9baaeb51465be2a3ce9 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e: fix a bug in debugfs with add/del macaddrAnjali Singhai Jain
The new code flow requires us to grab the filter list lock before adding/deleting the filter. Change-ID: I4eaef508ab4da2d1b2e23f20f2a78d931d5b6aeb Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23i40e/i40evf: Add a workaround to drop all flow control framesAnjali Singhai Jain
This patch adds a workaround to drop any flow control frames from being transmitted from any VSI. FW can still send flow control frames if flow control is enabled. With this patch in place a malicious VF cannot send flow control or PFC packets out on the wire. Change-ID: I4303b24e98b93066d2767fec24dfe78be591c277 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-23ppp: fix pppoe_dev deletion condition in pppoe_release()Guillaume Nault
We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev. PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies (po->pppoe_dev != NULL). Since we're releasing a PPPoE socket, we want to release the pppoe_dev if it exists and reset sk_state to PPPOX_DEAD, no matter the previous value of sk_state. So we can just check for po->pppoe_dev and avoid any assumption on sk->sk_state. Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23af_key: fix two typosLi RongQing
Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23sched/core: Add missing lockdep_unpin() annotationsPeter Zijlstra
Luca and Wanpeng reported two missing annotations that led to false lockdep complaints. Add the missing annotations. Reported-by: Luca Abeni <luca.abeni@unitn.it> Reported-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: cbce1a686700 ("sched,lockdep: Employ lock pinning") Link: http://lkml.kernel.org/r/20151023095008.GY17308@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-23amd-xgbe: Use wmb before updating current descriptor countLendacky, Thomas
The code currently uses the lightweight dma_wmb barrier before updating the current descriptor count. Under heavy load, the Tx cleanup routine was seeing the updated current descriptor count before the updated descriptor information. As a result, the Tx descriptor was being cleaned up before it was used because it was not "owned" by the hardware yet, resulting in a Tx queue hang. Using the wmb barrier insures that the descriptor is updated before the descriptor counter preventing the Tx queue hang. For extra insurance, the Tx cleanup routine is changed to grab the current decriptor count on entry and uses that initial value in the processing loop rather than trying to chase the current value. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23net/phy: micrel: Add workaround for bad autonegNathan Sullivan
Very rarely, the KSZ9031 will appear to complete autonegotiation, but will drop all traffic afterwards. When this happens, the idle error count will read 0xFF after autonegotiation completes. Reset the PHY when in that state. Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv4: implement support for NOPREFIXROUTE ifa flag for ipv4 addressPaolo Abeni
Currently adding a new ipv4 address always cause the creation of the related network route, with default metric. When a host has multiple interfaces on the same network, multiple routes with the same metric are created. If the userspace wants to set specific metric on each routes, i.e. giving better metric to ethernet links in respect to Wi-Fi ones, the network routes must be deleted and recreated, which is error-prone. This patch implements the support for IFA_F_NOPREFIXROUTE for ipv4 address. When an address is added with such flag set, no associated network route is created, no network route is deleted when said IP is gone and it's up to the user space manage such route. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23Merge tag 'powerpc-4.3-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" from Paul - Handle irq_happened flag correctly in off-line loop from Paul - Validate rtas.entry before calling enter_rtas() from Vasant * tag 'powerpc-4.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/rtas: Validate rtas.entry before calling enter_rtas() powerpc/powernv: Handle irq_happened flag correctly in off-line loop powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8"
2015-10-23Merge branch 'ipv6-overflow-arith'David S. Miller
Hannes Frederic Sowa says: ==================== overflow-arith: begin to add support for overflow builtins functions I add a new header, linux/overflow-arith.h, as the central place to add overflow and wrap-around checking functions. The reason I am doing so is that it can make use of compiler supported builtin functions which can leverage hardware. As I need this for a fix in the ipv6 stack, which is also included in this series, I propose to add it sooner than later over Davem's net tree. This is also the reason why I start slowly with only the one function I need at this time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23overflow-arith: begin to add support for overflow builtin functionsHannes Frederic Sowa
The idea of the overflow-arith.h header is to collect overflow checking functions in one central place. If gcc compiler supports the __builtin_overflow_* builtins we use them because they might give better performance, otherwise the code falls back to normal overflow checking functions. The builtin_overflow functions are supported by gcc-5 and clang. The matter of supporting clang is to just provide a corresponding CC_HAVE_BUILTIN_OVERFLOW, because the specific overflow checking builtins don't differ between gcc and clang. I just provide overflow_usub function here as I intend this to get merged into net, more functions will definitely follow as they are needed. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23tcp: allow dctcp alpha to drop to zeroAndrew Shewmaker
If alpha is strictly reduced by alpha >> dctcp_shift_g and if alpha is less than 1 << dctcp_shift_g, then alpha may never reach zero. For example, given shift_g=4 and alpha=15, alpha >> dctcp_shift_g yields 0 and alpha remains 15. The effect isn't noticeable in this case below cwnd=137, but could gradually drive uncongested flows with leftover alpha down to cwnd=137. A larger dctcp_shift_g would have a greater effect. This change causes alpha=15 to drop to 0 instead of being decrementing by 1 as it would when alpha=16. However, it requires one less conditional to implement since it doesn't have to guard against subtracting 1 from 0U. A decay of 15 is not unreasonable since an equal or greater amount occurs at alpha >= 240. Signed-off-by: Andrew G. Shewmaker <agshew@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv6: fix the incorrect return value of throw routelucien
The error condition -EAGAIN, which is signaled by throw routes, tells the rules framework to walk on searching for next matches. If the walk ends and we stop walking the rules with the result of a throw route we have to translate the error conditions to -ENETUNREACH. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Most of the changes this time are for incorrect device nodes in various ways, on on imx, berlin, exynos, ux500, uniphier, omap and meson. Chen-Yu Tsai now co-maintains mach-sunxi (Allwinner). Other bug fixes include - a partial revert of a broken tegra gpio patch - irq affinity for arm ccn - suspend on one Armada 385 machine - enable ZONE_DMA to avoid an OMAP crash for over 2GB RAM - turning on a regulator on beagleboard-x15 for HDMI - making the omap gpmc debug code visible - setup of orion network switch - a rare build regression for pxa" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (22 commits) ARM: OMAP2+: Fix imprecise external abort caused by bogus SRAM init thermal: exynos: Fix register read in TMU ARM: OMAP2+: Fix oops with LPAE and more than 2GB of memory ARM: tegra: Comment out gpio-ranges properties ARM: dts: uniphier: fix IRQ number for devices on PH1-LD6b ref board drivers/perf: arm_pmu: avoid CPU device_node reference leak bus: arm-ccn: Fix irq affinity setting on CPU migration bus: arm-ccn: Handle correctly no-more-cpus case ARM: mvebu: correct a385-db-ap compatible string ARM: meson6: DTS: Fix wrong reg mapping and IRQ numbers MAINTAINERS: Update Allwinner entry and add new maintainer ARM: ux500: modify initial levelshifter status ARM: pxa: fix pxa3xx DFI lockup hack Documentation: ARM: List new omap MMC requirements memory: omap-gpmc: dump "before" state before first modification memory: omap-gpmc: Fix unselectable debug option for GPMC ARM: dts: am57xx-beagle-x15: set VDD_SD to always-on ARM: dts: Fix audio card detection on Peach boards ARM: EXYNOS: Fix double of_node_put() when parsing child power domains ARM: orion: Fix DSA platform device after mvmdio conversion ...
2015-10-23macvtap: unbreak receiving of gro skb with frag listJason Wang
We don't have fraglist support in TAP_FEATURES. This will lead software segmentation of gro skb with frag list. Fixes by having frag list support in TAP_FEATURES. With this patch single session of netperf receiving were restored from about 5Gb/s to about 12Gb/s on mlx4. Fixes a567dd6252 ("macvtap: simplify usage of tap_features") Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM bugfixes from Paolo Bonzini: "Bug fixes for ARM, mostly 4.3 regressions related to virtual interrupt controller changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: arm/arm64: KVM: Fix disabled distributor operation arm/arm64: KVM: Clear map->active on pend/active clear arm/arm64: KVM: Fix arch timer behavior for disabled interrupts KVM: arm: use GIC support unconditionally KVM: arm/arm64: Fix memory leak if timer initialization fails KVM: arm/arm64: Do not inject spurious interrupts
2015-10-23Merge tag 'trace-fixes-v4.3-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Running tests on other changes, the system locked up due to lots of warnings. It was caused by the stack tracer triggering a warning about using rcu_dereference() when RCU was not watching. This can happen due to the fact that the stack tracer uses the function tracer to check each function, and there are functions that may be called and traced when RCU stopped watching. Namely when a function is called just before going idle or to userspace and after RCU stopped watching that current CPU. The first patch makes sure that RCU is watching when the stack tracer uses RCU. The second patch is to make sure that the stack tracer does not get called by functions in NMI, as it's not NMI safe" * tag 'trace-fixes-v4.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do not allow stack_tracer to record stack in NMI tracing: Have stack tracer force RCU to be watching
2015-10-23Merge tag 'sound-4.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "There is nothing to worry you much, only a few small & stable patches are found for usual stuff, HD-audio (a Lenovo laptop quirk, a fix for minor error handling) and ASoC (trivial fixes for RT298 and WM codecs). The only remaining major change is the fix for ASoC SX_TLV control that was overseen during refactoring, but the fix itself is trivial and safe" * tag 'sound-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: wm8962: mark cache_dirty flag after software reset in pm_resume ASoC: rt298: fix wrong setting of gpio2_en ASoC: wm8904: Correct number of EQ registers ALSA: hda - Fix deadlock at error in building PCM ASoC: Add info callback for SX_TLV controls ASoC: rt298: correct index default value ALSA: hda - Fix inverted internal mic on Lenovo G50-80 ALSA: hdac: Explicitly add io.h
2015-10-23Merge tag 'media/v4.3-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Some regression fixes and potential security issues: - netup_unidvb: fix potential crash when spi is NULL - rtl28xxu: fix control message flaws - m88ds3103: fix a regression on Kernel 4.2 - c8sectpfe: fix some issues on this new driver - v4l2-flash-led-class: fix a Kbuild dependency - si2157 and si2158: check for array boundary when uploading firmware files - horus3a and lnbh25: fix some building troubles when some options aren't selected - ir-hix5hd2: drop the use of IRQF_NO_SUSPEND" * tag 'media/v4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] m88ds3103: use own reg update_bits() implementation [media] rtl28xxu: fix control message flaws [media] v4l2-flash-led-class: Add missing VIDEO_V4L2 Kconfig dependency [media] netup_unidvb: fix potential crash when spi is NULL [media] si2168: Bounds check firmware [media] si2157: Bounds check firmware [media] ir-hix5hd2: drop the use of IRQF_NO_SUSPEND [media] c8sectpfe: fix return of garbage [media] c8sectpfe: fix ininitialized error return on firmware load failure [media] lnbh25: Fix lnbh25_attach() function return type [media] horus3a: Fix horus3a_attach() function parameters
2015-10-23Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "I've been a bit slow gathering these: - drm/mst: one mutex leak in a fail path - radeon: two oops fixes, one dpm fix - i915: one messy set of fixes, where we revert the original fix, and pull back the proper set of fixes from -next on top. - nouveau: one fix for an illegal buffer placement. Doesn't look too bad, hopefully shouldn't be too much more" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/nouveau/gem: return only valid domain when there's only one drm: fix mutex leak in drm_dp_get_mst_branch_device drm/amdgpu: add missing dpm check for KV dpm late init drm/amdgpu/dpm: don't add pwm attributes if DPM is disabled drm/radeon/dpm: don't add pwm attributes if DPM is disabled drm/i915: Add primary plane to mask if it's visible drm/i915: Move sprite/cursor plane disable to intel_sanitize_crtc() drm/i915: Assign hwmode after encoder state readout Revert "drm/i915: Add primary plane to mask if it's visible" drm/i915: Deny wrapping an userptr into a framebuffer drm/i915: Enable DPLL VGA mode before P1/P2 divider write drm/i915: Restore lost DPLL register write on gen2-4 drm/i915: Flush pipecontrol post-sync writes drm/i915: Fix kerneldoc for i915_gem_shrink_all
2015-10-23ocfs2/dlm: unlock lockres spinlock before dlm_lockres_putJoseph Qi
dlm_lockres_put will call dlm_lockres_release if it is the last reference, and then it may call dlm_print_one_lock_resource and take lockres spinlock. So unlock lockres spinlock before dlm_lockres_put to avoid deadlock. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-23fault-inject: fix inverted interval/probability values in printkFlorian Westphal
interval displays the probability and vice versa. Fixes: 6adc4a22f20bb ("fault-inject: add ratelimit option") Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-23lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=yAndrey Ryabinin
When the kernel compiled with KASAN=y, GCC adds redzones for each variable on stack. This enlarges function's stack frame and causes: 'warning: the frame size of X bytes is larger than Y bytes' The worst case I've seen for now is following: ../net/wireless/nl80211.c: In function `nl80211_send_wiphy': ../net/wireless/nl80211.c:1731:1: warning: the frame size of 5448 bytes is larger than 2048 bytes [-Wframe-larger-than=] That kind of warning becomes useless with KASAN=y. It doesn't necessarily indicate that there is some problem in the code, thus we should turn it off. (The KASAN=y stack size in increased from 16k to 32k for this reason) Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Abylay Ospan <aospan@netup.ru> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Kozlov Sergey <serjk@netup.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-23mm: make sendfile(2) killableJan Kara
Currently a simple program below issues a sendfile(2) system call which takes about 62 days to complete in my test KVM instance. int fd; off_t off = 0; fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644); ftruncate(fd, 2); lseek(fd, 0, SEEK_END); sendfile(fd, fd, &off, 0xfffffff); Now you should not ask kernel to do a stupid stuff like copying 256MB in 2-byte chunks and call fsync(2) after each chunk but if you do, sysadmin should have a way to stop you. We actually do have a check for fatal_signal_pending() in generic_perform_write() which triggers in this path however because we always succeed in writing something before the check is done, we return value > 0 from generic_perform_write() and thus the information about signal gets lost. Fix the problem by doing the signal check before writing anything. That way generic_perform_write() returns -EINTR, the error gets propagated up and the sendfile loop terminates early. Signed-off-by: Jan Kara <jack@suse.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>