summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-07-19Update maintainer for EHEA driver.Douglas Miller
Since Thadeu left IBM, EHEA has gone mostly unmaintained, since his email address doesn't work anymore. I'm stepping up to help maintain this driver upstream. I'm adding Thadeu's personal e-mail address in Cc, hoping that we can get his ack. CC: Thadeu Lima de Souza Cascardo <cascardo@cascardo.eti.br> Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@cascardo.eti.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19Merge branch 'mlx4-fixes'David S. Miller
Tariq Toukan says: ==================== Safe flow for mlx4_en configuration change This patchset improves the mlx4_en driver resiliency, especially on systems with low memory. Upon a configuration change that requires the allocation of new resources, we first try to allocate, prior to destroying the current ones. Once it is successfully done, we release the old resources and attach the new ones. Otherwise, we stay with a functioning interface having the same old configuration. This improvement became of greater significance after removing the use of vmap. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: Add resilience in low memory systemsEugenia Emantayev
This patch fixes the lost of Ethernet port on low memory system, when driver frees its resources and fails to allocate new resources. Issue could happen while changing number of channels, rings size or changing the timestamp configuration. This fix is necessary because of removing vmap use in the code. When vmap was in use driver could allocate non-contiguous memory and make it contiguous with vmap. Now it could fail to allocate a large chunk of contiguous memory and lose the port. Current code tries to allocate new resources and then upon success frees the old resources. Fixes: 73898db04301 ('net/mlx4: Avoid wrong virtual mappings') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: Move filters cleanup to a proper locationEugenia Emantayev
Filters cleanup should be done once before destroying net device, since filters list is contained in the private data. Fixes: 1eb8c695bda9 ('net/mlx4_en: Add accelerated RFS support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-18sctp: load transport header after sk_filterWillem de Bruijn
Do not cache pointers into the skb linear segment across sk_filter. The function call can trigger pskb_expand_head. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-18net/sched/sch_htb: clamp xstats tokens to fit into 32-bit intKonstantin Khlebnikov
In kernel HTB keeps tokens in signed 64-bit in nanoseconds. In netlink protocol these values are converted into pshed ticks (64ns for now) and truncated to 32-bit. In struct tc_htb_xstats fields "tokens" and "ctokens" are declared as unsigned 32-bit but they could be negative thus tool 'tc' prints them as signed. Big values loose higher bits and/or become negative. This patch clamps tokens in xstat into range from INT_MIN to INT_MAX. In this way it's easier to understand what's going on here. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndataFlorian Fainelli
The label lio_xmit_failed is used 3 times through liquidio_xmit() but it always makes a call to dma_unmap_single() using potentially uninitialized variables from "ndata" variable. Out of the 3 gotos, 2 run after ndata has been initialized, and had a prior dma_map_single() call. Fix this by adding a new error label: lio_xmit_dma_failed which does this dma_unmap_single() and then processed with the lio_xmit_failed fallthrough. Fixes: f21fb3ed364bb ("Add support of Cavium Liquidio ethernet adapters") Reported-by: coverity (CID 1309740) Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16net: nb8800: Fix SKB leak in nb8800_receive()Florian Fainelli
In case nb8800_receive() fails to allocate a fragment, we would leak the SKB freshly allocated and just return, instead, free it. Reported-by: coverity (CID 1341750) Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Mans Rullgard <mans@mansr.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16et131x: Fix logical vs bitwise check in et131x_tx_timeout()Florian Fainelli
We should be using a logical check here instead of a bitwise operation to check if the device is closed already in et131x_tx_timeout(). Reported-by: coverity (CID 146498) Fixes: 38df6492eb511 ("et131x: Add PCIe gigabit ethernet driver et131x to drivers/net") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16vlan: use a valid default mtu value for vlan over macsecPaolo Abeni
macsec can't cope with mtu frames which need vlan tag insertion, and vlan device set the default mtu equal to the underlying dev's one. By default vlan over macsec devices use invalid mtu, dropping all the large packets. This patch adds a netif helper to check if an upper vlan device needs mtu reduction. The helper is used during vlan devices initialization to set a valid default and during mtu updating to forbid invalid, too bit, mtu values. The helper currently only check if the lower dev is a macsec device, if we get more users, we need to update only the helper (possibly reserving an additional IFF bit). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15net: bgmac: Fix infinite loop in bgmac_dma_tx_add()Florian Fainelli
Nothing is decrementing the index "i" while we are cleaning up the fragments we could not successful transmit. Fixes: 9cde94506eacf ("bgmac: implement scatter/gather support") Reported-by: coverity (CID 1352048) Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15Merge branch 'mlxsw-fixes'David S. Miller
Jiri Pirko says: ==================== mlxsw: Couple of fixes Couple of fixes for mlxsw driver from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15mlxsw: spectrum: Prevent invalid ingress buffer mappingIdo Schimmel
Packets entering the switch are mapped to a Switch Priority (SP) according to their PCP value (untagged frames are mapped to SP 0). The packets are classified to a priority group (PG) buffer in the port's headroom according to their SP. The switch maintains another mapping (SP to IEEE priority), which is used to generate PFC frames for lossless PGs. This mapping is initialized to IEEE = SP % 8. Therefore, when mapping SP 'x' to PG 'y' we create a situation in which an IEEE priority is mapped to two different PGs: IEEE 'x' ---> SP 'x' ---> PG 'y' IEEE 'x' ---> SP 'x + 8' ---> PG '0' (default) Which is invalid, as a flow can use only one PG buffer. Fix this by mapping both SP 'x' and 'x + 8' to the same PG buffer. Fixes: 8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15mlxsw: spectrum: Prevent overwrite of DCB capability fieldsIdo Schimmel
The number of supported traffic classes that can have ETS and PFC simultaneously enabled is not subject to user configuration, so make sure we always initialize them to the correct values following a set operation. Fixes: 8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support") Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15mlxsw: spectrum: Don't emit errors when PFC is disabledIdo Schimmel
We can't have PAUSE frames and PFC both enabled on the same port, but the fact that ieee_setpfc() was called doesn't necessarily mean PFC is enabled. Only emit errors when PAUSE frames and PFC are enabled simultaneously. Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15mlxsw: spectrum: Indicate support for autonegotiationIdo Schimmel
The device supports link autonegotiation, so let the user know about it by indicating support via ethtool ops. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15mlxsw: spectrum: Force link training according to admin stateIdo Schimmel
When setting a new speed we need to disable and enable the port for the changes to take effect. We currently only do that if the operational state of the port is up. However, setting a new speed following link training failure will require us to explicitly set the port down and then up. Instead, disable and enable the port based on its administrative state. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2016-07-14 This series contains fixes to i40e and ixgbe. Alex fixes issues found in i40e_rx_checksum() which was broken, where the checksum was being returned valid when it was not. Kiran fixes a bug which was found when we abruptly remove a cable which caused a panic. Set the VSI broadcast promiscuous mode during VSI add sequence and prevents adding MAC filter if specified MAC address is broadcast. Paolo Abeni fixes a bug by returning the actual work done, capped to weight - 1, since the core doesn't allow to return the full budget when the driver modifies the NAPI status. Guilherme Piccoli fixes an issue where the q_vector initialization routine sets the affinity _mask of a q_vector based on v_idx value. This means a loop iterates on v_idx, which is an incremental value, and the cpumask is created based on this value. This is a problem in systems with multiple logical CPUs per core (like in SMT scenarios). Changed the way q_vector's affinity_mask is created to resolve the issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15r8152: add MODULE_VERSIONGrant Grundler
ethtool -i provides a driver version that is hard coded. Export the same value via "modinfo". Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15tcp: enable per-socket rate limiting of all 'challenge acks'Jason Baron
The per-socket rate limit for 'challenge acks' was introduced in the context of limiting ack loops: commit f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock") And I think it can be extended to rate limit all 'challenge acks' on a per-socket basis. Since we have the global tcp_challenge_ack_limit, this patch allows for tcp_challenge_ack_limit to be set to a large value and effectively rely on the per-socket limit, or set tcp_challenge_ack_limit to a lower value and still prevents a single connections from consuming the entire challenge ack quota. It further moves in the direction of eliminating the global limit at some point, as Eric Dumazet has suggested. This a follow-up to: Subject: tcp: make challenge acks less predictable Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14i40e: use valid online CPU on q_vector initializationGuilherme G. Piccoli
Currently, the q_vector initialization routine sets the affinity_mask of a q_vector based on v_idx value. Meaning a loop iterates on v_idx, which is an incremental value, and the cpumask is created based on this value. This is a problem in systems with multiple logical CPUs per core (like in SMT scenarios). If we disable some logical CPUs, by turning SMT off for example, we will end up with a sparse cpu_online_mask, i.e., only the first CPU in a core is online, and incremental filling in q_vector cpumask might lead to multiple offline CPUs being assigned to q_vectors. Example: if we have a system with 8 cores each one containing 8 logical CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is disabled, only the 1st CPU in each core remains online, so the cpu_online_mask in this case would have only 8 bits set, in a sparse way. In general case, when SMT is off the cpu_online_mask has only C bits set: 0, 1*N, 2*N, ..., C*(N-1) where C == # of cores; N == # of logical CPUs per core. In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set. This patch changes the way q_vector's affinity_mask is created: it iterates on v_idx, but consumes the CPU index from the cpu_online_mask instead of just using the v_idx incremental value. No functional changes were introduced. Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14ixgbe: napi_poll must return the work donePaolo Abeni
Currently the function ixgbe_poll() returns 0 when it clean completely the rx rings, but this foul budget accounting in core code. Fix this returning the actual work done, capped to weight - 1, since the core doesn't allow to return the full budget when the driver modifies the napi status Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14i40e: enable VSI broadcast promiscuous mode instead of adding broadcast filterKiran Patil
This patch sets VSI broadcast promiscuous mode during VSI add sequence and prevents adding MAC filter if specified MAC address is broadcast. Change-ID: Ia62251fca095bc449d0497fc44bec3a5a0136773 Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14i40e/i40evf: Fix i40e_rx_checksumAlexander Duyck
There are a couple of issues I found in i40e_rx_checksum while doing some recent testing. As a result I have found the Rx checksum logic is pretty much broken and returning that the checksum is valid for tunnels in cases where it is not. First the inner types are not the correct values to use to test for if a tunnel is present or not. In addition the inner protocol types are not a bitmask as such performing an OR of the values doesn't make sense. I have instead changed the code so that the inner protocol types are used to determine if we report CHECKSUM_UNNECESSARY or not. For anything that does not end in UDP, TCP, or SCTP it doesn't make much sense to report a checksum offload since it won't contain a checksum anyway. This leaves us with the need to set the csum_level based on some value. For that purpose I am using the tunnel_type field. If the tunnel type is GRENAT or greater then this means we have a GRE or UDP tunnel with an inner header. In the case of GRE or UDP we will have a possible checksum present so for this reason it should be safe to set the csum_level to 1 to indicate that we are reporting the state of the inner header. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14bonding: set carrier off for devices created through netlinkBeniamino Galvani
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after register_netdevice") moved netif_carrier_off() from bond_init() to bond_create(), but the latter is called only for initial default devices and ones created through sysfs: $ modprobe bonding $ echo +bond1 > /sys/class/net/bonding_masters $ ip link add bond2 type bond $ grep "MII Status" /proc/net/bonding/* /proc/net/bonding/bond0:MII Status: down /proc/net/bonding/bond1:MII Status: down /proc/net/bonding/bond2:MII Status: up Ensure that carrier is initially off also for devices created through netlink. Signed-off-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13Merge branch 'sk_filter-trim-limit'David S. Miller
Willem de Bruijn says: ==================== limit sk_filter trim to payload Sockets can apply a filter to incoming packets to drop or trim them. Fix two codepaths that call skb_pull/__skb_pull after sk_filter without checking for packet length. Reading beyond skb->tail after trimming happens in more codepaths, but safety of reading in the linear segment is based on minimum allocation size (MAX_HEADER, GRO_MAX_HEAD, ..). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13dccp: limit sk_filter trim to payloadWillem de Bruijn
Dccp verifies packet integrity, including length, at initial rcv in dccp_invalid_packet, later pulls headers in dccp_enqueue_skb. A call to sk_filter in-between can cause __skb_pull to wrap skb->len. skb_copy_datagram_msg interprets this as a negative value, so (correctly) fails with EFAULT. The negative length is reported in ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close. Introduce an sk_receive_skb variant that caps how small a filter program can trim packets, and call this in dccp with the header length. Excessively trimmed packets are now processed normally and queued for reception as 0B payloads. Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13rose: limit sk_filter trim to payloadWillem de Bruijn
Sockets can have a filter program attached that drops or trims incoming packets based on the filter program return value. Rose requires data packets to have at least ROSE_MIN_LEN bytes. It verifies this on arrival in rose_route_frame and unconditionally pulls the bytes in rose_recvmsg. The filter can trim packets to below this value in-between, causing pull to fail, leaving the partial header at the time of skb_copy_datagram_msg. Place a lower bound on the size to which sk_filter may trim packets by introducing sk_filter_trim_cap and call this for rose packets. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13Merge branch 'mlx5-fixes'David S. Miller
Saeed Mahameed says: ==================== mlx5 tx timeout watchdog fixes This patch set provides two trivial fixes for the tx timeout series lately applied into net 4.7. From Daniel, detect stuck queues due to BQL From Mohamad, fix tx timeout watchdog false alarm Hopefully those two fixes will make it to -stable, assuming 3947ca185999 ('net/mlx5e: Implement ndo_tx_timeout callback') was also backported to -stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13net/mlx5e: start/stop all tx queues upon open/close netdevMohamad Haj Yahia
Start all tx queues (including inactive ones) when opening the netdev. Stop all tx queues (including inactive ones) when closing the netdev. This is a workaround for the tx timeout watchdog false alarm issue in which the netdev watchdog is polling all the tx queues which may include inactive queues and thus once lowering the real tx queues number (ethtool -L) it will generate tx timeout watchdog false alarms. Fixes: 3947ca185999 ('net/mlx5e: Implement ndo_tx_timeout callback') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13net/mlx5e: Fix TX Timeout to detect queues stuck on BQLDaniel Jurgens
Change netif_tx_queue_stopped to netif_xmit_stopped. This will show when queues are stopped due to byte queue limits. Fixes: 3947ca185999 ('net/mlx5e: Implement ndo_tx_timeout callback') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12Merge branch 'ethoc-fixes'David S. Miller
Florian Fainelli says: ==================== net: ethoc: Error path and transmit fixes This patch series contains two patches for the ethoc driver while testing on a TS-7300 board where ethoc is provided by an on-board FPGA. First patch was cooked after chasing crashes with invalid resources passed to the driver. Second patch was cooked after seeing that an interface configured with IP 192.168.2.2 was sending ARP packets for 192.168.0.0, no wonder why it could not work. I don't have access to any other platform using an ethoc interface so it could be good to some testing on Xtensa for instance. Changes in v3: - corrected the error path if skb_put_padto() fails, thanks to Max for spotting this! Changes in v2: - fixed the first commit message ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12net: ethoc: Correctly pad short packetsFlorian Fainelli
Even though the hardware can be doing zero padding, we want the SKB to be going out on the wire with the appropriate size. This fixes packet truncations observed with e.g: ARP packets. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12net: ethoc: Fix early error pathsFlorian Fainelli
In case any operation fails before we can successfully go the point where we would register a MDIO bus, we would be going to an error label which involves unregistering then freeing this yet to be created MDIO bus. Update all error paths to go to label free which is the only one valid until either the clock is enabled, or the MDIO bus is allocated and registered. This fixes kernel oops observed while trying to dereference the MDIO bus structure which is not yet allocated. Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12net: nps_enet: Fix PCS resetNoam Camus
During commit b54b8c2d6e3c ("net: ezchip: adapt driver to little endian architecture") adapting to little endian architecture, zeroing of controller was left out. Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree. they are: 1) Fix leak in the error path of nft_expr_init(), from Liping Zhang. 2) Tracing from nf_tables cannot be disabled, also from Zhang. 3) Fix an integer overflow on 32bit archs when setting the number of hashtable buckets, from Florian Westphal. 4) Fix configuration of ipvs sync in backup mode with IPv6 address, from Quentin Armitage via Simon Horman. 5) Fix incorrect timeout calculation in nft_ct NFT_CT_EXPIRATION, from Florian Westphal. 6) Skip clash resolution in conntrack insertion races if NAT is in place. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12netfilter: conntrack: skip clash resolution if nat is in placePablo Neira Ayuso
The clash resolution is not easy to apply if the NAT table is registered. Even if no NAT rules are installed, the nul-binding ensures that a unique tuple is used, thus, the packet that loses race gets a different source port number, as described by: http://marc.info/?l=netfilter-devel&m=146818011604484&w=2 Clash resolution with NAT is also problematic if addresses/port range ports are used since the conntrack that wins race may describe a different mangling that we may have earlier applied to the packet via nf_nat_setup_info(). Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
2016-07-11Merge branch 'tipc-fixes'David S. Miller
Jon Maloy says: ==================== tipc: three small fixes Fixes for some broadcast link problems that may occur in large systems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11tipc: reset all unicast links when broadcast send link failsJon Paul Maloy
In test situations with many nodes and a heavily stressed system we have observed that the transmission broadcast link may fail due to an excessive number of retransmissions of the same packet. In such situations we need to reset all unicast links to all peers, in order to reset and re-synchronize the broadcast link. In this commit, we add a new function tipc_bearer_reset_all() to be used in such situations. The function scans across all bearers and resets all their pertaining links. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11tipc: ensure correct broadcast send buffer release when peer is lostJon Paul Maloy
After a new receiver peer has been added to the broadcast transmission link, we allow immediate transmission of new broadcast packets, trusting that the new peer will not accept the packets until it has received the previously sent unicast broadcast initialiation message. In the same way, the sender must not accept any acknowledges until it has itself received the broadcast initialization from the peer, as well as confirmation of the reception of its own initialization message. Furthermore, when a receiver peer goes down, the sender has to produce the missing acknowledges from the lost peer locally, in order ensure correct release of the buffers that were expected to be acknowledged by the said peer. In a highly stressed system we have observed that contact with a peer may come up and be lost before the above mentioned broadcast initial- ization and confirmation have been received. This leads to the locally produced acknowledges being rejected, and the non-acknowledged buffers to linger in the broadcast link transmission queue until it fills up and the link goes into permanent congestion. In this commit, we remedy this by temporarily setting the corresponding broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up' state to true before we issue the local acknowledges. This ensures that those acknowledges will always be accepted. The mentioned state values are restored immediately afterwards when the link is reset. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11tipc: extend broadcast link initialization criteriaJon Paul Maloy
At first contact between two nodes, an endpoint might sometimes have time to send out a LINK_PROTOCOL/STATE packet before it has received the broadcast initialization packet from the peer, i.e., before it has received a valid broadcast packet number to add to the 'bc_ack' field of the protocol message. This means that the peer endpoint will receive a protocol packet with an invalid broadcast acknowledge value of 0. Under unlucky circumstances this may lead to the original, already received acknowledge value being overwritten, so that the whole broadcast link goes stale after a while. We fix this by delaying the setting of the link field 'bc_peer_is_up' until we know that the peer really has received our own broadcast initialization message. The latter is always sent out as the first unicast message on a link, and always with seqeunce number 1. Because of this, we only need to look for a non-zero unicast acknowledge value in the arriving STATE messages, and once that is confirmed we know we are safe and can set the mentioned field. Before this moment, we must ignore all broadcast acknowledges from the peer. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11r8152: Add support for setting pass through MAC address on RTL8153-ADMario Limonciello
The RTL8153-AD supports a persistent system specific MAC address. This means a device plugged into two different systems with host side support will show different (but persistent) MAC addresses. This information for the system's persistent MAC address is burned in when the system HW is built and available under \_SB.AMAC in the DSDT at runtime. This technology is currently implemented in the Dell TB15 and WD15 Type-C docks. More information is available here: http://www.dell.com/support/article/us/en/04/SLN301147 Signed-off-by: Mario Limonciello <mario_limonciello@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_sendSoheil Hassas Yeganeh
Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS as a control message to TCP. Since __sock_cmsg_send does not support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and hence breaks pulse audio over TCP. SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer but they semantically belong to SOL_UNIX. Since all cmsg-processing functions including sock_cmsg_send ignore control messages of other layers, it is best to ignore SCM_RIGHTS and SCM_CREDENTIALS for consistency (and also for fixing pulse audio over TCP). Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Tested-by: Sergei Trofimovich <slyfox@gentoo.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user spaceJulian Anastasov
Vegard Nossum is reporting for a crash in fib_dump_info when nh_dev = NULL and fib_nhs == 1: Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+ RIP: 0033:[<00000000602b3d18>] RSP: 0000000062623890 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00 RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000 R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00 R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000 Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18 CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581 Stack: 626238f0 960226a02 00000400 000000fe 62623910 600afca7 62623970 62623a48 62468038 00000018 00000000 00000000 Call Trace: [<602b3e93>] rtmsg_fib+0xd3/0x190 [<602b6680>] fib_table_insert+0x260/0x500 [<602b0e5d>] inet_rtm_newroute+0x4d/0x60 [<60250def>] rtnetlink_rcv_msg+0x8f/0x270 [<60267079>] netlink_rcv_skb+0xc9/0xe0 [<60250d4b>] rtnetlink_rcv+0x3b/0x50 [<60265400>] netlink_unicast+0x1a0/0x2c0 [<60265e47>] netlink_sendmsg+0x3f7/0x470 [<6021dc9a>] sock_sendmsg+0x3a/0x90 [<6021e0d0>] ___sys_sendmsg+0x300/0x360 [<6021fa64>] __sys_sendmsg+0x54/0xa0 [<6021fac0>] SyS_sendmsg+0x10/0x20 [<6001ea68>] handle_syscall+0x88/0x90 [<600295fd>] userspace+0x3fd/0x500 [<6001ac55>] fork_handler+0x85/0x90 $ addr2line -e vmlinux -i 0x602b3d18 include/linux/inetdevice.h:222 net/ipv4/fib_semantics.c:1264 Problem happens when RTNH_F_LINKDOWN is provided from user space when creating routes that do not use the flag, catched with netlink fuzzer. Currently, the kernel allows user space to set both flags to nh_flags and fib_flags but this is not intentional, the assumption was that they are not set. Fix this by rejecting both flags with EINVAL. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down") Signed-off-by: Julian Anastasov <ja@ssi.bg> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Dinesh Dutt <ddutt@cumulusnetworks.com> Cc: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11tcp: make challenge acks less predictableEric Dumazet
Yue Cao claims that current host rate limiting of challenge ACKS (RFC 5961) could leak enough information to allow a patient attacker to hijack TCP sessions. He will soon provide details in an academic paper. This patch increases the default limit from 100 to 1000, and adds some randomization so that the attacker can no longer hijack sessions without spending a considerable amount of probes. Based on initial analysis and patch from Linus. Note that we also have per socket rate limiting, so it is tempting to remove the host limit in the future. v2: randomize the count of challenge acks per second, not the period. Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Reported-by: Yue Cao <ycao009@ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11udp: prevent bugcheck if filter truncates packet too muchMichal Kubeček
If socket filter truncates an udp packet below the length of UDP header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if kernel is configured that way) can be easily enforced by an unprivileged user which was reported as CVE-2016-6162. For a reproducer, see http://seclists.org/oss-sec/2016/q3/8 Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11bnxt_en: initialize rc to zero to avoid returning garbageColin Ian King
rc is not initialized so it can contain garbage if it is not set by the call to bnxt_read_sfp_module_eeprom_info. Ensure garbage is not returned by initializing rc to 0. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11Merge tag 'batadv-net-for-davem-20160708' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are a couple batman-adv bugfix patches, all by Sven Eckelmann: - Fix possible NULL pointer dereference for vlan_insert_tag (two patches) - Fix reference handling in some features, which may lead to reference leaks or invalid memory access (four patches) - Fix speedy join: DHCP packets handled by the gateway feature should be sent with 4-address unicast instead of 3-address unicast to make speedy join work. This fixes/speeds up DHCP assignment for clients which join a mesh for the first time. (one patch) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11Merge tag 'ipvs-fixes2-for-v4.7' of ↵Pablo Neira Ayuso
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== Second Round of IPVS Fixes for v4.7 The fix from Quentin Armitage allows the backup sync daemon to be bound to a link-local mcast IPv6 address as is already the case for IPv4. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-09dccp: avoid deadlock in dccp_v4_ctl_send_resetEric Dumazet
In the prep work I did before enabling BH while handling socket backlog, I missed two points in DCCP : 1) dccp_v4_ctl_send_reset() uses bh_lock_sock(), assuming BH were blocked. It is not anymore always true. 2) dccp_v4_route_skb() was using __IP_INC_STATS() instead of IP_INC_STATS() A similar fix was done for TCP, in commit 47dcc20a39d0 ("ipv4: tcp: ip_send_unicast_reply() is not BH safe") Fixes: 7309f8821fd6 ("dccp: do not assume DCCP code is non preemptible") Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>