summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-05-21Linux 4.12-rc2Linus Torvalds
2017-05-21x86: fix 32-bit case of __get_user_asm_u64()Linus Torvalds
The code to fetch a 64-bit value from user space was entirely buggered, and has been since the code was merged in early 2016 in commit b2f680380ddf ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit kernels"). Happily the buggered routine is almost certainly entirely unused, since the normal way to access user space memory is just with the non-inlined "get_user()", and the inlined version didn't even historically exist. The normal "get_user()" case is handled by external hand-written asm in arch/x86/lib/getuser.S that doesn't have either of these issues. There were two independent bugs in __get_user_asm_u64(): - it still did the STAC/CLAC user space access marking, even though that is now done by the wrapper macros, see commit 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses"). This didn't result in a semantic error, it just means that the inlined optimized version was hugely less efficient than the allegedly slower standard version, since the CLAC/STAC overhead is quite high on modern Intel CPU's. - the double register %eax/%edx was marked as an output, but the %eax part of it was touched early in the asm, and could thus clobber other inputs to the asm that gcc didn't expect it to touch. In particular, that meant that the generated code could look like this: mov (%eax),%eax mov 0x4(%eax),%edx where the load of %edx obviously was _supposed_ to be from the 32-bit word that followed the source of %eax, but because %eax was overwritten by the first instruction, the source of %edx was basically random garbage. The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark the 64-bit output as early-clobber to let gcc know that no inputs should alias with the output register. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@kernel.org # v4.8+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-21Clean up x86 unsafe_get/put_user() type handlingLinus Torvalds
Al noticed that unsafe_put_user() had type problems, and fixed them in commit a7cc722fff0b ("fix unsafe_put_user()"), which made me look more at those functions. It turns out that unsafe_get_user() had a type issue too: it limited the largest size of the type it could handle to "unsigned long". Which is fine with the current users, but doesn't match our existing normal get_user() semantics, which can also handle "u64" even when that does not fit in a long. While at it, also clean up the type cast in unsafe_put_user(). We actually want to just make it an assignment to the expected type of the pointer, because we actually do want warnings from types that don't convert silently. And it makes the code more readable by not having that one very long and complex line. [ This patch might become stable material if we ever end up back-porting any new users of the unsafe uaccess code, but as things stand now this doesn't matter for any current existing uses. ] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc uaccess fixes from Al Viro: "Fix for unsafe_put_user() (no callers currently in mainline, but anyone starting to use it will step into that) + alpha osf_wait4() infoleak fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: osf_wait4(): fix infoleak fix unsafe_put_user()
2017-05-21Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single scheduler fix: Prevent idle task from ever being preempted. That makes sure that synchronize_rcu_tasks() which is ignoring idle task does not pretend that no task is stuck in preempted state. If that happens and idle was preempted on a ftrace trampoline the machine crashes due to inconsistent state" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Call __schedule() from do_idle() without enabling preemption
2017-05-21Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of small fixes for the irq subsystem: - Cure a data ordering problem with chained interrupts - Three small fixlets for the mbigen irq chip" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Fix chained interrupt data ordering irqchip/mbigen: Fix the clear register offset calculation irqchip/mbigen: Fix potential NULL dereferencing irqchip/mbigen: Fix memory mapping code
2017-05-21tcp: fix tcp_probe_timer() for TCP_USER_TIMEOUTEric Dumazet
TCP_USER_TIMEOUT is still converted to jiffies value in icsk_user_timeout So we need to make a conversion for the cases HZ != 1000 Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21ipv6: drop unused variables in seg6_genl_dumphacstephen hemminger
THe seg6_pernet_data variable was set but never used. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21fou: make local function staticstephen hemminger
The build header functions are not used by any other code. net/ipv6/fou6.c:36:5: warning: no previous prototype for ‘fou6_build_header’ [-Wmissing-prototypes] net/ipv6/fou6.c:54:5: warning: no previous prototype for ‘gue6_build_header’ [-Wmissing-prototypes] Need to do some code rearranging to satisfy different Kconfig possiblities. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21tcpnv: do not export local functionstephen hemminger
The TCP New Vegas congestion control was exporting an internal function tcpnv_get_info which is not used by any other in tree kernel code. Make it static. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21inet: fix warning about missing prototypestephen hemminger
The prototype for inet_rcv_saddr_equal was not being included. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21ila: propagate error code in ila_outputstephen hemminger
This warning: net/ipv6/ila/ila_lwt.c: In function ‘ila_output’: net/ipv6/ila/ila_lwt.c:42:6: warning: variable ‘err’ set but not used [-Wunused-but-set-variable] It looks like the code attempts to set propagate different error values, but always returned -EINVAL. Compile tested only. Needs review by original author. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21dcb: enforce minimum length on IEEE_APPS attributestephen hemminger
Found by reviewing the warning about unused policy table. The code implies that it meant to check for size, but since it unrolled the loop for attribute validation that is never used. Instead do explicit check for attribute. Compile tested only. Needs review by original author. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21Merge branch 'net-extend-socket-timestamping-API'David S. Miller
Miroslav Lichvar says: ==================== Extend socket timestamping API Changes v5->v6: - fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved its description - improved OPT_PKTINFO documentation - improved scm_timestamping documentation Changes v4->v5: - fixed initialization of reserved fields in struct scm_ts_pktinfo Changes v3->v4: - added reserved fields to struct scm_ts_pktinfo - replaced patch fixing false SW timestamps with a documentation fix - updated OPT_TX_SWHW patch to handle false SW timestamps Changes v2->v3: - modified struct scm_ts_pktinfo to use fixed-width integer types - added WARN_ON_ONCE for missing RCU lock in dev_get_by_napi_id() - modified dev_get_by_napi_id() to not return dev in unexpected branch - modified recv to return SCM_TIMESTAMPING_PKTINFO even if the interface index is unknown Changes v1->v2: - added separate patch for new NAPI functions - split code from __sock_recv_timestamp() for better readability - fixed RCU locking - fixed compiler warning (missing case in switch in first patch) - inline sw_tx_timestamp() in its only user Changes RFC->v1: - reworked SOF_TIMESTAMPING_OPT_PKTINFO patch to not add new fields to skb shared info (net device is now looked up by napi_id), not require any changes in drivers, and restrict the cmsg to incoming packets - renamed SOF_TIMESTAMPING_OPT_MULTIMSG to SOF_TIMESTAMPING_OPT_TX_SWHW and fixed its description - moved struct scm_ts_pktinfo from errqueue.h to net_tstamp.h as it can't be received from the error queue anymore - improved commit descriptions and removed incorrect comment This patchset adds new options to the timestamping API that will be useful for NTP implementations and possibly other applications. The first patch specifies a timestamp filter for NTP packets. The second patch updates drivers that can timestamp all packets, or need to list the filter as unsupported. There is no attempt to add the support to the phyter driver. The third patch adds two helper functions working with NAPI ID, which is needed by the next patch. The fourth patch adds a new option to get a new control message with the L2 length and interface index for incoming packets with hardware timestamps. The fifth patch fixes documentation on number of non-zero fields in scm_timestamping and warns about false software timestamps when SO_TIMESTAMP(NS) is combined with SCM_TIMESTAMPING. The sixth patch adds a new option to request both software and hardware timestamps for outgoing packets. The seventh patch updates drivers that assumed software timestamping cannot be used together with hardware timestamping. The patches have been tested on x86_64 machines with igb and e1000e drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: ethernet: update drivers to make both SW and HW TX timestampsMiroslav Lichvar
Some drivers were calling the skb_tx_timestamp() function only when a hardware timestamp was not requested. Now that applications can use the SOF_TIMESTAMPING_OPT_TX_SWHW option to request both software and hardware timestamps, the drivers need to be modified to unconditionally call skb_tx_timestamp(). CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: allow simultaneous SW and HW transmit timestampingMiroslav Lichvar
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to be looped to the socket's error queue with a software timestamp even when a hardware transmit timestamp is expected to be provided by the driver. Applications using this option will receive two separate messages from the error queue, one with a software timestamp and the other with a hardware timestamp. As the hardware timestamp is saved to the shared skb info, which may happen before the first message with software timestamp is received by the application, the hardware timestamp is copied to the SCM_TIMESTAMPING control message only when the skb has no software timestamp or it is an incoming packet. While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as there are no other users. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: fix documentation of struct scm_timestampingMiroslav Lichvar
The scm_timestamping struct may return multiple non-zero fields, e.g. when both software and hardware RX timestamping is enabled, or when the SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false software timestamp is generated in the recvmsg() call in order to always return a SCM_TIMESTAMP(NS) message. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: add new control message for incoming HW-timestamped packetsMiroslav Lichvar
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message for incoming packets with hardware timestamps. It contains the index of the real interface which received the packet and the length of the packet at layer 2. The index is useful with bonding, bridges and other interfaces, where IP_PKTINFO doesn't allow applications to determine which PHC made the timestamp. With the L2 length (and link speed) it is possible to transpose preamble timestamps to trailer timestamps, which are used in the NTP protocol. While this information could be provided by two new socket options independently from timestamping, it doesn't look like they would be very useful. With this option any performance impact is limited to hardware timestamping. Use dev_get_by_napi_id() to get the device and its index. On kernels with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero index will be returned in the control message. CC: Richard Cochran <richardcochran@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: add function to retrieve original skb device using NAPI IDMiroslav Lichvar
Since commit b68581778cd0 ("net: Make skb->skb_iif always track skb->dev") skbs don't have the original index of the interface which received the packet. This information is now needed for a new control message related to hardware timestamping. Instead of adding a new field to skb, we can find the device by the NAPI ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to hide the CONFIG_NET_RX_BUSY_POLL ifdef. CC: Richard Cochran <richardcochran@gmail.com> Suggested-by: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALLMiroslav Lichvar
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid filter and update drivers which can timestamp all packets, or which explicitly list unsupported filters instead of using a default case, to handle the filter. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: define receive timestamp filter for NTPMiroslav Lichvar
Add HWTSTAMP_FILTER_NTP_ALL to the hwtstamp_rx_filters enum for timestamping of NTP packets. There is currently only one driver (phyter) that could support it directly. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21bridge: start hello_timer when enabling KERNEL_STP in br_stp_startXin Long
Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"), bridge would not start hello_timer if stp_enabled is not KERNEL_STP when br_dev_open. The problem is even if users set stp_enabled with KERNEL_STP later, the timer will still not be started. It causes that KERNEL_STP can not really work. Users have to re-ifup the bridge to avoid this. This patch is to fix it by starting br->hello_timer when enabling KERNEL_STP in br_stp_start. As an improvement, it's also to start hello_timer again only when br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is no reason to start the timer again when it's NO_STP. Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong Li <haili@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21smsc95xx: Support only IPv4 TCP/UDP csum offloadNisar Sayed
When TX checksum offload is used, if the computed checksum is 0 the LAN95xx device do not alter the checksum to 0xffff. In the case of ipv4 UDP checksum, it indicates to receiver that no checksum is calculated. Under ipv6, UDP checksum yields a result of zero must be changed to 0xffff. Hence disabling checksum offload for ipv6 packets. Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com> Reported-by: popcorn mix <popcornmix@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21cxgb4 : retrieve port information from firmwareGanesh Goudar
issue get port information command to firmware to retrieve port information and update if it is different from what was last recorded and also add indication for supported link modes for firmware port types FW_PORT_TYPE_SFP28, FW_PORT_TYPE_KR_SFP28, FW_PORT_TYPE_CR4_QSFP. Based on the original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21ibmveth: Support to enable LSO/CSO for Trunk VEA.Sivakumar Krishnasamy
Current largesend and checksum offload feature in ibmveth driver, - Source VM sends the TCP packets with ip_summed field set as CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in checksum field - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark "no checksum" and "checksum good" bits in transmit buffer descriptor before the packet is delivered to pseries PowerVM Hypervisor - If ibmveth has largesend capability enabled, transmit buffer descriptors are market accordingly before packet is delivered to Hypervisor (along with mss value for packets with length > MSS) - Destination VM's ibmveth driver receives the packet with "checksum good" bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY - If "largesend" bit was on, mss value is copied from receive descriptor into SKB's gso_size and other flags are appropriately set for packets > MSS size - The packet is now successfully delivered up the stack in destination VM The offloads described above works fine for TCP communication among VMs in the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B ) We are now enabling support for OVS in pseries PowerVM environment. One of our requirements is to have ibmveth driver configured in "Trunk" mode, when they are used with OVS. This is because, PowerVM Hypervisor will no more bridge the packets between VMs, instead the packets are delivered to IO Server which hosts OVS to bridge them between VMs or to external networks (flow shown below), VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor <=> VM B In "IO server" the packet is received by inbound Trunk ibmveth and then delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown below), Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth In this model, we hit the following issues which impacted the VM communication performance, - Issue 1: ibmveth doesn't support largesend and checksum offload features when configured as "Trunk". Driver has explicit checks to prevent enabling these offloads. - Issue 2: SYN packet drops seen at destination VM. When the packet originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to IO server's inbound Trunk ibmveth, on validating "checksum good" bits in ibmveth receive routine, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux Bridge) and delivered to outbound Trunk ibmveth. At this point the outbound ibmveth transmit routine will not set "no checksum" and "checksum good" bits in transmit buffer descriptor, as it does so only when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets delivered to destination VM, TCP layer receives the packet with checksum value of 0 and with no checksum related flags in ip_summed field. This leads to packet drops. So, TCP connections never goes through fine. - Issue 3: First packet of a TCP connection will be dropped, if there is no OVS flow cached in datapath. OVS while trying to identify the flow, computes the checksum. The computed checksum will be invalid at the receiving end, as ibmveth transmit routine zeroes out the pseudo checksum value in the packet. This leads to packet drop. - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list. When Physical NIC has GRO enabled and when OVS bridges these packets, OVS vport send code will end up calling dev_queue_xmit, which in turn calls validate_xmit_skb. In validate_xmit_skb routine, the larger packets will get segmented into MSS sized segments, if SKB has a frag_list and if the driver to which they are delivered to doesn't support NETIF_F_FRAGLIST feature. This patch addresses the above four issues, thereby enabling end to end largesend and checksum offload support for better performance. - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and checksum offloads. - Fix for Issue 2 : When ibmveth receives a packet with "checksum good" bit set and if its configured in Trunk mode, set appropriate SKB fields using skb_partial_csum_set (ip_summed field is set with CHECKSUM_PARTIAL) - Fix for Issue 3: Recompute the pseudo header checksum before sending the SKB up the stack. - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up allocating buffers and copying data, this fix gives upto 4X throughput increase. Note: All these fixes need to be dropped together as fixing just one of them will lead to other issues immediately (especially for Issues 1,2 & 3). Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21Merge branch 'arp-always-override-existing-neigh-entries-with-gratuitous-ARP'David S. Miller
Ihar Hrachyshka says: ==================== arp: always override existing neigh entries with gratuitous ARP This patchset is spurred by discussion started at https://patchwork.ozlabs.org/patch/760372/ where we figured that there is no real reason for enforcing override by gratuitous ARP packets only when arp_accept is 1. Same should happen when it's 0 (the default value). changelog v2: handled review comments by Julian Anastasov - fixed a mistake in a comment; - postponed addr_type calculation to as late as possible. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21arp: always override existing neigh entries with gratuitous ARPIhar Hrachyshka
Currently, when arp_accept is 1, we always override existing neigh entries with incoming gratuitous ARP replies. Otherwise, we override them only if new replies satisfy _locktime_ conditional (packets arrive not earlier than _locktime_ seconds since the last update to the neigh entry). The idea behind locktime is to pick the very first (=> close) reply received in a unicast burst when ARP proxies are used. This helps to avoid ARP thrashing where Linux would switch back and forth from one proxy to another. This logic has nothing to do with gratuitous ARP replies that are generally not aligned in time when multiple IP address carriers send them into network. This patch enforces overriding of existing neigh entries by all incoming gratuitous ARP packets, irrespective of their time of arrival. This will make the kernel honour all incoming gratuitous ARP packets. Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21arp: postpone addr_type calculation to as late as possibleIhar Hrachyshka
The addr_type retrieval can be costly, so it's worth trying to avoid its calculation as much as possible. This patch makes it calculated only for gratuitous ARP packets. This is especially important since later we may want to move is_garp calculation outside of arp_accept block, at which point the costly operation will be executed for all setups. The patch is the result of a discussion in net-dev: http://marc.info/?l=linux-netdev&m=149506354216994 Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21arp: decompose is_garp logic into a separate functionIhar Hrachyshka
The code is quite involving already to earn a separate function for itself. If anything, it helps arp_process readability. Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21arp: fixed error in a commentIhar Hrachyshka
the is_garp code deals just with gratuitous ARP packets, not every unsolicited packet. This patch is a result of a discussion in netdev: http://marc.info/?l=linux-netdev&m=149506354216994 Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0Wei Wang
When tcp_disconnect() is called, inet_csk_delack_init() sets icsk->icsk_ack.rcv_mss to 0. This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() => __tcp_select_window() call path to have division by 0 issue. So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21osf_wait4(): fix infoleakAl Viro
failing sys_wait4() won't fill struct rusage... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21fix unsafe_put_user()Al Viro
__put_user_size() relies upon its first argument having the same type as what the second one points to; the only other user makes sure of that and unsafe_put_user() should do the same. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) When using IPVS in direct-routing mode, normal traffic from the LVS host to a back-end server is sometimes incorrectly NATed on the way back into the LVS host. Patch to fix this from Julian Anastasov. 2) Calm down clang compilation warning in ctnetlink due to type mismatch, from Matthias Kaehlcke. 3) Do not re-setup NAT for conntracks that are already confirmed, this is fixing a problem that was introduced in the previous nf-next batch. Patch from Liping Zhang. 4) Do not allow conntrack helper removal from userspace cthelper infrastructure if already in used. This comes with an initial patch to introduce nf_conntrack_helper_put() that is required by this fix. From Liping Zhang. 5) Zero the pad when copying data to userspace, otherwise iptables fails to remove rules. This is a follow up on the patchset that sorts out the internal match/target structure pointer leak to userspace. Patch from the same author, Willem de Bruijn. This also comes with a build failure when CONFIG_COMPAT is not on, coming in the last patch of this series. 6) SYNPROXY crashes with conntrack entries that are created via ctnetlink, more specifically via conntrackd state sync. Patch from Eric Leblond. 7) RCU safe iteration on set element dumping in nf_tables, from Liping Zhang. 8) Missing sanitization of immediate date for the bitwise and cmp expressions in nf_tables. 9) Refcounting logic for chain and objects from set elements does not integrate into the nf_tables 2-phase commit protocol. 10) Missing sanitization of target verdict in ebtables arpreply target, from Gao Feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21Merge branch 'qed-next'David S. Miller
Yuval Mintz says: ==================== qed/qede updates This series contains some general minor fixes and enhancements: - #1, #2 and #9 correct small missing ethtool functionality. - #3, #6 and #8 correct minor issues in driver, but those are either print-related or unexposed in existing code. - #4 adds proper support to TLB mode bonding. - #10 is meant to improve performance on varying cache-line sizes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qede: Support 1G advertisment.Sudarsana Reddy Kalluru
Some variants of adapters support the 1G speed capability. Need to allow the configuration of 1G speed if adapter supports it. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qed: Fix setting of Management bitfieldsTomer Tayar
The management firmware HSI contains masks which are already shifted to their right place, so QED_MFW_SET_FIELD() is clearing incorrect fields by shifting the mask by the offset. Luckily, today we set the fields in an incrementing order [so we're not erasing any previously set fields], but this still needs fixing. Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qede: qedr closure after setting stateMintz, Yuval
This is benign, but it makes more sense to start the close sequence only after changing the internal state [in case it would once care]. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qed: Correct print in iscsi error-flowMintz, Yuval
If too many CQs are requested, qed would print the available number as if it's a resource and not a feature leading to the wrong print. Fixes: 08737a3fa30a ("qed: Inform qedi the number of possible CQs") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qed: Revise alloc/setup/free flowTomer Tayar
Re-organize the logic that allocates and frees memory of various sub-components of the hw-function - a. No need to pass pointers to said structure as parameters; The internal logic knows exactly where to find/set the data. b. Nullify pointers after cleanup to prevent possible errors to re-entrant code. Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qede: Don't use an internal MAC fieldMintz, Yuval
Driver maintains its primary MAC in a private field which gets updated when ndo_dev_set_mac() gets called. However, there are flows where the primary MAC of the device can change without said NDO being called [bond device in TLB mode configuring slaves' addresses], resulting in a configuration where there's a mismatch between what's apparent to user [the netdevice's value] and what's configured in the HW [the private value]. As we don't have any real motivation of maintaining this private field, simply remove it and start using the netdevice's field instead. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qede: Add missing Status-block freeSudarsana Reddy Kalluru
When destroying the datapath channels, qede doesn't notify qed of the released status blocks which were acquired during the initialization. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qede: Honor user request for Tx buffersSudarsana Reddy Kalluru
Driver always allocates the maximal number of tx-buffers irrespective of actual Tx ring config. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21qede: Allow WoL to activate by defaultMintz, Yuval
When management firmware declares that the device is WoL-capable, the default driver behavior would be to allow the management firmware to take the decision of whether it's actually needed or not. Problem is ethtool interface doesn't have a 'default' kind of option, and user would see the interface WoL as disabled, which doesn't accurately reflect the actual configuration. More-so, if the user actually wants to explicitly disable WoL he'd have to first enable it [otherwise ethtool would block the command]. Instead of allowing management to make the decision, enable WoL by default on all devices capable of it. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-20Merge tag 'trace-v4.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Fix a bug caused by not cleaning up the new instance unique triggers when deleting an instance. It also creates a selftest that triggers that bug. - Fix the delayed optimization happening after kprobes boot up self tests being removed by freeing of init memory. - Comment kprobes on why the delay optimization is not a problem for removal of modules, to keep other developers from searching that riddle. - Fix another case of rcu not watching in stack trace tracing. * tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Make sure RCU is watching before calling a stack trace kprobes: Document how optimized kprobes are removed from module unload selftests/ftrace: Add test to remove instance with active event triggers selftests/ftrace: Fix bashisms ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub ftrace/instances: Clear function triggers when removing instances ftrace: Simplify glob handling in unregister_ftrace_function_probe_func() tracing/kprobes: Enforce kprobes teardown after testing tracing: Move postpone selftests to core from early_initcall
2017-05-20Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A small collection of fixes that should go into this cycle. - a pull request from Christoph for NVMe, which ended up being manually applied to avoid pulling in newer bits in master. Mostly fibre channel fixes from James, but also a few fixes from Jon and Vijay - a pull request from Konrad, with just a single fix for xen-blkback from Gustavo. - a fuseblk bdi fix from Jan, fixing a regression in this series with the dynamic backing devices. - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull(). - a request leak fix for drbd from Lars, fixing a regression in the last series with the kref changes. This will go to stable as well" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: release the sq ref on rdma read errors nvmet-fc: remove target cpu scheduling flag nvme-fc: stop queues on error detection nvme-fc: require target or discovery role for fc-nvme targets nvme-fc: correct port role bits nvme: unmap CMB and remove sysfs file in reset path blktrace: fix integer parse fuseblk: Fix warning in super_setup_bdi_name() block: xen-blkback: add null check to avoid null pointer dereference drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
2017-05-20nvmet: release the sq ref on rdma read errorsVijay Immanuel
On rdma read errors, release the sq ref that was taken when the req was initialized. This avoids a hang in nvmet_sq_destroy() when the queue is being freed. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-20nvmet-fc: remove target cpu scheduling flagJames Smart
Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-20nvme-fc: stop queues on error detectionJames Smart
Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Rather than waiting for reset work thread to stop queues and abort the ios, immediately stop the queues on error detection. Reset thread will restop the queues (as it's called on other paths), but it does not appear to have a side effect. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-20nvme-fc: require target or discovery role for fc-nvme targetsJames Smart
In order to create an association, the remoteport must be serving either a target role or a discovery role. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>