summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-10-21tcp: remove improper preemption check in tcp_xmit_probe_skb()Renato Westphal
Commit e520af48c7e5a introduced the following bug when setting the TCP_REPAIR sockoption: [ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164 [ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20 [ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1 [ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012 [ 2860.657054] ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002 [ 2860.657058] 0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18 [ 2860.657062] ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38 [ 2860.657065] Call Trace: [ 2860.657072] [<ffffffff8185d765>] dump_stack+0x4f/0x7b [ 2860.657075] [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0 [ 2860.657078] [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20 [ 2860.657082] [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100 [ 2860.657085] [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30 [ 2860.657089] [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830 [ 2860.657093] [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30 [ 2860.657097] [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20 [ 2860.657100] [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0 [ 2860.657104] [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75 Since tcp_xmit_probe_skb() can be called from process context, use NET_INC_STATS() instead of NET_INC_STATS_BH(). Fixes: e520af48c7e5 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters") Signed-off-by: Renato Westphal <renatow@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains four Netfilter fixes for net, they are: 1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6. 2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking --accept-local, from Xin Long. 3) Wait for RCU grace period after dropping the pending packets in the nfqueue, from Florian Westphal. 4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21netlink: Rightsize IFLA_AF_SPEC size calculationArad, Ronen
if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding this a argument to get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and thus removed. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tipc: conditionally expand buffer headroom over udp tunnelJon Paul Maloy
In commit d999297c3dbbe ("tipc: reduce locking scope during packet reception") we altered the packet retransmission function. Since then, when restransmitting packets, we create a clone of the original buffer using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of the area we want to have copied, but also the smallest possible TIPC packet size. The value of MIN_H_SIZE is 24. Unfortunately, __pskb_copy() also has the effect that the headroom of the cloned buffer takes the size MIN_H_SIZE. This is too small for carrying the packet over the UDP tunnel bearer, which requires a minimum headroom of 28 bytes. A change to just use pskb_copy() lets the clone inherit the original headroom of 80 bytes, but also assumes that the copied data area is of at least that size, something that is not always the case. So that is not a viable solution. We now fix this by adding a check for sufficient headroom in the transmit function of udp_media.c, and expanding it when necessary. Fixes: commit d999297c3dbbe ("tipc: reduce locking scope during packet reception") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21net: cavium: change NET_VENDOR_CAVIUM to boolAndreas Schwab
CONFIG_NET_VENDOR_CAVIUM is only used to hide/show config options and to include subdirectories in the build, so it doesn't make sense to make it tristate. Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tipc: allow non-linear first fragment bufferJon Paul Maloy
The current code for message reassembly is erroneously assuming that the the first arriving fragment buffer always is linear, and then goes ahead resetting the fragment list of that buffer in anticipation of more arriving fragments. However, if the buffer already happens to be non-linear, we will inadvertently drop the already attached fragment list, and later on trig a BUG() in __pskb_pull_tail(). We see this happen when running fragmented TIPC multicast across UDP, something made possible since commit d0f91938bede ("tipc: add ip/udp media type") We fix this by not resetting the fragment list when the buffer is non- linear, and by initiatlizing our private fragment list tail pointer to the tail of the existing fragment list. Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Allocate memory for ovs internal device stats.James Morse
"openvswitch: Remove vport stats" removed the per-vport statistics, in order to use the netdev's statistics fields. "openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats to user-space, by using the provided netdev_ops to collate them - but ovs internal devices still use an unallocated dev->tstats field to count packets, which are no longer exported by this api. Allocate the dev->tstats field for ovs internal devices, and wire up ndo_get_stats64 with the original implementation of ovs_vport_get_stats(). On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs, unmasking a full-on panic on arm64: =============%<============== [<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch] [<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch] [<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch] [<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch] [<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch] [<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310 [<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4 [<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc [<ffffffc0005e8a94>] genl_rcv+0x38/0x50 [<ffffffc0005e76c0>] netlink_unicast+0x164/0x210 [<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368 [<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c [SNIP] Kernel panic - not syncing: Fatal exception in interrupt =============%<============== Fixes: 8c876639c985 ("openvswitch: Remove vport stats.") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21net: Really fix vti6 with oif in dst lookupsDavid Ahern
6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6") is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tipc: extend broadcast link window sizeJon Paul Maloy
The default fix broadcast window size is currently set to 20 packets. This is a very low value, set at a time when we were still testing on 10 Mb/s hubs, and a change to it is long overdue. Commit 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure") revealed a problem with this low value. For messages of importance LOW, the backlog queue limit will be calculated to 30 packets, while a single, maximum sized message of 66000 bytes, carried across a 1500 MTU network consists of 46 packets. This leads to the following scenario (among others leading to the same situation): 1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26 packets to the backlog queue. 2: Msg 2 of 46 packets is attempted sent, but rejected because there is no more space in the backlog queue at this level. The sender is added to the wakeup queue with a "pending packets chain size" number of 46. 3: Some packets in the transmit queue are acked and released. We try to wake up the sender, but the pending size of 46 is bigger than the LOW wakeup limit of 30, so this doesn't happen. 5: Subsequent acks releases all the remaining buffers. Each time we test for the wakeup criteria and find that 46 still is larger than 30, even after both the transmit and the backlog queues are empty. 6: The sender is never woken up and given a chance to send its message. He is stuck. We could now loosen the wakeup criteria (used by link_prepare_wakeup()) to become equal to the send criteria (used by tipc_link_xmit()), i.e., by ignoring the "pending packets chain size" value altogether, or we can just increase the queue limits so that the criteria can be satisfied anyway. There are good reasons (potentially multiple waiting senders) to not opt for the former solution, so we choose the latter one. This commit fixes the problem by giving the broadcast link window a default value of 50 packets. We also introduce a new minimum link window size BCLINK_MIN_WIN of 32, which is enough to always avoid the described situation. Finally, in order to not break any existing users which may set the window explicitly, we enforce that the window is set to the new minimum value in case the user is trying to set it to anything lower. Fixes: 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21btrfs: fix possible leak in btrfs_ioctl_balance()Christian Engelmayer
Commit 8eb934591f8b ("btrfs: check unsupported filters in balance arguments") adds a jump to exit label out_bargs in case the argument check fails. At this point in addition to the bargs memory, the memory for struct btrfs_balance_control has already been allocated. Ownership of bctl is passed to btrfs_balance() in the good case, thus the memory is not freed due to the introduced jump. Make sure that the memory gets freed in any case as necessary. Detected by Coverity CID 1328378. Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-10-22Merge branch 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Just a crash fix for radeon and amdgpu if the user has forcibly disabled dpm and tries to access the pwm sysfs controls. * 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: add missing dpm check for KV dpm late init drm/amdgpu/dpm: don't add pwm attributes if DPM is disabled drm/radeon/dpm: don't add pwm attributes if DPM is disabled
2015-10-22Merge tag 'drm-intel-fixes-2015-10-16' of ↵Dave Airlie
git://anongit.freedesktop.org/drm-intel into drm-fixes The revert dance could use some explanation: we had stuff fixed in -next, and initially backported one commit to v4.3. Now, turns out we need more fixes, and we could cherry-pick them all without conflicts if we reverted the backported one first. So did that to not have to edit and backport them all. * tag 'drm-intel-fixes-2015-10-16' of git://anongit.freedesktop.org/drm-intel: drm/i915: Add primary plane to mask if it's visible drm/i915: Move sprite/cursor plane disable to intel_sanitize_crtc() drm/i915: Assign hwmode after encoder state readout Revert "drm/i915: Add primary plane to mask if it's visible" drm/i915: Deny wrapping an userptr into a framebuffer drm/i915: Enable DPLL VGA mode before P1/P2 divider write drm/i915: Restore lost DPLL register write on gen2-4 drm/i915: Flush pipecontrol post-sync writes drm/i915: Fix kerneldoc for i915_gem_shrink_all
2015-10-22powerpc/rtas: Validate rtas.entry before calling enter_rtas()Vasant Hegde
Currently we do not validate rtas.entry before calling enter_rtas(). This leads to a kernel oops when user space calls rtas system call on a powernv platform (see below). This patch adds code to validate rtas.entry before making enter_rtas() call. Oops: Exception in kernel mode, sig: 4 [#1] SMP NR_CPUS=1024 NUMA PowerNV task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000 NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140 REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le) MSR: 1000000000081000 <HV,ME> CR: 00000000 XER: 00000000 CFAR: c000000000009c0c SOFTE: 0 NIP [0000000000000000] (null) LR [0000000000009c14] 0x9c14 Call Trace: [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable) [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0 [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98 Cc: stable@vger.kernel.org # v3.2+ Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform") Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Reword change log, trim oops, and add stable + fixes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-22Merge branch 'linux-4.3' of git://anongit.freedesktop.org/nouveau/linux-2.6 ↵Dave Airlie
into drm-fixes Just one fix from Ilia to resolve various issues that have resulted from buffer eviction. * 'linux-4.3' of git://anongit.freedesktop.org/nouveau/linux-2.6: drm/nouveau/gem: return only valid domain when there's only one
2015-10-22drm/nouveau/gem: return only valid domain when there's only oneIlia Mirkin
On nv50+, we restrict the valid domains to just the one where the buffer was originally created. However after the buffer is evicted to system memory, we might move it back to a different domain that was not originally valid. When sharing the buffer and retrieving its GEM_INFO data, we still want the domain that will be valid for this buffer in a pushbuf, not the one where it currently happens to be. This resolves fdo#92504 and several others. These are due to suspend evicting all buffers, making it more likely that they temporarily end up in the wrong place. Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-10-22drm: fix mutex leak in drm_dp_get_mst_branch_deviceAdam Richter
In Linux 4.3-rc5, there is an error case in drm_dp_get_branch_device that returns without releasing mgr->lock, resulting a spew of kernel messages about a kernel work function possibly having leaked a mutex and presumably more serious adverse consequences later. This patch changes the error to "goto out" to unlock the mutex before returning. [airlied: grabbed from drm-next as it fixes something we've seen] Signed-off-by: Adam J. Richter <adam_richter2004@yahoo.com> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-22Merge tag 'for-linus-20151021' of git://git.infradead.org/intel-iommuLinus Torvalds
Pull intel-iommu bugfix from David Woodhouse: "This contains a single fix, for when the IOMMU API is used to overlay an existing mapping comprised of 4KiB pages, with a mapping that can use superpages. For the *first* superpage in the new mapping, we were correctly¹ freeing the old bottom-level page table page and clearing the link to it, before installing the superpage. For subsequent superpages, however, we weren't. This causes a memory leak, and a warning about setting a PTE which is already set. ¹ Well, not *entirely* correctly. We just free the page table pages right there and then, which is wrong. In fact they should only be freed *after* the IOTLB is flushed so we know the hardware will no longer be looking at them.... and in fact I note that the IOTLB flush is completely missing from the intel_iommu_map() code path, although it needs to be there if it's permitted to overwrite existing mappings. Fixing those is somewhat more intrusive though, and will probably need to wait for 4.4 at this point" * tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu: iommu/vt-d: fix range computation when making room for large pages
2015-10-22Merge tag 'mmc-v4.3-rc5' of git://git.linaro.org/people/ulf.hansson/mmcLinus Torvalds
Pull MMC bugfix from Ulf Hansson: "Here's yet another MMC fix intended for v4.3 rc7. I don't expect to send any further pull requests for 4.3 rc[n]. MMC core: - Don't re-tune in the reset sequence to allow re-init of the card" * tag 'mmc-v4.3-rc5' of git://git.linaro.org/people/ulf.hansson/mmc: mmc: core: Fix init_card in 52Mhz
2015-10-21perf annotate: Add debug message for out of bounds sampleArnaldo Carvalho de Melo
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-q0lde9ajs84oi38nlyjcqbwg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-21perf evsel: Print branch filter state with -vvAndi Kleen
Add a missing field to the perf_event_attr debug output. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1445366797-30894-4-git-send-email-andi@firstfloor.org [ Print it between config2 and sample_regs_user (peterz)] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-21IB/cm: Fix rb-tree duplicate free and use-after-freeDoron Tsur
ib_send_cm_sidr_rep could sometimes erase the node from the sidr (depending on errors in the process). Since ib_send_cm_sidr_rep is called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv could be either erased from the rb_tree twice or not erased at all. Fixing that by making sure it's erased only once before freeing cm_id_priv. Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation') Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21drm/vmwgfx: Stabilize the command buffer submission codeThomas Hellstrom
This commit addresses some stability problems with the command buffer submission code recently introduced: 1) Make the vmw_cmdbuf_man_process() function handle reruns internally to avoid losing interrupts if the caller forgets to rerun on -EAGAIN. 2) Handle default command buffer allocations using inline command buffers. This avoids rare allocation deadlocks. 3) In case of command buffer errors we might lose fence submissions. Therefore send a new fence after each command buffer error. This will help avoid lengthy fence waits. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-21Bluetooth: Remove unnecessary hci_explicit_connect_lookup functionJohan Hedberg
There's only one user of this helper which can be replaces with a call to hci_pend_le_action_lookup() and a check for params->explicit_connect. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: Remove redundant (and possibly wrong) flag clearingJohan Hedberg
There's no need to clear the HCI_CONN_ENCRYPT_PEND flag in smp_failure. In fact, this may cause the encryption tracking to get out of sync as this has nothing to do with HCI activity. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: Add hdev helper variable to hci_le_create_connection_cancelJohan Hedberg
The hci_le_create_connection_cancel() function needs to use the hdev pointer in many places so add a variable for it to avoid the need to dereference the hci_conn every time. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: Remove unnecessary indentation in unpair_device()Johan Hedberg
Instead of doing all of the LE-specific handling in an else-branch in unpair_device() create a 'done' label for the BR/EDR branch to jump to and then remove the else-branch completely. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: 6lowpan: Use hci_conn_hash_lookup_le() when possibleJohan Hedberg
Use the new hci_conn_hash_lookup_le() API to look up LE connections. This way we're guaranteed exact matches that also take into account the address type. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: Use hci_conn_hash_lookup_le() when possibleJohan Hedberg
Use the new hci_conn_hash_lookup_le() API to look up LE connections. This way we're guaranteed exact matches that also take into account the address type. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: Add hci_conn_hash_lookup_le() helper functionJohan Hedberg
Many of the existing LE connection lookups are forced to use hci_conn_hash_lookup_ba() which doesn't take into account the address type. What's worse, most of the users don't bother checking that the returned address type matches what was wanted. This patch adds a new helper API to look up LE connections based on their address and address type, paving the way to have the hci_conn_hash_lookup_ba() users converted to do more precise lookups. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Bluetooth: Add le_addr_type() helper functionJohan Hedberg
The mgmt code needs to convert from mgmt/L2CAP address types to HCI in many places. Having a dedicated helper function for this simplifies code by shortening it and removing unnecessary 'addr_type' variables. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21Merge tag 'kvm-arm-for-v4.3-rc7' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master A late round of KVM/ARM fixes for v4.3-rc7, fixing: - A bug where level-triggered interrupts lowered from userspace are still routed to the guest - A memory leak an a failed initialization path - A build error under certain configurations - Several timer bugs introduced with moving the timer to the active state handling instead of the masking trick.
2015-10-21Merge tag 'mvebu-fixes-4.3-2' of git://git.infradead.org/linux-mvebu into fixesArnd Bergmann
Merge "mvebu fixes for 4.3 (part 2)" from Gregory CLEMENT: Fix wrong compatible for A385 DB AP preventing using suspend * tag 'mvebu-fixes-4.3-2' of git://git.infradead.org/linux-mvebu: ARM: mvebu: correct a385-db-ap compatible string
2015-10-21Merge tag 'samsung-fixes-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes Merge "Samsung 2nd fixes for v4.3" from Kukjin Kim: - fix SOC detection of exynos thermal on exynos5260 - fix audio card detection on Peach boards - fix double of_node_put() when parsing child power domains * tag 'samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: thermal: exynos: Fix register read in TMU ARM: dts: Fix audio card detection on Peach boards ARM: EXYNOS: Fix double of_node_put() when parsing child power domains
2015-10-21Merge tag 'omap-for-v4.3/fixes-rc6' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge "Fixes for omaps for v4.3-rc cycle" from Tony Lindgren: - Fix oops with LPAE and moew than 2GB of memory by enabling ZONE_DMA for LPAE. Probably no need for stable on this one as we only recently ran into this with the mainline kernel - Fix imprecise external abort caused by bogus SRAM init. This affects dm814x recently merged, so no need for stable on this one AFAIK * tag 'omap-for-v4.3/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: Fix imprecise external abort caused by bogus SRAM init ARM: OMAP2+: Fix oops with LPAE and more than 2GB of memory
2015-10-21Adding switchdev ageing notification on port bridgedElad Raz
Configure ageing time to the HW for newly bridged device CC: Scott Feldman <sfeldma@gmail.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21irda: precedence bug in irlmp_seq_hb_idx()Dan Carpenter
This is decrementing the pointer, instead of the value stored in the pointer. KASan detects it as an out of bounds reference. Reported-by: "Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21xen-netfront: update num_queues to real createdJoe Jin
Sometimes xennet_create_queues() may failed to created all requested queues, we need to update num_queues to real created to avoid NULL pointer dereference. Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21vsock: fix missing cleanup when misc_register failedGao feng
reset transport and unlock if misc_register failed. Signed-off-by: Gao feng <omarapazanadi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21Merge branch 'mv643xx-fixes'David S. Miller
Philipp Kirchhofer says: ==================== net: mv643xx_eth: TSO TX data corruption fixes as previously discussed [1] the mv643xx_eth driver has some issues with data corruption when using TCP segmentation offload (TSO). The following patch set improves this situation by fixing two data corruption bugs in the TSO TX path. Before applying the patches repeatedly accessing large files located on a SMB share on my NSA325 NAS with TSO enabled resulted in different hash sums, which confirmed that data corruption is happening during file transfer. After applying the patches the hash sums were the same. As this is my first patch submission please feel free to point out any issues with the patch set. [1] http://thread.gmane.org/gmane.linux.network/336530 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21net: mv643xx_eth: Defer writing the first TX descriptor when using TSOPhilipp Kirchhofer
To prevent a race between the TX DMA engine and the CPU the writing of the first transmit descriptor must be deferred until all following descriptors have been updated. The network card may otherwise start transmitting before all packet descriptors are set up correctly, which leads to data corruption or an aborted transmit operation. This deferral is already done in the non-TSO TX path, implement it also in the TSO TX path. Signed-off-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21net: mv643xx_eth: Ensure proper data alignment in TSO TX pathPhilipp Kirchhofer
The TX DMA engine requires that buffers with a size of 8 bytes or smaller must be 64 bit aligned. This requirement may be violated when doing TSO, as in this case larger skb frags can be broken up and transmitted in small parts with then inappropriate alignment. Fix this by checking for proper alignment before handing a buffer to the DMA engine. If the data is misaligned realign it by copying it into the TSO header data area. Signed-off-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in ↵Tejun Heo
cgwb_bdi_destroy() a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") added rbtree_postorder_for_each_entry_safe() which is used to remove all entries; however, according to Cody, the iterator isn't safe against operations which may rebalance the tree. Fix it by switching to repeatedly removing rb_first() until empty. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Cody P Schafer <dev@codyps.com> Fixes: a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") Link: http://lkml.kernel.org/g/1443997973-1700-1-git-send-email-dev@codyps.com Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-21Merge branch 'tcp-rack'David S. Miller
Yuchung Cheng says: ==================== RACK loss detection RACK (Recent ACK) loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh). It's inspired by the FACK heuristic in tcp_mark_lost_retrans(): when a limited transmit (new data packet) is sacked in recovery, then any retransmission sent before that newly sacked packet was sent must have been lost, since at least one round trip time has elapsed. But that existing heuristic from tcp_mark_lost_retrans() has several limitations: 1) it can't detect tail drops since it depends on limited transmit 2) it's disabled upon reordering (assumes no reordering) 3) it's only enabled in fast recovery but not timeout recovery RACK addresses these limitations with a core idea: an unacknowledged packet P1 is deemed lost if a packet P2 that was sent later is is s/acked, since at least one round trip has passed. Since RACK cares about the time sequence instead of the data sequence of packets, it can detect tail drops when a later retransmission is s/acked, while FACK or dupthresh can't. For reordering RACK uses a dynamically adjusted reordering window ("reo_wnd") to reduce false positives on ever (small) degree of reordering, similar to the delayed Early Retransmit. In the current patch set RACK is only a supplemental loss detection and does not trigger fast recovery. However we are developing RACK to replace or consolidate FACK/dupthresh, early retransmit, and thin-dupack. These heuristics all implicitly bear the time notion. For example, the delayed Early Retransmit is simply applying RACK to trigger the fast recovery with small inflight. RACK requires measuring the minimum RTT. Tracking a global min is less robust due to traffic engineering pathing changes. Therefore it uses a windowed filter by Kathleen Nichols. The min RTT can also be useful for various other purposes like congestion control or stat monitoring. This patch has been used on Google servers for well over 1 year. RACK has also been implemented in the QUIC protocol. We are submitting an IETF draft as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: use RACK to detect lossesYuchung Cheng
This patch implements the second half of RACK that uses the the most recent transmit time among all delivered packets to detect losses. tcp_rack_mark_lost() is called upon receiving a dubious ACK. It then checks if an not-yet-sacked packet was sent at least "reo_wnd" prior to the sent time of the most recently delivered. If so the packet is deemed lost. The "reo_wnd" reordering window starts with 1msec for fast loss detection and changes to min-RTT/4 when reordering is observed. We found 1msec accommodates well on tiny degree of reordering (<3 pkts) on faster links. We use min-RTT instead of SRTT because reordering is more of a path property but SRTT can be inflated by self-inflicated congestion. The factor of 4 is borrowed from the delayed early retransmit and seems to work reasonably well. Since RACK is still experimental, it is now used as a supplemental loss detection on top of existing algorithms. It is only effective after the fast recovery starts or after the timeout occurs. The fast recovery is still triggered by FACK and/or dupack threshold instead of RACK. We introduce a new sysctl net.ipv4.tcp_recovery for future experiments of loss recoveries. For now RACK can be disabled by setting it to 0. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: track the packet timings in RACKYuchung Cheng
This patch is the first half of the RACK loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh). It's inspired by the previous FACK heuristic in tcp_mark_lost_retrans(): when a limited transmit (new data packet) is sacked, then current retransmitted sequence below the newly sacked sequence must been lost, since at least one round trip time has elapsed. But it has several limitations: 1) can't detect tail drops since it depends on limited transmit 2) is disabled upon reordering (assumes no reordering) 3) only enabled in fast recovery ut not timeout recovery RACK (Recently ACK) addresses these limitations with the notion of time instead: a packet P1 is lost if a later packet P2 is s/acked, as at least one round trip has passed. Since RACK cares about the time sequence instead of the data sequence of packets, it can detect tail drops when later retransmission is s/acked while FACK or dupthresh can't. For reordering RACK uses a dynamically adjusted reordering window ("reo_wnd") to reduce false positives on ever (small) degree of reordering. This patch implements tcp_advanced_rack() which tracks the most recent transmission time among the packets that have been delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp is the key to determine which packet has been lost. Consider an example that the sender sends six packets: T1: P1 (lost) T2: P2 T3: P3 T4: P4 T100: sack of P2. rack.mstamp = T2 T101: retransmit P1 T102: sack of P2,P3,P4. rack.mstamp = T4 T205: ACK of P4 since the hole is repaired. rack.mstamp = T101 We need to be careful about spurious retransmission because it may falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK to falsely mark all packets lost, just like a spurious timeout. We identify spurious retransmission by the ACK's TS echo value. If TS option is not applicable but the retransmission is acknowledged less than min-RTT ago, it is likely to be spurious. We refrain from using the transmission time of these spurious retransmissions. The second half is implemented in the next patch that marks packet lost using RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: skb_mstamp_after helperYuchung Cheng
a helper to prepare the first main RACK patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: add tcp_tsopt_ecr_before helperYuchung Cheng
a helper to prepare the main RACK patch Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: remove tcp_mark_lost_retrans()Yuchung Cheng
Remove the existing lost retransmit detection because RACK subsumes it completely. This also stops the overloading the ack_seq field of the skb control block. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: track min RTT using windowed min-filterYuchung Cheng
Kathleen Nichols' algorithm for tracking the minimum RTT of a data stream over some measurement window. It uses constant space and constant time per update. Yet it almost always delivers the same minimum as an implementation that has to keep all the data in the window. The measurement window is tunable via sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes. The algorithm keeps track of the best, 2nd best & 3rd best min values, maintaining an invariant that the measurement time of the n'th best >= n-1'th best. It also makes sure that the three values are widely separated in the time window since that bounds the worse case error when that data is monotonically increasing over the window. Upon getting a new min, we can forget everything earlier because it has no value - the new min is less than everything else in the window by definition and it's the most recent. So we restart fresh on every new min and overwrites the 2nd & 3rd choices. The same property holds for the 2nd & 3rd best. Therefore we have to maintain two invariants to maximize the information in the samples, one on values (1st.v <= 2nd.v <= 3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <= now). These invariants determine the structure of the code The RTT input to the windowed filter is the minimum RTT measured from ACK or SACK, or as the last resort from TCP timestamps. The accessor tcp_min_rtt() returns the minimum RTT seen in the window. ~0U indicates it is not available. The minimum is 1usec even if the true RTT is below that. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: apply Kern's check on RTTs used for congestion controlYuchung Cheng
Currently ca_seq_rtt_us does not use Kern's check. Fix that by checking if any packet acked is a retransmit, for both RTT used for RTT estimation and congestion control. Fixes: 5b08e47ca ("tcp: prefer packet timing to TS-ECR for RTT") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>