summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-02-12selftests: mlxsw: avoid double sourcing of lib.shJiri Pirko
Don't source lib.sh 2 times and make the script work with ifnames passed on the command line. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12mlxsw: spectrum_flower: Fix VLAN modify action supportIdo Schimmel
The driver does not support VLAN push and pop, but only VLAN modify. Fixes: 738678817573 ("drivers: net: use flow action infrastructure") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12mlxsw: spectrum_router: Drop unnecessary WARN_ON_ONCE()Ido Schimmel
In case the register access failed an error would be logged anyway, so we can drop the warning. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12mlxsw: spectrum: Set LAG port collector only when activeNir Dotan
The LAG port collecting (receive) function was mistakenly set when the port was registered as a LAG member, while it should be set only when the port collection state is set to true. Set LAG port to collecting when it is set to distributing, as described in the IEEE link aggregation standard coupled control mux machine state diagram. Signed-off-by: Nir Dotan <nird@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12Merge branch 'net-smc-next'David S. Miller
Ursula Braun says: ==================== net/smc: patches 2019-02-12 here are patches for SMC: * patches 1 and 3 optimize SMC-R tx logic * patch 2 is a cleanup without functional change * patch 4 optimizes rx logic * patches 5 and 6 improve robustness in link group and IB event handling * patch 7 establishes Karsten Graul as another SMC maintainer ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12MAINTAINERS: add Karsten as SMC maintainerUrsula Braun
Add Karsten as additional maintainer for Shared Memory Communications (SMC) Sockets. Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/smc: check port_idx of ib eventKarsten Graul
For robustness protect of higher port numbers than expected to avoid setting bits behind our port_event_mask. In case of an DEVICE_FATAL event all ports must be checked. The IB_EVENT_GID_CHANGE event is provided in the global event handler, so handle it there. And handle a QP_FATAL event instead of an DEVICE_FATAL event in the qp handler. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/smc: check connections in smc_lgr_free_workKarsten Graul
Remove the shortcut that smc_lgr_free() would skip the check for existing connections when the link group is not in the link group list. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/smc: reduce amount of status updates to peerKarsten Graul
In smc_cdc_msg_recv_action() the received cdc message is evaluated. To reduce the number of messaged triggered by this evaluation the logic is streamlined. For the write_blocked condition we do not need to send a response immediately. The remaining conditions can be put together into one if clause. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/smc: no delay for free tx buffer waitKarsten Graul
When no free transfer buffers are available then a work to call smc_tx_work() is scheduled. Set the schedule delay to zero, because for the out-of-buffers condition the work can start immediately and will block in the called function smc_wr_tx_get_free_slot(), waiting for free buffers. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/smc: move wake up of close waiterKarsten Graul
Move the call to smc_close_wake_tx_prepared() (which wakes up a possibly waiting close processing that might wait for 'all data sent') to smc_tx_sndbuf_nonempty() (which is the main function to send data). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/smc: reset cursor update required flagKarsten Graul
When an updated rx_cursor_confirmed field was sent to the peer then reset the cons_curs_upd_req flag. And remove the duplicate reset and cursor update in smc_tx_consumer_update(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12sfc: initialise found bitmap in efx_ef10_mtd_probeBert Kenward
The bitmap of found partitions in efx_ef10_mtd_probe was not initialised, causing partitions to be suppressed based off whatever value was in the bitmap at the start. Fixes: 3366463513f5 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12Merge tag 'mac80211-for-davem-2019-02-12' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just a few fixes: * aggregation session teardown with internal TXQs was continuing to send some frames marked as aggregation, fix from Ilan * IBSS join was missed during firmware restart, should such a thing happen * speculative execution based on the return value of cfg80211_classify8021d() - which is controlled by the sender of the packet - could be problematic in some code using it, prevent it * a few peer measurement fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12nfp: flower: remove double new lineJakub Kicinski
Recent cls_flower offload rewrite added a double new line. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12ASoC: samsung: i2s: Fix prescaler setting for the secondary DAISylwester Nawrocki
Make sure i2s->rclk_srcrate is properly initialized also during playback through the secondary DAI. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-12floppy: check_events callback should not return a negative numberYufen Yu
floppy_check_events() is supposed to return bit flags to say which events occured. We should return zero to say that no event flags are set. Only BIT(0) and BIT(1) are used in the caller. And .check_events interface also expect to return an unsigned int value. However, after commit a0c80efe5956, it may return -EINTR (-4u). Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't affect runtime, but it obviously is still worth fixing. Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-12bpf: offload: add priv field for driversJakub Kicinski
Currently bpf_offload_dev does not have any priv pointer, forcing the drivers to work backwards from the netdev in program metadata. This is not great given programs are conceptually associated with the offload device, and it means one or two unnecessary deferences. Add a priv pointer to bpf_offload_dev. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-12tools: bpftool: doc, add text about feature-subcommandPrashant Bhole
This patch adds missing information about feature-subcommand in bpftool.rst Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-12xsk: do not remove umem from netdevice on fall-back to copy-modeBjörn Töpel
Commit c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") stores the umem into the netdev._rx struct. However, the patch incorrectly removed the umem from the netdev._rx struct when user-space passed "best-effort" mode (i.e. select the fastest possible option available), and zero-copy mode was not available. This commit fixes that. Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-12ARM: 8835/1: dma-mapping: Clear DMA ops on teardownRobin Murphy
Installing the appropriate non-IOMMU DMA ops in arm_iommu_detch_device() serves the case where IOMMU-aware drivers choose to control their own mapping but still make DMA API calls, however it also affects the case when the arch code itself tears down the mapping upon driver unbinding, where the ops now get left in place and can inhibit arch_setup_dma_ops() on subsequent re-probe attempts. Fix the latter case by making sure that arch_teardown_dma_ops() cleans up whenever the ops were automatically installed by its counterpart. Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()" Tested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-02-12ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instructionMathieu Desnoyers
commit e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") introduced a regression in optimized kprobes. It triggers "invalid instruction" oopses when using kprobes instrumentation through lttng and perf. This commit was introduced in kernel v4.20, and has been backported to stable kernels 4.19 and 4.14. This crash was also reported by Hongzhi Song on the redhat bugzilla where the patch was originally introduced. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1639397 Link: https://bugs.lttng.org/issues/1174 Link: https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com Fixes: e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reported-by: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com> Tested-by: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: William Cohen <wcohen@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> # v4.14+ Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-02-12x86/kvm/nVMX: read from MSR_IA32_VMX_PROCBASED_CTLS2 only when it is availableVitaly Kuznetsov
SDM says MSR_IA32_VMX_PROCBASED_CTLS2 is only available "If (CPUID.01H:ECX.[5] && IA32_VMX_PROCBASED_CTLS[63])". It was found that some old cpus (namely "Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz (family: 0x6, model: 0xf, stepping: 0x6") don't have it. Add the missing check. Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Tested-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12drm/i915/opregion: rvda is relative from opregion base in opregion 2.1+Jani Nikula
Starting from opregion version 2.1 (roughly corresponding to ICL+) the RVDA field is relative from the beginning of opregion, not absolute address. Fix the error path while at it. v2: Make relative vs. absolute conditional on the opregion version, bumped for the purpose. Turned out there are machines relying on absolute RVDA in the wild. v3: Fix the version checks Fixes: 04ebaadb9f2d ("drm/i915/opregion: handle VBT sizes bigger than 6 KB") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190208184254.24123-2-jani.nikula@intel.com (cherry picked from commit a0f52c3d357af218a9c1f7cd906ab70426176a1a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-12drm/i915/opregion: fix version checkJani Nikula
The u32 version field encodes major, minor, revision and reserved. We've basically been checking for any non-zero version. Add opregion version logging while at it. v2: Fix the fix of the version check Fixes: 04ebaadb9f2d ("drm/i915/opregion: handle VBT sizes bigger than 6 KB") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190208184254.24123-1-jani.nikula@intel.com (cherry picked from commit 98fdaaca9537b997062f1abc0aa87c61b50ce40a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-12drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC setJoonas Lahtinen
Make sure the underlying VMA in the process address space is the same as it was during vm_mmap to avoid applying WC to wrong VMA. A more long-term solution would be to have vm_mmap_locked variant in linux/mmap.h for when caller wants to hold mmap_sem for an extended duration. v2: - Refactor the compare function Fixes: 1816f9236303 ("drm/i915: Support creation of unbound wc user mappings for objects") Reported-by: Adam Zabrocki <adamza@microsoft.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.0+ Cc: Akash Goel <akash.goel@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Adam Zabrocki <adamza@microsoft.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-1-joonas.lahtinen@linux.intel.com (cherry picked from commit 5c4604e757ba9b193b09768d75a7d2105a5b883f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-12drm/i915: Block fbdev HPD processing during suspendLyude Paul
When resuming, we check whether or not any previously connected MST topologies are still present and if so, attempt to resume them. If this fails, we disable said MST topologies and fire off a hotplug event so that userspace knows to reprobe. However, sending a hotplug event involves calling drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a connector reprobe in the caller's thread - something we can't do at the point in which i915 calls drm_dp_mst_topology_mgr_resume() since hotplugging hasn't been fully initialized yet. This currently causes some rather subtle but fatal issues. For example, on my T480s the laptop dock connected to it usually disappears during a suspend cycle, and comes back up a short while after the system has been resumed. This guarantees pretty much every suspend and resume cycle, drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn, a connector hotplug will occur. Now it's Rute Goldberg time: when the connector hotplug occurs, i915 reprobes /all/ of the connectors, including eDP. However, eDP probing requires that we power on the panel VDD which in turn, grabs a wakeref to the appropriate power domain on the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where things start breaking, since this all happens before intel_power_domains_enable() is called we end up leaking the wakeref that was acquired and never releasing it later. Come next suspend/resume cycle, this causes us to fail to shut down the GPU properly, which causes it not to resume properly and die a horrible complicated death. (as a note: this only happens when there's both an eDP panel and MST topology connected which is removed mid-suspend. One or the other seems to always be OK). We could try to fix the VDD wakeref leak, but this doesn't seem like it's worth it at all since we aren't able to handle hotplug detection while resuming anyway. So, let's go with a more robust solution inspired by nouveau: block fbdev from handling hotplug events until we resume fbdev. This allows us to still send sysfs hotplug events to be handled later by user space while we're resuming, while also preventing us from actually processing any hotplug events we receive until it's safe. This fixes the wakeref leak observed on the T480s and as such, also fixes suspend/resume with MST topologies connected on this machine. Changes since v2: * Don't call drm_fb_helper_hotplug_event() under lock, do it after lock (Chris Wilson) * Don't call drm_fb_helper_hotplug_event() in intel_fbdev_output_poll_changed() under lock (Chris Wilson) * Always set ifbdev->hpd_waiting (Chris Wilson) Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)") Cc: Todd Previte <tprevite@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v3.17+ Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-2-lyude@redhat.com (cherry picked from commit fe5ec65668cdaa4348631d8ce1766eed43b33c10) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-12drm/i915/pmu: Fix enable count array size and bounds checkingTvrtko Ursulin
Enable count array is supposed to have one counter for each possible engine sampler. As such, array sizing and bounds checking is not correct and would blow up the asserts if more samplers were added. No ill-effect in the current code base but lets fix it for correctness. At the same time tidy the assert for readability and robustness. v2: * One check per assert. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130353.21105-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit 26a11deea685b41a43edb513194718aa1f461c9a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-12ipvs: fix dependency on nf_defrag_ipv6Andrea Claudi
ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation, but lacks proper Kconfig dependencies and does not explicitly request defrag features. As a result, if netfilter hooks are not loaded, when IPv6 fragmented packet are handled by ipvs only the first fragment makes through. Fix it properly declaring the dependency on Kconfig and registering netfilter hooks on ip_vs_add_service() and ip_vs_new_dest(). Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in ↵Chieh-Min Wang
__nf_conntrack_confirm For bridge(br_flood) or broadcast/multicast packets, they could clone skb with unconfirmed conntrack which break the rule that unconfirmed skb->_nfct is never shared. With nfqueue running on my system, the race can be easily reproduced with following warning calltrace: [13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P W 4.4.60 #7744 [13257.707568] Hardware name: Qualcomm (Flattened Device Tree) [13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14) [13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8) [13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0) [13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24) [13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618) [13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc) [13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8) [13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0) [13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c) [13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8) [13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0) [13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4) [13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac) [13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170) [13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420) [13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248) [13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0) [13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c) [13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368) [13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44) [13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200) [13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64) [13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34) The original code just triggered the warning but do nothing. It will caused the shared conntrack moves to the dying list and the packet be droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack). - Reproduce steps: +----------------------------+ | br0(bridge) | | | +-+---------+---------+------+ | eth0| | eth1| | eth2| | | | | | | +--+--+ +--+--+ +---+-+ | | | | | | +--+-+ +-+--+ +--+-+ | PC1| | PC2| | PC3| +----+ +----+ +----+ iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass ps: Our nfq userspace program will set mark on packets whose connection has already been processed. PC1 sends broadcast packets simulated by hping3: hping3 --rand-source --udp 192.168.1.255 -i u100 - Broadcast racing flow chart is as follow: br_handle_frame BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish) // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage br_handle_frame_finish // check if this packet is broadcast br_flood_forward br_flood list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port maybe_deliver deliver_clone skb = skb_clone(skb) __br_forward BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...) // queue in our nfq and received by our userspace program // goto __nf_conntrack_confirm with process context on CPU 1 br_pass_frame_up BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...) // goto __nf_conntrack_confirm with softirq context on CPU 0 Because conntrack confirm can happen at both INPUT and POSTROUTING stage. So with NFQUEUE running, skb->_nfct with the same unconfirmed conntrack could race on different core. This patch fixes a repeating kernel splat, now it is only displayed once. Signed-off-by: Chieh-Min Wang <chiehminw@synology.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12af_key: unconditionally clone on broadcastSean Tranchetti
Attempting to avoid cloning the skb when broadcasting by inflating the refcount with sock_hold/sock_put while under RCU lock is dangerous and violates RCU principles. It leads to subtle race conditions when attempting to free the SKB, as we may reference sockets that have already been freed by the stack. Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b [006b6b6b6b6b6c4b] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP task: fffffff78f65b380 task.stack: ffffff8049a88000 pc : sock_rfree+0x38/0x6c lr : skb_release_head_state+0x6c/0xcc Process repro (pid: 7117, stack limit = 0xffffff8049a88000) Call trace: sock_rfree+0x38/0x6c skb_release_head_state+0x6c/0xcc skb_release_all+0x1c/0x38 __kfree_skb+0x1c/0x30 kfree_skb+0xd0/0xf4 pfkey_broadcast+0x14c/0x18c pfkey_sendmsg+0x1d8/0x408 sock_sendmsg+0x44/0x60 ___sys_sendmsg+0x1d0/0x2a8 __sys_sendmsg+0x64/0xb4 SyS_sendmsg+0x34/0x4c el0_svc_naked+0x34/0x38 Kernel panic - not syncing: Fatal exception Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-02-12nvme-pci: add missing unlock for reset errorKeith Busch
The reset work holds a mutex to prevent races with removal modifying the same resources, but was unlocking only on success. Unlock on failure too. Fixes: 5c959d73dba64 ("nvme-pci: fix rapid add remove sequence") Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-02-11flow_offload: Fix flow action infrastructureEli Britstein
Implementation of macro "flow_action_for_each" introduced in commit e3ab786b42535 ("flow_offload: add flow action infrastructure") and used in commit 738678817573c ("drivers: net: use flow action infrastructure") iterated the first item twice and did not reach the last one. Fix it. Fixes: e3ab786b42535 ("flow_offload: add flow action infrastructure") Fixes: 738678817573c ("drivers: net: use flow action infrastructure") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11Documentation: fix some freescale dpio-driver.rst warningsRandy Dunlap
Fix markup warnings for one list by using correct list syntax. Fix markup warnings for another list by using blank lines before the list. Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:30: WARNING: Unexpected indentation. Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:143: WARNING: Unexpected indentation. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Cc: netdev@vger.kernel.org Cc: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11tipc: fix link session and re-establish issuesTuong Lien
When a link endpoint is re-created (e.g. after a node reboot or interface reset), the link session number is varied by random, the peer endpoint will be synced with this new session number before the link is re-established. However, there is a shortcoming in this mechanism that can lead to the link never re-established or faced with a failure then. It happens when the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as well as the 'in_session' flag have been set, but suddenly this link endpoint leaves. When it comes back with a random session number, there are two situations possible: 1/ If the random session number is larger than (or equal to) the previous one, the peer endpoint will be updated with this new session upon receipt of a RESET_MSG from this endpoint, and the link can be re- established as normal. Otherwise, all the RESET_MSGs from this endpoint will be rejected by the peer. In turn, when this link endpoint receives one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start to send STATE_MSGs, but again these messages will be dropped by the peer due to wrong session. The peer link endpoint can still become ESTABLISHED after receiving a traffic message from this endpoint (e.g. a BCAST_PROTOCOL or NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link will be forced down sooner or later! Even in case the random session number is larger than the previous one, it can be that the ACTIVATE_MSG from the peer arrives first, and this link endpoint moves quickly to ESTABLISHED without sending out any RESET_MSG yet. Consequently, the peer link will not be updated with the new session number, and the same link failure scenario as above will happen. 2/ Another situation can be that, the peer link endpoint was reset due to any reasons in the meantime, its link state was set to RESET from ESTABLISHING but still in session, i.e. the 'in_session' flag is not reset... Now, if the random session number from this endpoint is less than the previous one, all the RESET_MSGs from this endpoint will be rejected by the peer. In the other direction, when this link endpoint receives a RESET_MSG from the peer, it moves to ESTABLISHING and starts to send ACTIVATE_MSGs, but all these messages will be rejected by the peer too. As a result, the link cannot be re-established but gets stuck with this link endpoint in state ESTABLISHING and the peer in RESET! Solution: =========== This link endpoint should not go directly to ESTABLISHED when getting ACTIVATE_MSG from the peer which may belong to the old session if the link was re-created. To ensure the session to be correct before the link is re-established, the peer endpoint in ESTABLISHING state will send back the last session number in ACTIVATE_MSG for a verification at this endpoint. Then, if needed, a new and more appropriate session number will be regenerated to force a re-synch first. In addition, when a link in ESTABLISHING state is reset, its state will move to RESET according to the link FSM, along with resetting the 'in_session' flag (and the other data) as a normal link reset, it will also be deleted if requested. The solution is backward compatible. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11Merge branch 'devinfo-tweaks'David S. Miller
Jakub Kicinski says: ==================== devlink: minor tweaks to reported device info This series contains two minor touch ups for devlink code. First || is corrected to && in the ethtool compat code. Next patch decreases the stack allocation size. On the nfp side after further discussions with the manufacturing team we decided to realign the serial number contents slightly and rename one of the other fields from "vendor" to "mfr", short for "manufacture". v2: - add patch 3 - move board maker as a generic attribute. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11nfp: devlink: include vendor/product info in serial numberJakub Kicinski
The manufacturing team requests we include vendor and product in the serial number field, as the serial number itself is not unique across manufacturing facilities and products. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11nfp: devlink: use the generic manufacture identifier instead of vendorJakub Kicinski
Vendor may sound ambiguous, let's rename the fab string to "board.manufacture" (which was just added as a generic identifier). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11devlink: add a generic board.manufacture version nameJakub Kicinski
At Jiri's suggestion add a generic "board.manufacture" version identifier. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11devlink: don't allocate attrs on the stackJakub Kicinski
Number of devlink attributes has grown over 128, causing the following warning: ../net/core/devlink.c: In function ‘devlink_nl_cmd_region_read_dumpit’: ../net/core/devlink.c:3740:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ Since the number of attributes is only going to grow allocate the array dynamically. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11devlink: fix condition for compat device infoJakub Kicinski
We need the port to be both ethernet and have the rigth netdev, not one or the other. Fixes: ddb6e99e2db1 ("ethtool: add compat for devlink info") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11net: fix IPv6 prefix route residueZhiqiang Liu
Follow those steps: # ip addr add 2001:123::1/32 dev eth0 # ip addr add 2001:123:456::2/64 dev eth0 # ip addr del 2001:123::1/32 dev eth0 # ip addr del 2001:123:456::2/64 dev eth0 and then prefix route of 2001:123::1/32 will still exist. This is because ipv6_prefix_equal in check_cleanup_prefix_route func does not check whether two IPv6 addresses have the same prefix length. If the prefix of one address starts with another shorter address prefix, even though their prefix lengths are different, the return value of ipv6_prefix_equal is true. Here I add a check of whether two addresses have the same prefix to decide whether their prefixes are equal. Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE") Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11Merge branch 'bpf-prog-build'Alexei Starovoitov
Jiong Wang says: ==================== This set improves bpf object file related rules in selftests Makefile. - tell git to ignore the build dir "alu32". - extend sub-register mode compilation to all bpf object files to give LLVM compiler bpf back-end more exercise. - auto-generate bpf kernel object file list. - relax sub-register mode compilation criteria. v1 -> v2: - rename "kern_progs" to "progs". (Alexei) - spin a new patch to remove build server kernel requirement for sub-register mode compilation (Alexei) - rebase on top of KaFai’s latest "test_sock_fields" patch set. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-11selftests: bpf: relax sub-register mode compilation criteriaJiong Wang
Sub-register mode compilation was enabled only when there are eBPF "v3" processor supports at both compilation time inside LLVM and runtime inside kernel. Given separation betwen build and test server could be often, this patch removes the runtime support criteria. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-11selftests: bpf: centre kernel bpf objects under new subdir "progs"Jiong Wang
At the moment, all kernel bpf objects are listed under BPF_OBJ_FILES. Listing them manually sometimes causing patch conflict when people are adding new testcases simultaneously. It is better to centre all the related source files under a subdir "progs", then auto-generate the object file list. Suggested-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-11selftests: bpf: extend sub-register mode compilation to all bpf object filesJiong Wang
At the moment, we only do extra sub-register mode compilation on bpf object files used by "test_progs". These object files are really loaded and executed. This patch further extends sub-register mode compilation to all bpf object files, even those without corresponding runtime tests. Because this could help testing LLVM sub-register code-gen, kernel bpf selftest has much more C testcases with reasonable size and complexity compared with LLVM testsuite which only contains unit tests. There were some file duplication inside BPF_OBJ_FILES_DUAL_COMPILE which is removed now. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-11selftests: bpf: add "alu32" to .gitignoreJiong Wang
"alu32" is a build dir and contains various files for BPF sub-register code-gen testing. This patch tells git to ignore it. Suggested-by: Yonghong Song <yhs@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-11blk-mq: insert rq with DONTPREP to hctx dispatch list when requeueJianchao Wang
When requeue, if RQF_DONTPREP, rq has contained some driver specific data, so insert it to hctx dispatch list to avoid any merge. Take scsi as example, here is the trace event log (no io scheduler, because RQF_STARTED would prevent merging), kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H] scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test] scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test] kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H] scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0] scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0] kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H] kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H] scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0] (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP. Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP, the sdb only contained the part of (32768 + 8), then only that part was completed. The lucky thing was that scsi_io_completion detected it and requeued the remaining part. So we didn't get corrupted data. However, the requeue of (32776 + 8) is not expected. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11tipc: fix skb may be leaky in tipc_link_inputHoang Le
When we free skb at tipc_data_input, we return a 'false' boolean. Then, skb passed to subcalling tipc_link_input in tipc_link_rcv, <snip> 1303 int tipc_link_rcv: ... 1354 if (!tipc_data_input(l, skb, l->inputq)) 1355 rc |= tipc_link_input(l, skb, l->inputq); </snip> Fix it by simple changing to a 'true' boolean when skb is being free-ed. Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above condition. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <maloy@donjonn.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12netfilter: xt_recent: Use struct_size() in kvzalloc()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; size = sizeof(struct foo) + count * sizeof(void *); instance = alloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); instance = alloc(size, GFP_KERNEL) Notice that, in this case, variable sz is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>