Age | Commit message | Author
2019-09-09 | Documentation: iavf: Update the Intel LAN driver doc for iavf | Jeff Kirsher
Update the LAN driver documentation to include the latest feature implementation and driver capabilities. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2019-09-09 | igc: Remove useless forward declaration | Sasha Neftin
Move igc_phy_setup_autoneg, igc_wait_autoneg and igc_set_fc_watermarks up to avoid forward declaration. It is not necessary to forward declare these static methods. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-09 | e1000e: Make speed detection on hotplugging cable more reliable | Kai-Heng Feng
After hot-plugging a 1Gbps Ethernet cable with a 1Gbps link partner, MII_BMSR may report 10Mbps, rendering the network rather slow. The issue has a much lower failure rate after commit 59653e6497d1 ("e1000e: Make watchdog use delayed work"), which essentially introduces some delay before running the watchdog task. But there's still a chance that the hot-plugging event and the queued watchdog task run at the same time, and then the original issue can be observed once again. So let's use mod_delayed_work() to add a deterministic 1 second delay before running the watchdog task after an interrupt. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
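The fix depends on a difference in semantics between queue_delayed_work() and mod_delayed_work(). If the work is already pending, queue_delayed_work() leaves the old (possibly imminent) expiry in place. mod_delayed_work() instead re-arms the timer relative to "now". The sketch below is a hypothetical userspace toy model of that difference only. `toy_dwork` and its helpers are invented names, and this is not the kernel's workqueue code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the kernel's struct delayed_work:
 * a pending flag plus an absolute expiry time in milliseconds. */
struct toy_dwork {
    bool pending;
    unsigned long expires;
};

/* Models queue_delayed_work(): if the work is already pending, nothing
 * changes, so an earlier (possibly imminent) expiry stays in force.
 * Returns true only if the work was newly queued. */
static bool toy_queue_delayed(struct toy_dwork *w, unsigned long now,
                              unsigned long delay)
{
    if (w->pending)
        return false;
    w->pending = true;
    w->expires = now + delay;
    return true;
}

/* Models mod_delayed_work(): always (re)arms the timer, so the work runs
 * exactly `delay` after `now`. Returns true if it was already pending. */
static bool toy_mod_delayed(struct toy_dwork *w, unsigned long now,
                            unsigned long delay)
{
    bool was_pending = w->pending;

    w->pending = true;
    w->expires = now + delay;
    return was_pending;
}
```

Suppose a watchdog is already pending to expire at t = 10 ms and a cable interrupt arrives at t = 8 ms. `toy_queue_delayed(&w, 8, 1000)` leaves `expires` at 10. `toy_mod_delayed(&w, 8, 1000)` moves it to 1008, so the link gets the full second to settle.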
2019-09-09 | ixgbevf: Link lost in VM on ixgbevf when restoring from freeze or suspend | Radoslaw Tyl
This patch fixes an issue in a VM that shows no link when the hypervisor is restored from a low-power state. The driver is responsible for re-enabling any features of the device that had been disabled during suspend calls, such as IRQs and bus mastering. Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-09 | iavf: remove unused debug function iavf_debug_d | YueHaibing
There is no caller of function iavf_debug_d() in tree since commit 75051ce4c5d8 ("iavf: Fix up debug print macro"), so it can be removed. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-09 | virtio_ring: fix unmap of indirect descriptors | Matthias Lange
The function virtqueue_add_split() DMA-maps the scatterlist buffers. In case a mapping error occurs the already mapped buffers must be unmapped. This happens by jumping to the 'unmap_release' label. In case of indirect descriptors the release is wrong and may leak kernel memory. Because the implementation assumes that the head descriptor is already mapped it starts iterating over the descriptor list starting from the head descriptor. However for indirect descriptors the head descriptor is never mapped in case of an error. The fix is to initialize the start index with zero in case of indirect descriptors and use the 'desc' pointer directly for iterating over the descriptor chain. Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
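The core of the bug is that an error path has to unwind from the same index at which mapping began. For indirect descriptors that index is 0 in the indirect table, not the ring's head index. The sketch below is a hypothetical toy model of that rule, not virtio_ring code. The arrays stand in for descriptor tables, `mapped` stands in for a DMA mapping, and `toy_map_chain`, `toy_unwind` and `toy_count_mapped` are invented names.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TABLE_SIZE 8

/* Toy descriptor: records only whether its buffer is currently "mapped". */
struct toy_desc {
    bool mapped;
};

/* Map `count` consecutive entries starting at `start`, simulating a
 * mapping error at index `fail_at`. Returns how many were mapped. */
static int toy_map_chain(struct toy_desc *t, int start, int count, int fail_at)
{
    int n = 0;

    assert(start >= 0 && start + count <= TOY_TABLE_SIZE);
    for (int i = start; i < start + count && i != fail_at; i++) {
        t[i].mapped = true;
        n++;
    }
    return n;
}

/* Error path: unmap `n` entries walking forward from `from`. This is only
 * correct if `from` is the index where mapping started. */
static void toy_unwind(struct toy_desc *t, int from, int n)
{
    for (int i = from; i < from + n && i < TOY_TABLE_SIZE; i++)
        t[i].mapped = false;
}

/* Count entries still mapped, i.e. leaked after the error path. */
static int toy_count_mapped(const struct toy_desc *t)
{
    int leaked = 0;

    for (int i = 0; i < TOY_TABLE_SIZE; i++)
        leaked += t[i].mapped;
    return leaked;
}
```

In this model, four buffers are filled into an indirect table from index 0 and the mapping fails at index 2, so entries 0 and 1 are mapped. Unwinding from a ring head of, say, 3 touches the wrong entries and leaves 2 mapped. Unwinding from 0 leaves none.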
2019-09-09 | drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+ | Chris Wilson
This bit was flipped on for "syncing dependencies between camera and graphics". BSpec has no recollection why, and it is causing unrecoverable GPU hangs with Vulkan compute workloads. From BSpec, setting bit5 to 0 enables relaxed padding requirements for buffers, 1D and 2D non-array, non-MSAA, non-mip-mapped linear surfaces; and *must* be set to 0h on skl+ to ensure "Out of Bounds" case is suppressed. Reported-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110998 Fixes: 8424171e135c ("drm/i915/gen9: h/w w/a: syncing dependencies between camera and graphics") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: denys.kostin@globallogic.com Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.1+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190904100707.7377-1-chris@chris-wilson.co.uk (cherry picked from commit 9d7b01e93526efe79dbf75b69cc5972b5a4f7b37) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-09-09 | drm/i915: Limit MST to <= 8bpc once again | Ville Syrjälä
My attempt at allowing MST to use the higher color depths has regressed some configurations. Apparently people have setups where all MST streams will fit into the DP link with 8bpc but won't fit with higher color depths. What we really should be doing is reducing the bpc for all the streams on the same link until they start to fit. But that requires a bit more work, so in the meantime let's revert back closer to the old behavior and limit MST to at most 8bpc. Cc: stable@vger.kernel.org Cc: Lyude Paul <lyude@redhat.com> Tested-by: Geoffrey Bennett <gmux22@gmail.com> Fixes: f1477219869c ("drm/i915: Remove the 8bpc shackles from DP MST") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111505 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190828102059.2512-1-ville.syrjala@linux.intel.com Reviewed-by: Lyude Paul <lyude@redhat.com> (cherry picked from commit 75427b2a2bffc083d51dec389c235722a9c69b05) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-09-09 | gpio: fix line flag validation in lineevent_create | Kent Gibson
lineevent_create should not allow any of GPIOHANDLE_REQUEST_OUTPUT, GPIOHANDLE_REQUEST_OPEN_DRAIN or GPIOHANDLE_REQUEST_OPEN_SOURCE to be set. Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines") Cc: stable <stable@vger.kernel.org> Signed-off-by: Kent Gibson <warthog618@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2019-09-09 | gpio: fix line flag validation in linehandle_create | Kent Gibson
linehandle_create should not allow both GPIOHANDLE_REQUEST_INPUT and GPIOHANDLE_REQUEST_OUTPUT to be set. Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines") Cc: stable <stable@vger.kernel.org> Signed-off-by: Kent Gibson <warthog618@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
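The two gpio fixes above each add a simple bitmask rule. linehandle_create() rejects INPUT and OUTPUT together. lineevent_create() rejects OUTPUT, OPEN_DRAIN and OPEN_SOURCE outright. The sketch below shows both rules as plain predicates. The `REQ_*` macros are local, illustrative stand-ins for the uapi GPIOHANDLE_REQUEST_* flags (a real program would include <linux/gpio.h>), and the function names are hypothetical. This is not the gpiolib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the GPIOHANDLE_REQUEST_* flags (illustrative
 * values; a real program would include <linux/gpio.h>). */
#define REQ_INPUT       (1u << 0)
#define REQ_OUTPUT      (1u << 1)
#define REQ_ACTIVE_LOW  (1u << 2)
#define REQ_OPEN_DRAIN  (1u << 3)
#define REQ_OPEN_SOURCE (1u << 4)

/* linehandle-style rule: a line cannot be both input and output. */
static bool handle_flags_valid(unsigned int flags)
{
    return (flags & (REQ_INPUT | REQ_OUTPUT)) != (REQ_INPUT | REQ_OUTPUT);
}

/* lineevent-style rule: event lines are inputs, so the output-related
 * flags (output, open drain, open source) are rejected outright. */
static bool event_flags_valid(unsigned int flags)
{
    return (flags & (REQ_OUTPUT | REQ_OPEN_DRAIN | REQ_OPEN_SOURCE)) == 0;
}
```

For example, `handle_flags_valid(REQ_OUTPUT | REQ_OPEN_DRAIN)` is accepted, while `event_flags_valid(REQ_OPEN_SOURCE)` is rejected.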
2019-09-09 | gpio: mockup: add missing single_release() | Wei Yongjun
When using single_open() for opening, single_release() should be used instead of seq_release(), otherwise there is a memory leak. Fixes: 2a9e27408e12 ("gpio: mockup: rework debugfs interface") Cc: stable <stable@vger.kernel.org> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2019-09-08 | Linux 5.3-rc8 | Linus Torvalds
2019-09-08 | netfilter: nf_tables_offload: move indirect flow_block callback logic to core | Pablo Neira Ayuso
Add nft_offload_init() and nft_offload_exit() functions to deal with the init and the exit path of the offload infrastructure. Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-08 | Merge tag 'compiler-attributes-for-linus-v5.3-rc8' of git://github.com/ojeda/linux | Linus Torvalds
Pull section attribute fix from Miguel Ojeda: "Fix Oops in Clang-compiled kernels (Nick Desaulniers)" * tag 'compiler-attributes-for-linus-v5.3-rc8' of git://github.com/ojeda/linux: include/linux/compiler.h: fix Oops for Clang-compiled kernels
2019-09-08 | Merge tag 'gpio-v5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio | Linus Torvalds
Pull GPIO fixes from Linus Walleij: "All related to the PCA953x driver when handling chips with more than 8 ports, now that works again" * tag 'gpio-v5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: pca953x: use pca953x_read_regs instead of regmap_bulk_read gpio: pca953x: correct type of reg_direction
2019-09-08 | netfilter: nf_tables_offload: avoid excessive stack usage | Arnd Bergmann
The nft_offload_ctx structure is much too large to put on the stack: net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=] Use dynamic allocation here, as we do elsewhere in the same function. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-08 | netfilter: nf_tables: Fix an Oops in nf_tables_updobj() error handling | Dan Carpenter
The "newobj" is an error pointer so we can't pass it to kfree(). It doesn't need to be freed so we can remove that and I also renamed the error label. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-08 | include/linux/compiler.h: fix Oops for Clang-compiled kernels | Nick Desaulniers
GCC unescapes escaped string section names while Clang does not. Because __section uses the `#` stringification operator for the section name, it doesn't need to be escaped. This fixes an Oops observed in distros that use systemd and not net.core.bpf_jit_enable=1, when their kernels are compiled with Clang. Link: https://github.com/ClangBuiltLinux/linux/issues/619 Link: https://bugs.llvm.org/show_bug.cgi?id=42950 Link: https://marc.info/?l=linux-netdev&m=156412960619946&w=2 Link: https://lore.kernel.org/lkml/20190904181740.GA19688@gmail.com/ Acked-by: Will Deacon <will@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> [Cherry-picked from the __section cleanup series for 5.3] [Adjusted commit message] Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
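The preprocessor side of this fix can be shown in isolation. `#S` turns a macro argument into a string literal. When the caller passes an argument that is already quoted, the quotes are escaped and end up inside the resulting string. GCC happens to strip them from section names, but Clang keeps them. The sketch below is a minimal illustration. `STRINGIFY` is a local macro written for the example, not the kernel's __section definition, and it demonstrates only what the preprocessor produces, not either compiler's handling of the section attribute.

```c
#include <assert.h>
#include <string.h>

/* `#S` turns the macro argument into a string literal, the same
 * operator the commit message says __section(S) relies on. */
#define STRINGIFY(S) #S

/* Old style: the caller passes an already-quoted name, so the result
 * still contains the quote characters: "\".data..ro_after_init\"". */
static const char old_style[] = STRINGIFY(".data..ro_after_init");

/* New style: the caller passes the bare token sequence, giving
 * ".data..ro_after_init" with no embedded quotes. */
static const char new_style[] = STRINGIFY(.data..ro_after_init);
```

The old-style string is two characters longer than the new-style one, and those two characters are the embedded quote marks.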
2019-09-08 | x86/timer: Force PIT initialization when !X86_FEATURE_ARAT | Jan Stancek
KVM guests with commit c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") applied to the guest kernel have been observed to have unusually high CPU usage, with symptoms of increased VM exits for HLT and MSR_WRITE (MSR_IA32_TSCDEADLINE). This is caused by older QEMUs lacking support for X86_FEATURE_ARAT. The lapic clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive. There's no usable broadcast device either. Do the PIT initialization if the guest CPU lacks X86_FEATURE_ARAT. On real hardware it shouldn't matter, as ARAT and DEADLINE come together. Fixes: c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-09-07 | Revert "x86/apic: Include the LDR when clearing out APIC registers" | Linus Torvalds
This reverts commit 558682b5291937a70748d36fd9ba757fb25b99ae. Chris Wilson reports that it breaks his CPU hotplug test scripts. In particular, it breaks offlining and then re-onlining the boot CPU, which we treat specially (and the BIOS does too). The symptoms are that we can offline the CPU, but it then does not come back online again:

    smpboot: CPU 0 is now offline
    smpboot: Booting Node 0 Processor 0 APIC 0x0
    smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

Thomas says he knows why it's broken (my personal suspicion: our magic handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix is to just revert it, since we've never touched the LDR bits before, and it's not worth the risk to do anything else at this stage. [ Hotplugging of the boot CPU is special anyway, and should be off by default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the cpu0_hotplug kernel parameter. In general you should not do it, and it has various known limitations (hibernate and suspend require the boot CPU, for example). But it should work, even if the boot CPU is special and needs careful treatment - Linus ] Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/ Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bandan Das <bsd@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-07 | ipc: fix sparc64 ipc() wrapper | Arnd Bergmann
Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl to a commit from my y2038 series in linux-5.1, as I missed the custom sys_ipc() wrapper that sparc64 uses in place of the generic version that I patched. The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel now do not allow being called with the IPC_64 flag any more, resulting in a -EINVAL error when they don't recognize the command. Instead, the correct way to do this now is to call the internal ksys_old_{sem,shm,msg}ctl() functions to select the API version. As we generally move towards these functions anyway, change all of sparc_ipc() to consistently use those in place of the sys_*() versions, and move the required ksys_*() declarations into linux/syscalls.h. The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link errors when ipc is disabled. Reported-by: Matt Turner <mattst88@gmail.com> Fixes: 275f22148e87 ("ipc: rename old-style shmctl/semctl/msgctl syscalls") Cc: stable@vger.kernel.org Tested-by: Matt Turner <mattst88@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-07 | Merge tag 'char-misc-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc | Linus Torvalds
Pull Documentation updates from Greg KH: "A few small patches for the documentation file that came in through the char-misc tree in -rc7 for your tree. They fix the mistake in the .rst format that kept the table of companies from showing up in the html output, and most importantly, add people's names to the list showing support for our process" * tag 'char-misc-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/process: Add Qualcomm process ambassador for hardware security issues Documentation/process/embargoed-hardware-issues: Microsoft ambassador Documentation/process: Add Google contact for embargoed hardware issues Documentation/process: Volunteer as the ambassador for Xen
2019-09-07 | Documentation/process: Add Qualcomm process ambassador for hardware security issues | Trilok Soni
Add Trilok Soni as process ambassador for hardware security issues from Qualcomm. Signed-off-by: Trilok Soni <tsoni@codeaurora.org> Link: https://lore.kernel.org/r/1567796517-8964-1-git-send-email-tsoni@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-07 | Merge tag 'dmaengine-fix-5.3' of git://git.infradead.org/users/vkoul/slave-dma | Linus Torvalds
Pull dmaengine fixes from Vinod Koul: "Some late fixes for drivers: - memory leak in ti crossbar dma driver - cleanup of omap dma probe - Fix for link list configuration in sprd dma driver - Handling fixed for DMACHCLR if iommu is mapped in rcar dma" * tag 'dmaengine-fix-5.3' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: rcar-dmac: Fix DMACHCLR handling if iommu is mapped dmaengine: sprd: Fix the DMA link-list configuration dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe() dmaengine: ti: dma-crossbar: Fix a memory leak bug
2019-09-07 | Merge branch 'net-tls-small-TX-offload-optimizations' | David S. Miller
Jakub Kicinski says: ==================== net/tls: small TX offload optimizations This set brings small TLS TX device optimizations. The biggest gain comes from fixing a misuse of non temporal copy instructions. On a synthetic workload modelled after customer's RFC application I see 3-5% gain. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net/tls: align non temporal copy to cache lines | Jakub Kicinski
Unlike normal TCP code, TLS has to touch the cache lines it copies into to fill header info. On memory-heavy workloads, having non temporal stores and normal accesses targeting the same cache line leads to significant overhead. Measured 3% overhead running 3600 round robin connections with an additional memory-heavy workload. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
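One way to keep non-temporal stores away from cache lines that are also written normally is to split the copy into three parts. The unaligned head up to the first line boundary uses a normal copy. Only whole, aligned lines in the middle use a non-temporal copy. The leftover tail uses a normal copy. The sketch below computes that split. `split_copy` and `struct copy_split` are hypothetical names, the copy routines themselves are left out, and this is not claimed to be the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How a copy of `len` bytes to destination address `dst` is split so that
 * only whole, aligned cache lines receive non-temporal stores. */
struct copy_split {
    size_t head; /* normal copy up to the first line boundary */
    size_t mid;  /* whole cache lines: non-temporal copy */
    size_t tail; /* leftover partial line: normal copy */
};

static struct copy_split split_copy(uintptr_t dst, size_t len, size_t line)
{
    struct copy_split s;

    assert(line && (line & (line - 1)) == 0); /* line size: power of two */
    s.head = (size_t)(-dst & (line - 1));      /* bytes to next boundary */
    if (s.head > len)
        s.head = len;
    s.mid = (len - s.head) & ~(line - 1);      /* round down to whole lines */
    s.tail = len - s.head - s.mid;
    return s;
}
```

With 64-byte lines, a 300-byte copy to address 100 splits into a 28-byte head, a 256-byte non-temporal middle and a 16-byte tail. An aligned 64-byte copy is entirely non-temporal.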
2019-09-07 | net/tls: remove the record tail optimization | Jakub Kicinski
For TLS device offload the tag/message authentication code is filled in by the device. The kernel merely reserves space for it. Because the device overwrites it, the contents of the tag do not matter. Current code tries to save space by reusing the header as the tag. This, however, leads to an additional frag being created and defeats buffer coalescing (which trickles all the way down to the drivers). Remove this optimization, and try to allocate the space for the tag in the usual way, leaving the memory uninitialized. If memory allocation fails, rewind the record pointer so that we use the already copied user data as tag. Note that the optimization was actually buggy, as the tag for TLS 1.2 is 16 bytes, but the header is just 13, so the reuse may have looked past the end of the page. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net/tls: use RCU for the adder to the offload record list | Jakub Kicinski
All modifications to the TLS record list happen under the socket lock. Since records form an ordered queue, readers are only concerned about elements being removed; additions can happen concurrently. Use RCU primitives to ensure the correct access types (READ_ONCE/WRITE_ONCE). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net/tls: unref frags in order | Jakub Kicinski
It's generally more cache friendly to walk arrays in order, especially those which are likely not in cache. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next | David S. Miller
Johan Hedberg says: ==================== pull request: bluetooth-next 2019-09-06 Here's the main bluetooth-next pull request for the 5.4 kernel. - Cleanups & fixes to btrtl driver - Fixes for Realtek devices in btusb, e.g. for suspend handling - Firmware loading support for BCM4345C5 - hidp_send_message() return value handling fixes - Added support for utilizing Fast Advertising Interval - Various other minor cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | nfp: flower: cmsg rtnl locks can timeout reify messages | Fred Lotter
Flower control message replies are handled in different locations. The truly high priority replies are handled in the BH (tasklet) context, while the remaining replies are handled in a predefined Linux work queue. The work queue handler orders replies into high and low priority groups, and always starts servicing the high priority replies within the received batch first.

  Reply Type:            Rtnl Lock:  Handler:
  CMSG_TYPE_PORT_MOD     no          BH tasklet (mtu)
  CMSG_TYPE_TUN_NEIGH    no          BH tasklet
  CMSG_TYPE_FLOW_STATS   no          BH tasklet
  CMSG_TYPE_PORT_REIFY   no          WQ high
  CMSG_TYPE_PORT_MOD     yes         WQ high (link/mtu)
  CMSG_TYPE_MERGE_HINT   yes         WQ low
  CMSG_TYPE_NO_NEIGH     no          WQ low
  CMSG_TYPE_ACTIVE_TUNS  no          WQ low
  CMSG_TYPE_QOS_STATS    no          WQ low
  CMSG_TYPE_LAG_CONFIG   no          WQ low

A subset of control messages can block waiting for an rtnl lock (from both work queue priority groups). The rtnl lock is heavily contended for by external processes such as systemd-udevd, systemd-network and libvirtd, especially during netdev creation, such as when flower VFs and representors are instantiated. Kernel netlink instrumentation shows that external processes (such as systemd-udevd) often use successive rtnl_trylock() sequences, which can result in an rtnl_lock() blocked control message to starve for longer periods of time during rtnl lock contention, i.e. netdev creation. In the current design a single blocked control message will block the entire work queue (both priorities), and introduce a latency which is nondeterministic and dependent on system wide rtnl lock usage. In some extreme cases, one blocked control message at exactly the wrong time, just before the maximum number of VFs are instantiated, can block the work queue for long enough to prevent VF representor REIFY replies from getting handled in time for the 40ms timeout. The firmware will deliver the total maximum number of REIFY message replies in around 300us. Only REIFY and MTU update messages require replies within a timeout period (of 40ms). 
The MTU-only updates are already done directly in the BH (tasklet) handler. Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts caused by a blocked work queue waiting on rtnl locks. Signed-off-by: Fred Lotter <frederik.lotter@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: hns3: make array spec_opcode static const, makes object smaller | Colin Ian King
Don't populate the array spec_opcode on the stack but instead make it static const. Makes the object code smaller by 48 bytes.

Before:
   text    data     bss     dec     hex filename
   6914    1040     128    8082    1f92 hns3/hns3vf/hclgevf_cmd.o

After:
   text    data     bss     dec     hex filename
   6866    1040     128    8034    1f62 hns3/hns3vf/hclgevf_cmd.o

(gcc version 9.2.1, amd64)
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | be2net: make two arrays static const, makes object smaller | Colin Ian King
Don't populate the arrays on the stack but instead make them static const. Makes the object code smaller by 281 bytes.

Before:
   text    data     bss     dec     hex filename
  87553    5672       0   93225   16c29 benet/be_cmds.o

After:
   text    data     bss     dec     hex filename
  87112    5832       0   92944   16b10 benet/be_cmds.o

(gcc version 9.2.1, amd64)
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | ionic: Remove unused including <linux/version.h> | YueHaibing
Remove the unneeded inclusion of <linux/version.h>. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list | Shmulik Ladkani
Historically, support for frag_list packets entering skb_segment() was limited to frag_list members terminating on exact same gso_size boundaries. This is verified with a BUG_ON since commit 89319d3801d1 ("net: Add frag_list support to skb_segment"), quote: As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"), the "exact MSS boundaries" assumption no longer holds: An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but leaves the frag_list members as originally merged by GRO with the original 'gso_size'. Examples of such programs are bpf-based NAT46 or NAT64. This led to a kernel BUG_ON for flows involving: - GRO generating a frag_list skb - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room() - skb_segment() of the skb See example BUG_ON reports in [0]. In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"), skb_segment() was modified to support the "gso_size mangling" case of a frag_list GRO'ed skb, but *only* for frag_list members having head_frag==true (having a page-fragment head). Alas, GRO packets having frag_list members with a linear kmalloced head (head_frag==false) still hit the BUG_ON. This commit adds support to skb_segment() for a 'head_skb' packet having a frag_list whose members are *non* head_frag, with gso_size mangled, by disabling SG and thus falling back to copying the data from the given 'head_skb' into the generated segmented skbs - as suggested by Willem de Bruijn [1]. 
Since this approach involves the penalty of skb_copy_and_csum_bits() when building the segments, care was taken in order to enable this solution only when required: - untrusted gso_size, by testing SKB_GSO_DODGY is set (SKB_GSO_DODGY is set by any gso_size mangling functions in net/core/filter.c) - the frag_list is non empty, its item is a non head_frag, *and* the headlen of the given 'head_skb' does not match the gso_size. [0] https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/ https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/ [1] https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/ Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
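The commit enables the slower copying path only when every listed condition holds. Those conditions are: an untrusted gso_size (SKB_GSO_DODGY), a non-empty frag_list whose member is not head_frag, and a headlen that does not match gso_size. The sketch below restates them as a predicate. `struct toy_skb` and `must_copy_segments` are hypothetical, heavily simplified stand-ins for the fields the message names. This is not struct sk_buff or the skb_segment() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the fields the commit message lists;
 * not struct sk_buff. */
struct toy_skb {
    bool gso_dodgy;            /* SKB_GSO_DODGY: gso_size may be mangled */
    unsigned int gso_size;
    unsigned int headlen;      /* linear data length of the head skb */
    bool has_frag_list;
    bool first_frag_head_frag; /* frag_list member has a page-frag head */
};

/* True when segmentation should disable SG and copy data instead. */
static bool must_copy_segments(const struct toy_skb *skb)
{
    return skb->gso_dodgy &&
           skb->has_frag_list &&
           !skb->first_frag_head_frag &&
           skb->headlen != skb->gso_size;
}
```

Clearing any single condition, for example a head_frag member or a headlen equal to gso_size, keeps the existing zero-copy path in this model.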
2019-09-07 | Merge branch 'stmmac-next' | David S. Miller
Jose Abreu says: ==================== net: stmmac: Improvements and fixes for -next Improvements and fixes for recently introduced features. All for -next tree. More info in commit logs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: stmmac: Limit max speeds of XGMAC if asked to | Jose Abreu
We may have some SoCs that can't achieve XGMAC max speed. Limit it if asked to. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: stmmac: selftests: Add Split Header test | Jose Abreu
Add a test to validate that the Split Header feature is working correctly. It works by using the recently introduced counter that increments each time a packet with split header is received. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: stmmac: dwmac4: Enable RX Jumbo frame support | Jose Abreu
We are already doing it by default in the TX path so we can also enable Jumbo Frame support in the RX path independently of MTU value. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: stmmac: selftests: Set RX tail pointer in Flow Control test | Jose Abreu
We need to set the RX tail pointer so that RX engine starts working again after finishing the Flow Control test. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net: stmmac: selftests: Add missing checks for support of SA | Jose Abreu
Add checks for support of Source Address Insertion/Replacement before running the test. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | ipmr: remove hard code cache_resolve_queue_len limit | Hangbin Liu
This is a re-post of a previous patch written by David Miller[1]. Phil Karn reported[2] that on busy networks with lots of unresolved multicast routing entries, the creation of new multicast group routes can be extremely slow and unreliable. The reason is that we hard-coded the limit on multicast route entries with unresolved source addresses (cache_resolve_queue_len) to 10. If some multicast routes never resolve and the number of unresolved source addresses grows, no new multicast route cache entries can be created. To resolve this issue, we need to either add a sysctl entry to make cache_resolve_queue_len configurable, or just remove the cache_resolve_queue_len limit directly, as we already have the socket receive queue limits of the mrouted socket, as David pointed out. From my side, I'd prefer to remove the cache_resolve_queue_len limit instead of creating two more sysctl entries (IPv4 and IPv6 versions). [1] https://lkml.org/lkml/2018/7/22/11 [2] https://lkml.org/lkml/2018/7/21/343 v3: instead of removing cache_resolve_queue_len totally, let's only remove the hard-coded limit when allocating the unresolved cache, as Eric Dumazet suggested, so we don't need to re-count it in other places. v2: hold the mfc_unres_lock while walking the unresolved list in queue_count(), as Nikolay Aleksandrov reminded. Reported-by: Phil Karn <karn@ka9q.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR() | Maciej Żenczykowski
Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: d55a2e374a94 ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
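The bug arises because functions that return ERR_PTR(err) on failure never return NULL, so a NULL check lets error values through. The sketch below is a userspace re-creation of the error-pointer convention, following the same idea as the kernel's include/linux/err.h: the top MAX_ERRNO addresses encode a negative errno. `toy_route_create` is a hypothetical constructor standing in for ip6_route_info_create().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the error-pointer convention: pointer values
 * in the last MAX_ERRNO addresses encode a negative errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_route {
    int metric;
};

/* Hypothetical constructor that, like ip6_route_info_create(), reports
 * failure with ERR_PTR() rather than NULL. */
static struct toy_route *toy_route_create(struct toy_route *storage, bool fail)
{
    if (fail)
        return ERR_PTR(-ENOMEM);
    storage->metric = 256;
    return storage;
}
```

On failure the returned pointer is non-NULL, so `if (!rt)` would treat it as valid. `if (IS_ERR(rt))` catches it, and `PTR_ERR(rt)` recovers -ENOMEM.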
2019-09-07 | isdn/capi: check message length in capi_write() | Eric Biggers
syzbot reported:

    BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
    CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x173/0x1d0 lib/dump_stack.c:113
     kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
     __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
     capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
     do_loop_readv_writev fs/read_write.c:703 [inline]
     do_iter_write+0x83e/0xd80 fs/read_write.c:961
     vfs_writev fs/read_write.c:1004 [inline]
     do_writev+0x397/0x840 fs/read_write.c:1039
     __do_sys_writev fs/read_write.c:1112 [inline]
     __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109
     __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109
     do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
     entry_SYSCALL_64_after_hwframe+0x63/0xe7
    [...]

The problem is that capi_write() is reading past the end of the message. Fix it by checking the message's length in the needed places. Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
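The general pattern behind this fix is to check that the user-supplied buffer is long enough before reading any field from it. The sketch below illustrates that pattern only. The 8-byte header layout, `TOY_BASELEN`, `struct toy_hdr` and `toy_parse` are all assumptions made for the example, and the rule that the declared length must fit within the buffer is an added assumption. None of this is a copy of capi.c's checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout, for illustration only: 16-bit little-endian
 * total length, 16-bit application id, 8-bit command, 8-bit
 * subcommand, 16-bit message number (8 bytes in all). */
#define TOY_BASELEN 8

struct toy_hdr {
    uint16_t len, appl;
    uint8_t cmd, subcmd;
    uint16_t msgnum;
};

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse a header from a user-supplied buffer of `count` bytes. Returns -1
 * instead of reading past the end when the buffer is too short, or when
 * the length the message declares does not fit in what was written. */
static int toy_parse(const uint8_t *buf, size_t count, struct toy_hdr *h)
{
    if (count < TOY_BASELEN)
        return -1;
    h->len = get_le16(buf);
    if (h->len < TOY_BASELEN || h->len > count)
        return -1;
    h->appl = get_le16(buf + 2);
    h->cmd = buf[4];
    h->subcmd = buf[5];
    h->msgnum = get_le16(buf + 6);
    return 0;
}
```

A 4-byte write is rejected before any field is touched. A well-formed 8-byte header such as `{8, 0, 0x34, 0x12, 0x86, 0x80, 1, 0}` parses to application id 0x1234 and message number 1.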
2019-09-07 | Merge branch 'hv_netvsc-features' | David S. Miller
Haiyang Zhang says: ==================== hv_netvsc: Enable sg as tunable, sync offload settings to VF NIC This patch set fixes an issue in SG tuning, and sync offload settings from synthetic NIC to VF NIC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | hv_netvsc: Sync offloading features to VF NIC | Haiyang Zhang
VF NIC may go down then come up during host servicing events. This causes the VF NIC offloading feature settings to roll back to the defaults. This patch can synchronize features from synthetic NIC to the VF NIC during ndo_set_features (ethtool -K), and netvsc_register_vf when VF comes back after host events. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: Mark Bloch <markb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | hv_netvsc: Allow scatter-gather feature to be tunable | Haiyang Zhang
In a previous patch, the NETIF_F_SG was missing after the code changes. That caused the SG feature to be "fixed". This patch includes it into hw_features, so it is tunable again. Fixes: 23312a3be999 ("netvsc: negotiate checksum and segmentation parameters") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | Merge tag 'mlx5-updates-2019-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller
Saeed Mahameed says: ==================== mlx5-updates-2019-09-05 1) Cleanups all over mlx5 2) Added port congestion counters to ethtool stats: Add 3 counters per priority to ethtool using PPCNT: 2.1) rx_prio[p]_buf_discard - the number of packets discarded by device due to lack of per host receive buffers 2.2) rx_prio[p]_cong_discard - the number of packets discarded by device due to per host congestion 2.3) rx_prio[p]_marked - the number of packets ECN marked by device due to per host congestion ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | net/ibmvnic: free reset work of removed device from queue | Juliet Kim
Commit 36f1031c51a2 ("ibmvnic: Do not process reset during or after device removal") made the change to exit reset if the driver has been removed, but does not free the adapter's reset work items from the queue. Ensure all reset work items are freed when breaking out of the loop early. Fixes: 36f1031c51a2 ("ibmvnic: Do not process reset during or after device removal") Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 | tcp: ulp: fix possible crash in tcp_diag_get_aux_size() | Eric Dumazet
tcp_diag_get_aux_size() can be called with sockets in any state. icsk_ulp_ops is only present for full sockets. For SYN_RECV or TIME_WAIT ones we would access garbage. Fixes: 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Luke Hsiao <lukehsiao@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
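The fix amounts to checking which kind of socket is being dumped before touching a field that only full sockets carry. The sketch below is a hypothetical model of that check. The enum, the structs and `toy_aux_size` are invented, the 64-byte info size is a made-up number, and `toy_fullsock` only follows the same idea as the kernel's sk_fullsock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: TIME_WAIT and NEW_SYN_RECV minisockets are smaller
 * objects that do not carry the full socket's ULP pointer. */
enum toy_state { TOY_ESTABLISHED, TOY_LISTEN, TOY_TIME_WAIT, TOY_NEW_SYN_RECV };

struct toy_ulp_ops {
    size_t (*get_info_size)(void);
};

struct toy_sock {
    enum toy_state state;
    const struct toy_ulp_ops *ulp_ops; /* garbage unless a full socket */
};

/* Example ULP whose diag info needs 64 bytes (made-up number). */
static size_t toy_ulp_info_size(void) { return 64; }
static const struct toy_ulp_ops toy_ulp = { toy_ulp_info_size };

/* Same idea as sk_fullsock(): minisockets are excluded. */
static bool toy_fullsock(const struct toy_sock *sk)
{
    return sk->state != TOY_TIME_WAIT && sk->state != TOY_NEW_SYN_RECV;
}

/* Check the socket kind before dereferencing ulp_ops. */
static size_t toy_aux_size(const struct toy_sock *sk)
{
    if (!toy_fullsock(sk))
        return 0;
    if (!sk->ulp_ops || !sk->ulp_ops->get_info_size)
        return 0;
    return sk->ulp_ops->get_info_size();
}
```

In this model, a TIME_WAIT entry whose `ulp_ops` slot holds a stale value still reports 0, because the state check runs before the pointer is used.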