summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-12-04tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWATEric Dumazet
TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12 as a step to enable bigger tcp sndbuf limits. It works reasonably well, but the following happens : Once the limit is reached, TCP stack generates an [E]POLLOUT event for every incoming ACK packet. This causes a high number of context switches. This patch implements the strategy David Miller added in sock_def_write_space() : - If TCP socket has a notsent_lowat constraint of X bytes, allow sendmsg() to fill up to X bytes, but send [E]POLLOUT only if number of notsent bytes is below X/2 This considerably reduces TCP_NOTSENT_LOWAT overhead, while allowing to keep the pipe full. Tested: 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM A:/# cat /proc/sys/net/ipv4/tcp_wmem 4096 262144 64000000 A:/# super_netperf 100 -H B -l 1000 -- -K bbr & A:/# grep TCP /proc/net/sockstat TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/ A:/# vmstat 5 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 0 256220672 13532 694976 0 0 10 0 28 14 0 1 99 0 0 2 0 0 256320016 13532 698480 0 0 512 0 715901 5927 0 10 90 0 0 0 0 0 256197232 13532 700992 0 0 735 13 771161 5849 0 11 89 0 0 1 0 0 256233824 13532 703320 0 0 512 23 719650 6635 0 11 89 0 0 2 0 0 256226880 13532 705780 0 0 642 4 775650 6009 0 12 88 0 0 A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat A:/# grep TCP /proc/net/sockstat TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow A:/# vmstat 5 5 # check that context switches have not inflated too much. procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 260386512 13592 662148 0 0 10 0 17 14 0 1 99 0 0 0 0 0 260519680 13592 604184 0 0 512 13 726843 12424 0 10 90 0 0 1 1 0 260435424 13592 598360 0 0 512 25 764645 12925 0 10 90 0 0 1 0 0 260855392 13592 578380 0 0 512 7 722943 13624 0 11 88 0 0 1 0 0 260445008 13592 601176 0 0 614 34 772288 14317 0 10 90 0 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05Merge tag 'imx-drm-next-2018-12-03' of ↵Dave Airlie
git://git.pengutronix.de/git/pza/linux into drm-next drm/imx: update image-convert with fixes for multi-tiled scaling Update the ipu-v3 mem2mem image-convert code, with some fixes for race conditions, alignment issues, and visual artifacts due to tile alignment and scaling factor issues when scaling images larger than hardware limitations in multiple tiles. This will allow the V4L2 mem2mem scaler driver to write output images larger than 1024x1024 pixels. Also switch drm/imx source files to SPDX license identifiers, constify struct clk_ops in imx-tve, and add a timeout warning to the busy wait in ipu_plane_disable(). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Philipp Zabel <p.zabel@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/1543835266.5647.1.camel@pengutronix.de
2018-12-05soc: imx: gpcv2: add support for i.MX8MQ SoCLucas Stach
The GPCv2 on the Freescale i.MX8MQ SoC works in the same way as the GPCv2 on the i.MX7, but only controls more power domains with a different mapping. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2018-12-04IB/mlx5: Update the supported DEVX commandsYishai Hadas
Update the supported DEVX commands, it includes adding to the query/modify command's list and to the encoding handling. In addition, a valid range for general commands was added to be used for future commands. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04IB/core: Enable getting an object type from a given uobjectYishai Hadas
Enable getting an object type from a given uobject, the type is saved upon tree merging and is returned as part of some helper function. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04IB/core: Introduce UVERBS_IDR_ANY_OBJECTYishai Hadas
Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object. Once used, the infrastructure skips checking for the IDR type, it becomes the driver handler responsibility. This enables drivers to get in a given method an object from various of types. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04block: remove ->poll_fnChristoph Hellwig
This was intended to support users like nvme multipath, but is just getting in the way and adding another indirect call. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04block: move queues types to the block layerChristoph Hellwig
Having another indirect all in the fast path doesn't really help in our post-spectre world. Also having too many queue type is just going to create confusion, so I'd rather manage them centrally. Note that the queue type naming and ordering changes a bit - the first index now is the default queue for everything not explicitly marked, the optional ones are read and poll queues. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04Merge 'mlx5-next' into mlx5-devxDoug Ledford
The enhanced devx support series needs commit: 9d43faac02e3 ("net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits") Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04acpi/nfit: Add support for Intel DSM 1.8 commandsDave Jiang
Add command definition for security commands defined in Intel DSM specification v1.8 [1]. This includes "get security state", "set passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite", "overwrite query", "master passphrase enable/disable", and "master erase", . Since this adds several Intel definitions, move the relevant bits to their own header. These commands mutate physical data, but that manipulation is not cache coherent. The requirement to flush and invalidate caches makes these commands unsuitable to be called from userspace, so extra logic is added to detect and block these commands from being submitted via the ioctl command submission path. Lastly, the commands may contain sensitive key material that should not be dumped in a standard debug session. Update the nvdimm-command payload-dump facility to move security command payloads behind a default-off compile time switch. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-04Merge branch 'topic/3wire-gpio' of ↵Mark Brown
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into spi-4.21 mode conflict
2018-12-04Merge tag 'v4.20-rc5' into for-4.21/blockJens Axboe
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and also getting the merge fix that went into mainline that users are hitting testing for-4.21/block and/or for-next. * tag 'v4.20-rc5': (664 commits) Linux 4.20-rc5 PCI: Fix incorrect value returned from pcie_get_speed_cap() MAINTAINERS: Update linux-mips mailing list address ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set ...
2018-12-04skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'Ido Schimmel
Commit abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field") added the 'offload_mr_fwd_mark' field to indicate that a packet has already undergone L3 multicast routing by a capable device. The field is used to prevent the kernel from forwarding a packet through a netdev through which the device has already forwarded the packet. Currently, no unicast packet is routed by both the device and the kernel, but this is about to change by subsequent patches and we need to be able to mark such packets, so that they will no be forwarded twice. Instead of adding yet another field to 'struct sk_buff', we can just rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet either has a multicast or a unicast destination IP. While at it, add a comment about both 'offload_fwd_mark' and 'offload_l3_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04bpf: respect size hint to BPF_PROG_TEST_RUN if presentLorenz Bauer
Use data_size_out as a size hint when copying test output to user space. ENOSPC is returned if the output buffer is too small. Callers which so far did not set data_size_out are not affected. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04clk: renesas: r8a77995: Add missing CPEX clockGeert Uytterhoeven
The R-Car Gen3 HardWare Manual Errata for Rev. 0.80 (Feb 28, 2018) added the CPEX clock on R-Car D3. This clock can be selected as a clock source for CMT1 (Compare Match Timer Type 1). Add the missing clock to the DT bindings header, and implement support for it in the clock driver. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Stephen Boyd <sboyd@kernel.org>
2018-12-04clk: renesas: r8a77995: Remove non-existent SSP clocksGeert Uytterhoeven
The R-Car Gen3 HardWare Manual Errata for Rev. 0.80 (Dec 22, 2017, and Feb 28, 2018) removed the SSPSRC, SSP1, and SSP2 clocks on R-Car D3, as this SoC does not have a Stream and Security Processor. As these definitions were never used, they can just be removed. The freed slots in the DT bindings header must not be reused, though. Fixes: 714c53aa2e2d6d60 ("clk: renesas: Add r8a77995 CPG Core Clock Definitions") Fixes: d71e851d82c6cfe5 ("clk: renesas: cpg-mssr: Add R8A77995 support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Stephen Boyd <sboyd@kernel.org>
2018-12-04dt-bindings: clock: r8a7796: Remove CSIREF clockGeert Uytterhoeven
The R-Car Gen3 HardWare Manual Errata for Rev. 0.52 (Nov 30, 2016) removed the CSI reference clock on R-Car M3-W. As this definition was never used, it can just be removed. The freed slot in the DT bindings header must not be reused, though. Fixes: 972610fb23b08dd5 ("clk: renesas: Add r8a7796 CPG Core Clock Definitions") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Stephen Boyd <sboyd@kernel.org>
2018-12-04dt-bindings: clock: r8a7795: Remove CSIREF clockGeert Uytterhoeven
The R-Car Gen3 HardWare Manual Errata for Rev. 0.52 (Nov 30, 2016) removed the CSI reference clock on R-Car H3. As this definition was never used, it can just be removed. The freed slot in the DT bindings header must not be reused, though. Fixes: 9d0c3c682033d3f1 ("clk: shmobile: Add r8a7795 CPG Core Clock Definitions") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Stephen Boyd <sboyd@kernel.org>
2018-12-04net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bitsYishai Hadas
Expose device capabilities for DEVX user context, it includes which caps the device is supported and a matching bit to set as part of user context creation. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04dt-bindings: clock: Add Allwinner suniv F1C100s CCUMesih Kilinc
Add compatiple string for Allwinner suniv F1C100s CCU. Add clock and reset definitions. Signed-off-by: Mesih Kilinc <mesihkilinc@gmail.com> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-12-04RDMA/mlx5: Initialize SRQ tables on mlx5_ibLeon Romanovsky
Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Move SRQ functions to RDMA partLeon Romanovsky
There is no need to keep SRQ which is RDMA object in mlx5_core. In this patch, we partially move the execution code, while next patches will move table initialization/release logic too. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Remove dead transobj codeLeon Romanovsky
Delete functions which are not called and not needed. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Align SRQ licenses and copyright informationLeon Romanovsky
Ensure that both RDMA and netdev parts of SRQ implementation has same copyright and license information annotated by SPDX tags. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04jbd2: update locking documentation for transaction_tAlexander Lochmann
The following members of struct transaction_s aka transaction_t were turned into lock-free variables in the past: - t_updates - t_outstanding_credits - t_handle_count However, the documentation has not been updated yet. This commit replaced the annotated lock by [none]. Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf Spinczyk) Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03jbd2: avoid long hold times of j_state_lock while committing a transactionJan Kara
We can hold j_state_lock for writing at the beginning of jbd2_journal_commit_transaction() for a rather long time (reportedly for 30 ms) due cleaning revoke bits of all revoked buffers under it. The handling of revoke tables as well as cleaning of t_reserved_list, and checkpoint lists does not need j_state_lock for anything. It is only needed to prevent new handles from joining the transaction. Generally T_LOCKED transaction state prevents new handles from joining the transaction - except for reserved handles which have to allowed to join while we wait for other handles to complete. To prevent reserved handles from joining the transaction while cleaning up lists, add new transaction state T_SWITCH and watch for it when starting reserved handles. With this we can just drop the lock for operations that don't need it. Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03pstore: Convert buf_lock to semaphoreKees Cook
Instead of running with interrupts disabled, use a semaphore. This should make it easier for backends that may need to sleep (e.g. EFI) when performing a write: |BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum |Preemption disabled at: |[<ffffffff99d60512>] pstore_dump+0x72/0x330 |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45 |Call Trace: | dump_stack+0x4f/0x6a | ___might_sleep.cold.91+0xd3/0xe4 | __might_sleep+0x50/0x90 | wait_for_completion+0x32/0x130 | virt_efi_query_variable_info+0x14e/0x160 | efi_query_variable_store+0x51/0x1a0 | efivar_entry_set_safe+0xa3/0x1b0 | efi_pstore_write+0x109/0x140 | pstore_dump+0x11c/0x330 | kmsg_dump+0xa4/0xd0 | oops_exit+0x22/0x30 ... Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars") Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Map PSTORE_TYPE_* to stringsJoel Fernandes (Google)
In later patches we will need to map types to names, so create a constant table for that which can also be used in different parts of old and new code. This saves the type in the PRZ which will be useful in later patches. Instead of having an explicit PSTORE_TYPE_UNKNOWN, just use ..._MAX. This includes removing the now redundant filename templates which can use a single format string. Also, there's no reason to limit the "is it still compressed?" test to only PSTORE_TYPE_DMESG when building the pstorefs filename. Records are zero-initialized, so a backend would need to have explicitly set compressed=1. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Replace open-coded << with BIT()Kees Cook
Minor clean-up to use BIT() (as already done in pstore_ram.h). Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Improve and update some comments and status outputKees Cook
This improves and updates some comments: - dump handler comment out of sync from calling convention - fix kern-doc typo and improves status output: - reminder that only kernel crash dumps are compressed - do not be silent about ECC infrastructure failures Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Add kern-doc for struct persistent_ram_zoneKees Cook
The struct persistent_ram_zone wasn't well documented. This adds kern-doc for it. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Avoid duplicate call of persistent_ram_zap()Peng Wang
When initialing a prz, if invalid data is found (no PERSISTENT_RAM_SIG), the function call path looks like this: ramoops_init_prz -> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap persistent_ram_zap As we can see, persistent_ram_zap() is called twice. We can avoid this by adding an option to persistent_ram_new(), and only call persistent_ram_zap() when it is needed. Signed-off-by: Peng Wang <wangpeng15@xiaomi.com> [kees: minor tweak to exit path and commit log] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03Merge branch 'for-linus/pstore' into for-next/pstoreKees Cook
2018-12-04netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()Taehee Yoo
basechain->stats is rcu protected data which is updated from nft_chain_stats_replace(). This function is executed from the commit phase which holds the pernet nf_tables commit mutex - not the global nfnetlink subsystem mutex. Test commands to reproduce the problem are: %iptables-nft -I INPUT %iptables-nft -Z %iptables-nft -Z This patch uses RCU calls to handle basechain->stats updates to fix a splat that looks like: [89279.358755] ============================= [89279.363656] WARNING: suspicious RCU usage [89279.368458] 4.20.0-rc2+ #44 Tainted: G W L [89279.374661] ----------------------------- [89279.379542] net/netfilter/nf_tables_api.c:1404 suspicious rcu_dereference_protected() usage! [...] [89279.406556] 1 lock held by iptables-nft/5225: [89279.411728] #0: 00000000bf45a000 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1f/0x70 [nf_tables] [89279.424022] stack backtrace: [89279.429236] CPU: 0 PID: 5225 Comm: iptables-nft Tainted: G W L 4.20.0-rc2+ #44 [89279.430135] Call Trace: [89279.430135] dump_stack+0xc9/0x16b [89279.430135] ? show_regs_print_info+0x5/0x5 [89279.430135] ? lockdep_rcu_suspicious+0x117/0x160 [89279.430135] nft_chain_commit_update+0x4ea/0x640 [nf_tables] [89279.430135] ? sched_clock_local+0xd4/0x140 [89279.430135] ? check_flags.part.35+0x440/0x440 [89279.430135] ? __rhashtable_remove_fast.constprop.67+0xec0/0xec0 [nf_tables] [89279.430135] ? sched_clock_cpu+0x126/0x170 [89279.430135] ? find_held_lock+0x39/0x1c0 [89279.430135] ? hlock_class+0x140/0x140 [89279.430135] ? is_bpf_text_address+0x5/0xf0 [89279.430135] ? check_flags.part.35+0x440/0x440 [89279.430135] ? __lock_is_held+0xb4/0x140 [89279.430135] nf_tables_commit+0x2555/0x39c0 [nf_tables] Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-03sbitmap: fix sbitmap_for_each_set()Omar Sandoval
We need to ignore bits in the cleared mask when iterating over all set bits. Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits") Reported-by: Jens Axboe@kernel.dk> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-03udp: elide zerocopy operation in hot pathWillem de Bruijn
With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info. Release of its last reference triggers a completion notification. The TCP stack in tcp_sendmsg_locked holds an extra ref independent of the skbs, because it can build, send and free skbs within its loop, possibly reaching refcount zero and freeing the ubuf_info too soon. The UDP stack currently also takes this extra ref, but does not need it as all skbs are sent after return from __ip(6)_append_data. Avoid the extra refcount_inc and refcount_dec_and_test, and generally the sock_zerocopy_put in the common path, by passing the initial reference to the first skb. This approach is taken instead of initializing the refcount to 0, as that would generate error "refcount_t: increment on 0" on the next skb_zcopy_set. Changes v3 -> v4 - Move skb_zcopy_set below the only kfree_skb that might cause a premature uarg destroy before skb_zerocopy_put_abort - Move the entire skb_shinfo assignment block, to keep that cacheline access in one place Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03udp: msg_zerocopyWillem de Bruijn
Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and interpret flag MSG_ZEROCOPY. This patch was previously part of the zerocopy RFC patchsets. Zerocopy is not effective at small MTU. With segmentation offload building larger datagrams, the benefit of page flipping outweights the cost of generating a completion notification. tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on test patch and making skb_orphan_frags_rx same as skb_orphan_frags: ipv4 udp -t 1 tx=191312 (11938 MB) txc=0 zc=n rx=191312 (11938 MB) ipv4 udp -z -t 1 tx=304507 (19002 MB) txc=304507 zc=y rx=304507 (19002 MB) ok ipv6 udp -t 1 tx=174485 (10888 MB) txc=0 zc=n rx=174485 (10888 MB) ipv6 udp -z -t 1 tx=294801 (18396 MB) txc=294801 zc=y rx=294801 (18396 MB) ok Changes v1 -> v2 - Fixup reverse christmas tree violation v2 -> v3 - Split refcount avoidance optimization into separate patch - Fix refcount leak on error in fragmented case (thanks to Paolo Abeni for pointing this one out!) - Fix refcount inc on zero - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data. This is needed since commit 5cf4a8532c99 ("tcp: really ignore MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03sctp: kfree_rcu asocXin Long
In sctp_hash_transport/sctp_epaddr_lookup_transport, it dereferences a transport's asoc under rcu_read_lock while asoc is freed not after a grace period, which leads to a use-after-free panic. This patch fixes it by calling kfree_rcu to make asoc be freed after a grace period. Note that only the asoc's memory is delayed to free in the patch, it won't cause sk to linger longer. Thanks Neil and Marcelo to make this clear. Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable") Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new transport") Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com Suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.21 First set of patches for 4.21. Most notable here is support for Quantenna's QSR1000/QSR2000 chipsets and more flexible ways to provide nvram files for brcmfmac. Major changes: brcmfmac * add support for first trying to get a board specific nvram file * add support for getting nvram contents from EFI variables qtnfmac * use single PCIe driver for all platforms and rename Kconfig option CONFIG_QTNFMAC_PEARL_PCIE to CONFIG_QTNFMAC_PCIE * add support for QSR1000/QSR2000 (Topaz) family of chipsets ath10k * add support for WCN3990 firmware crash recovery * add firmware memory dump support for QCA4019 wil6210 * add firmware error recovery while in AP mode ath9k * remove experimental notice from dynack feature iwlwifi * PCI IDs for some new 9000-series cards * improve antenna usage on connection problems * new firmware debugging infrastructure * some more work on 802.11ax * improve support for multiple RF modules with 22000 devices cordic * move cordic macros and defines to a public header file * convert brcmsmac and b43 to fully use cordic library ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03of: net: kill of_get_nvmem_mac_address()Bartosz Golaszewski
We've switched all users to nvmem_get_mac_address(). Remove the now dead code. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: ethernet: provide nvmem_get_mac_address()Bartosz Golaszewski
We already have of_get_nvmem_mac_address() but some non-DT systems want to read the MAC address from NVMEM too. Implement a generalized routine that takes struct device as argument. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03rhashtable: detect when object movement between tables might have ↵NeilBrown
invalidated a lookup Some users of rhashtables might need to move an object from one table to another - this appears to be the reason for the incomplete usage of NULLS markers. To support these, we store a unique NULLS_MARKER at the end of each chain, and when a search fails to find a match, we check if the NULLS marker found was the expected one. If not, the search may not have examined all objects in the target bucket, so it is repeated. The unique NULLS_MARKER is derived from the address of the head of the chain. As this cannot be derived at load-time the static rhnull in rht_bucket_nested() needs to be initialised at run time. Any caller of a lookup function must still be prepared for the possibility that the object returned is in a different table - it might have been there for some time. Note that this does NOT provide support for other uses of NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing the key of an object and re-inserting it in the same table. These could only be done safely if new objects were inserted at the *start* of a hash chain, and that is not currently the case. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03l3mdev: add function to retreive upper masterAlexis Bauvin
Existing functions to retreive the l3mdev of a device did not walk the master chain to find the upper master. This patch adds a function to find the l3mdev, even indirect through e.g. a bridge: +----------+ | | | vrf-blue | | | +----+-----+ | | +----+-----+ | | | br-blue | | | +----+-----+ | | +----+-----+ | | | eth0 | | | +----------+ This will properly resolve the l3mdev of eth0 to vrf-blue. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03udp_tunnel: add config option to bind to a deviceAlexis Bauvin
UDP tunnel sockets are always opened unbound to a specific device. This patch allow the socket to be bound on a custom device, which incidentally makes UDP tunnels VRF-aware if binding to an l3mdev. Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03devlink: Add 'fw_load_policy' generic parameterShalom Toledo
Many drivers load the device's firmware image during the initialization flow either from the flash or from the disk. Currently this option is not controlled by the user and the driver decides from where to load the firmware image. 'fw_load_policy' gives the ability to control this option which allows the user to choose between different loading policies supported by the driver. This parameter can be useful while testing and/or debugging the device. For example, testing a firmware bug fix. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03dmaengine: pxa: make the filter function internalRobert Jarzmik
As the pxa architecture and all its related drivers do not rely anymore on the filter function, thanks to the slave map conversion, make pxad_filter_fn() static, and remove it from the global namespace. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Vinod Koul <vkoul@kernel.org>
2018-12-03Merge tag 'qcom-drivers-for-4.21' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into next/drivers Qualcomm ARM Based Driver Updates for v4.21 * Fix llcc license, includes, and error checks * Remove use of memcpy in cmd-db and fix API breakage * Add QCS404 compatible to SMD-RPM * Minor fixes for QMI * Add irq clear handling in QCOM Geni SE during init * tag 'qcom-drivers-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: drm: msm: Check cmd_db_read_aux_data() for failure soc: qcom: smd-rpm: Add QCS404 compatible soc: qcom: llcc-slice: Remove duplicated include from llcc-slice.c soc: qcom: cmd-db: Stop memcpy()ing in cmd_db_read_aux_data() soc: qcom: cmd-db: Remove memcpy()ing from cmd_db_get_header() soc: qcom: Drop help text for QCOM_QMI_HELPERS soc: qcom: qmi_interface: Limit txn ids to U16_MAX soc: qcom: llcc-slice: Add error checks for API functions soc: qcom/llcc: add MODULE_LICENSE tag soc: qcom: Add irq clear handling during SE init Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-03Merge tag 'arm-soc/for-4.21/drivers' of https://github.com/Broadcom/stblinux ↵Olof Johansson
into next/drivers This pull request contains Broadcom ARM/ARM64/MIPS SoCs drivers changes for 4.21, please pull the following changes: - James fixes the firmware interface after a commit changed the use of VLA and broke large transfers - Stefan adds a timeout check for Raspberry Pi firmware transactions and updates a bunch of SoC/firmware files to use SPDX tags - Wolfram switches the GISB bus arbiter to use dev_get_drvdata() - Yangtao provides a fix for a reference leak due to a call to of_find_node_by_path() - Florian fixes the CPU re-entry point out of S3 suspend with kernels built in Thumb2 mode * tag 'arm-soc/for-4.21/drivers' of https://github.com/Broadcom/stblinux: soc: bcm: brcmstb: Don't leak device tree node reference firmware: raspberrypi: Switch to SPDX identifier firmware: raspberrypi: Fix firmware calls with large buffers soc: bcm: Switch raspberrypi-power to SPDX identifier firmware: raspberrypi: Define timeout for transactions bus: brcmstb_gisb: simplify getting .driver_data soc: bcm: brcmstb: Fix re-entry point with a THUMB2_KERNEL Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-03bpf: fix documentation for eBPF helpersQuentin Monnet
The missing indentation on the "Return" sections for bpf_map_pop_elem() and bpf_map_peek_elem() helpers break RST and man pages generation. This patch fixes them, and moves the description of those two helpers towards the end of the list (even though they are somehow related to the three first helpers for maps, the man page explicitly states that the helpers are sorted in chronological order). While at it, bring other minor formatting edits for eBPF helpers documentation: mostly blank lines removal, RST formatting, or other small nits for consistency. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>