summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-02-13skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache headsAlexander Lobakin
Instead of just bulk-flushing skbuff_heads queued up through napi_consume_skb() or __kfree_skb_defer(), try to reuse them on allocation path. If the cache is empty on allocation, bulk-allocate the first 16 elements, which is more efficient than per-skb allocation. If the cache is full on freeing, bulk-wipe the second half of the cache (32 elements). This also includes custom KASAN poisoning/unpoisoning to be double sure there are no use-after-free cases. To not change current behaviour, introduce a new function, napi_build_skb(), to optionally use a new approach later in drivers. Note on selected bulk size, 16: - this equals to XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE and especially VETH_XDP_BATCH, which is also used to bulk-allocate skbuff_heads and was tested on powerful setups; - this also showed the best performance in the actual test series (from the array of {8, 16, 32}). Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves Suggested-by: Eric Dumazet <edumazet@google.com> # KASAN poisoning Cc: Dmitry Vyukov <dvyukov@google.com> # Help with KASAN Cc: Paolo Abeni <pabeni@redhat.com> # Reduced batch size Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: move NAPI cache declarations upper in the fileAlexander Lobakin
NAPI cache structures will be used for allocating skbuff_heads, so move their declarations a bit upper. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: remove __kfree_skb_flush()Alexander Lobakin
This function isn't much needed as NAPI skb queue gets bulk-freed anyway when there's no more room, and even may reduce the efficiency of bulk operations. It will be even less needed after reusing skb cache on allocation path, so remove it and this way lighten network softirqs a bit. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: use __build_skb_around() in __alloc_skb()Alexander Lobakin
Just call __build_skb_around() instead of open-coding it. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: simplify __alloc_skb() a bitAlexander Lobakin
Use unlikely() annotations for skbuff_head and data similarly to the two other allocation functions and remove totally redundant goto. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: make __build_skb_around() return voidAlexander Lobakin
__build_skb_around() can never fail and always returns passed skb. Make it return void to simplify and optimize the code. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: simplify kmalloc_reserve()Alexander Lobakin
Eversince the introduction of __kmalloc_reserve(), "ip" argument hasn't been used. _RET_IP_ is embedded inside kmalloc_node_track_caller(). Remove the redundant macro and rename the function after it. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13skbuff: move __alloc_skb() next to the other skb allocation functionsAlexander Lobakin
In preparation before reusing several functions in all three skb allocation variants, move __alloc_skb() next to the __netdev_alloc_skb() and __napi_alloc_skb(). No functional changes. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One small fix for the Allwinner clk driver so that display clks figure out the correct rate to use. This fixes displays running 4k@60Hz and some other resolutions that haven't been exercised and fully understood until now" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sunxi-ng: mp: fix parent rate change flag check
2021-02-13Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "One fix for scsi_debug that fixes a memory leak on module removal" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: scsi_debug: Fix a memory leak
2021-02-13staging: hikey9xx: Fix alignment of function parametersMukul Mehar
This patch fixes the following checkpatch.pl check: CHECK: Alignment should match open parenthesis Signed-off-by: Mukul Mehar <mukulmehar02@gmail.com> Link: https://lore.kernel.org/r/20210213120556.73579-1-mukulmehar02@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging: greybus: Fixed a misspelling in hid.cPritthijit Nath
Fixed the spelling of 'transfered' to 'transferred'. Signed-off-by: Pritthijit Nath <pritthijit.nath@icloud.com> Link: https://lore.kernel.org/r/20210212101324.12391-1-pritthijit.nath@icloud.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging: wimax/i2400m: fix some byte order issues found by sparseAnirudh Rayabharam
Fix sparse byte-order warnings in the i2400m_bm_cmd_prepare() function: wimax/i2400m/fw.c:194:36: warning: restricted __le32 degrades to integer wimax/i2400m/fw.c:195:34: warning: invalid assignment: += wimax/i2400m/fw.c:195:34: left side has type unsigned int wimax/i2400m/fw.c:195:34: right side has type restricted __le32 wimax/i2400m/fw.c:196:32: warning: restricted __le32 degrades to integer wimax/i2400m/fw.c:196:47: warning: restricted __le32 degrades to integer wimax/i2400m/fw.c:196:66: warning: restricted __le32 degrades to integer Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20210212153843.8554-1-mail@anirudhrb.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging: wimax: i2400m: fix some incorrect type warningsAyush
Fix some "incorrect type in assignment" warnings reported by sparse in tx.c sparse warnings: wimax/i2400m/tx.c:788:35: warning: cast to restricted __le16 wimax/i2400m/tx.c:788:33: warning: incorrect type in assignment (different base types) wimax/i2400m/tx.c:788:33: expected restricted __le16 [usertype] num_pls wimax/i2400m/tx.c:788:33: got unsigned short [usertype] wimax/i2400m/tx.c:896:32: warning: cast to restricted __le32 wimax/i2400m/tx.c:896:30: warning: incorrect type in assignment (different base types) wimax/i2400m/tx.c:896:30: expected restricted __le32 [usertype] barker wimax/i2400m/tx.c:896:30: got unsigned int [usertype] wimax/i2400m/tx.c:897:34: warning: cast to restricted __le32 wimax/i2400m/tx.c:897:32: warning: incorrect type in assignment (different base types) wimax/i2400m/tx.c:897:32: expected restricted __le32 [usertype] sequence wimax/i2400m/tx.c:897:32: got unsigned int [usertype] wimax/i2400m/tx.c:899:15: warning: cast to restricted __le32 wimax/i2400m/tx.c:899:15: warning: cast from restricted __le16 Signed-off-by: Ayush <ayush@disroot.org> Link: https://lore.kernel.org/r/20210212213628.801642-1-ayush@disroot.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging: greybus: minor code style fixManikantan Ravichandran
checkpatch warning fix for string split across lines Signed-off-by: Manikantan Ravichandran <ravman1991@gmail.com> Link: https://lore.kernel.org/r/20210212225035.GA16260@whach Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging:wlan-ng: use memdup_user instead of kmalloc/copy_from_userIvan Safonov
memdup_user() is shorter and safer equivalent of kmalloc/copy_from_user pair. Signed-off-by: Ivan Safonov <insafonov@gmail.com> Link: https://lore.kernel.org/r/20210213120527.451531-1-insafonov@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging:r8188eu: use IEEE80211_FCTL_* kernel definitionsIvan Safonov
_TO_DS_, _FROM_DS_, _MORE_FRAG_, _RETRY_, _PWRMGT_, _MORE_DATA_, _PRIVACY_, _ORDER_ definitions are duplicate IEEE80211_FCTL_* kernel definitions. Signed-off-by: Ivan Safonov <insafonov@gmail.com> Link: https://lore.kernel.org/r/20210213131148.458582-1-insafonov@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13staging: rtl8192e: remove multiple blank linesWilliam Durand
This patch removes some blank lines in order to fix a checkpatch issue. Signed-off-by: William Durand <will+git@drnd.me> Link: https://lore.kernel.org/r/20210213034711.14823-1-will+git@drnd.me Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13Merge branch 'for-5.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two cgroup fixes: - fix a NULL deref when trying to poll PSI in the root cgroup - fix confusing controller parsing corner case when mounting cgroup v1 hierarchies And doc / maintainer file updates" * 'for-5.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: update PSI file description in docs cgroup: fix psi monitor for root cgroup MAINTAINERS: Update my email address MAINTAINERS: Remove stale URLs for cpuset cgroup-v1: add disabled controller check in cgroup1_parse_param()
2021-02-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "6 patches. Subsystems affected by this patch series: mm/pagemap, scripts, MAINTAINERS, and h8300" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: h8300: fix PREEMPTION build, TI_PRE_COUNT undefined MAINTAINERS: add Andrey Konovalov to KASAN reviewers MAINTAINERS: update Andrey Konovalov's email address MAINTAINERS: update KASAN file list scripts/recordmcount.pl: support big endian for ARCH sh m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
2021-02-13Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "One more I2C driver bugfix" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: stm32f7: fix configuration of the digital filter
2021-02-13Merge tag 'for-5.11-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A regression fix caused by a refactoring in 5.11. A corrupted superblock wouldn't be detected by checksum verification due to wrongly placed initialization of the checksum length, thus making memcmp always work" * tag 'for-5.11-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize fs_info::csum_size earlier in open_ctree
2021-02-13h8300: fix PREEMPTION build, TI_PRE_COUNT undefinedRandy Dunlap
Fix a build error for undefined 'TI_PRE_COUNT' by adding it to asm-offsets.c. h8300-linux-ld: arch/h8300/kernel/entry.o: in function `resume_kernel': (.text+0x29a): undefined reference to `TI_PRE_COUNT' Link: https://lkml.kernel.org/r/20210212021650.22740-1-rdunlap@infradead.org Fixes: df2078b8daa7 ("h8300: Low level entry") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13MAINTAINERS: add Andrey Konovalov to KASAN reviewersAndrey Konovalov
Add my personal email address to KASAN reviewers list. Link: https://lkml.kernel.org/r/c1ce89a7aae0e2d6852249c280b1eb59aeac30c0.1613150186.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13MAINTAINERS: update Andrey Konovalov's email addressAndrey Konovalov
Use my personal email address. Link: https://lkml.kernel.org/r/b0ec98dabbc12336c162788f5ccde97045a0d65e.1613150186.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13MAINTAINERS: update KASAN file listAndrey Konovalov
Account for the following files: - lib/Kconfig.kasan - lib/test_kasan_module.c - arch/arm64/include/asm/mte-kasan.h Link: https://lkml.kernel.org/r/7f9771d97b34d396bfdc4e288ad93486bb865a06.1613150186.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13scripts/recordmcount.pl: support big endian for ARCH shRong Chen
The kernel test robot reported the following issue: CC [M] drivers/soc/litex/litex_soc_ctrl.o sh4-linux-objcopy: Unable to change endianness of input file(s) sh4-linux-ld: cannot find drivers/soc/litex/.tmp_gl_litex_soc_ctrl.o: No such file or directory sh4-linux-objcopy: 'drivers/soc/litex/.tmp_mx_litex_soc_ctrl.o': No such file The problem is that the format of input file is elf32-shbig-linux, but sh4-linux-objcopy wants to output a file which format is elf32-sh-linux: $ sh4-linux-objdump -d drivers/soc/litex/litex_soc_ctrl.o | grep format drivers/soc/litex/litex_soc_ctrl.o: file format elf32-shbig-linux Link: https://lkml.kernel.org/r/20210210150435.2171567-1-rong.a.chen@intel.com Link: https://lore.kernel.org/linux-mm/202101261118.GbbYSlHu-lkp@intel.com Signed-off-by: Rong Chen <rong.a.chen@intel.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMUMike Rapoport
Recent changes that obsoleted DISCONTIGMEM on m68k switched the MMU variant to use generic definitions of __pfn_to_phys() and __phys_to_pfn(), but missed the !MMU variant which caused a build failure: drivers/media/common/videobuf2/videobuf2-dma-contig.c: In function 'vb2_dc_get_userptr': drivers/media/common/videobuf2/videobuf2-dma-contig.c:509:5: error: implicit declaration of function '__pfn_to_phys' [-Werror=implicit-function-declaration] 509 | __pfn_to_phys(nums[0]), size, buf->dma_dir, 0); | ^~~~~~~~~~~~~ cc1: some warnings being treated as errors Enable __pfn_to_phys() and __phys_to_pfn() on !MMU builds. Link: https://lkml.kernel.org/r/20210211232202.GS299309@linux.ibm.com Fixes: 4bfc848e0981 ("m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-12Merge branch 'Xilinx-axienet-updates'David S. Miller
Robert Hancock says: ==================== Xilinx axienet updates Updates to the Xilinx AXI Ethernet driver to add support for an additional ethtool operation, and to support dynamic switching between 1000BaseX and SGMII interface modes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: axienet: Support dynamic switching between 1000BaseX and SGMIIRobert Hancock
Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or later) allow the core to be configured with a PHY interface mode of "Both", allowing either 1000BaseX or SGMII modes to be selected at runtime. Add support for this in the driver to allow better support for applications which can use both fiber and copper SFP modules. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12dt-bindings: net: xilinx_axienet: add xlnx,switch-x-sgmii attributeRobert Hancock
Document the new xlnx,switch-x-sgmii attribute which is used to indicate that the Ethernet core supports dynamic switching between 1000BaseX and SGMII. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: axienet: hook up nway_reset ethtool operationRobert Hancock
Hook up the nway_reset ethtool operation to the corresponding phylink function so that "ethtool -r" can be supported. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12Merge branch 'Add support of pointer to struct in global'Alexei Starovoitov
Dmitrii Banshchikov says: ==================== This patchset adds support of pointers to type with known size among global function arguments. The motivation is to overcome the limit on the maximum number of allowed arguments and avoid tricky and unoptimal ways of passing arguments. A referenced type may contain pointers but access via such pointers cannot be veirified currently. v2 -> v3 - Fix reg ID generation - Fix commit description - Fix typo - Fix tests v1 -> v2: - Allow pointer to any type with known size rather than struct only - Allow pointer in global functions only - Add more tests - Fix wrapping and v1 comments ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12selftests/bpf: Add unit tests for pointers in global functionsDmitrii Banshchikov
test_global_func9 - check valid pointer's scenarios test_global_func10 - check that a smaller type cannot be passed as a larger one test_global_func11 - check that CTX pointer cannot be passed test_global_func12 - check access to a null pointer test_global_func13 - check access to an arbitrary pointer value test_global_func14 - check that an opaque pointer cannot be passed test_global_func15 - check that a variable has an unknown value after it was passed to a global function by pointer test_global_func16 - check access to uninitialized stack memory test_global_func_args - check read and write operations through a pointer Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212205642.620788-5-me@ubique.spb.ru
2021-02-12bpf: Support pointers in global func argsDmitrii Banshchikov
Add an ability to pass a pointer to a type with known size in arguments of a global function. Such pointers may be used to overcome the limit on the maximum number of arguments, avoid expensive and tricky workarounds and to have multiple output arguments. A referenced type may contain pointers but indirect access through them isn't supported. The implementation consists of two parts. If a global function has an argument that is a pointer to a type with known size then: 1) In btf_check_func_arg_match(): check that the corresponding register points to NULL or to a valid memory region that is large enough to contain the expected argument's type. 2) In btf_prepare_func_args(): set the corresponding register type to PTR_TO_MEM_OR_NULL and its size to the size of the expected type. Only global functions are supported because allowance of pointers for static functions might break validation. Consider the following scenario. A static function has a pointer argument. A caller passes pointer to its stack memory. Because the callee can change referenced memory verifier cannot longer assume any particular slot type of the caller's stack memory hence the slot type is changed to SLOT_MISC. If there is an operation that relies on slot type other than SLOT_MISC then verifier won't be able to infer safety of the operation. When verifier sees a static function that has a pointer argument different from PTR_TO_CTX then it skips arguments check and continues with "inline" validation with more information available. The operation that relies on the particular slot type now succeeds. Because global functions were not allowed to have pointer arguments different from PTR_TO_CTX it's not possible to break existing and valid code. Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
2021-02-12bpf: Extract nullable reg type conversion into a helper functionDmitrii Banshchikov
Extract conversion from a register's nullable type to a type with a value. The helper will be used in mark_ptr_not_null_reg(). Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212205642.620788-3-me@ubique.spb.ru
2021-02-12bpf: Rename bpf_reg_state variablesDmitrii Banshchikov
Using "reg" for an array of bpf_reg_state and "reg[i + 1]" for an individual bpf_reg_state is error-prone and verbose. Use "regs" for the former and "reg" for the latter as other code nearby does. Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212205642.620788-2-me@ubique.spb.ru
2021-02-12net: axienet: Handle deferred probe on clock properlyRobert Hancock
This driver is set up to use a clock mapping in the device tree if it is present, but still work without one for backward compatibility. However, if getting the clock returns -EPROBE_DEFER, then we need to abort and return that error from our driver initialization so that the probe can be retried later after the clock is set up. Move clock initialization to earlier in the process so we do not waste as much effort if the clock is not yet available. Switch to use devm_clk_get_optional and abort initialization on any error reported. Also enable the clock regardless of whether the controller is using an MDIO bus, as the clock is required in any case. Fixes: 09a0354cadec267be7f ("net: axienet: Use clock framework to get device clock rate") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12Merge branch 'tcp-mem-pressure-vs-SO_RCVLOWAT'David S. Miller
Eric Dumazet says: ==================== tcp: mem pressure vs SO_RCVLOWAT First patch fixes an issue for applications using SO_RCVLOWAT to reduce context switches. Second patch is a cleanup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12tcp: factorize logic into tcp_epollin_ready()Eric Dumazet
Both tcp_data_ready() and tcp_stream_is_readable() share the same logic. Add tcp_epollin_ready() helper to avoid duplication. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12tcp: fix SO_RCVLOWAT related hangs under mem pressureEric Dumazet
While commit 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs") fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint, it missed to address issue caused by memory pressure. 1) If we are under memory pressure and socket receive queue is empty. First incoming packet is allowed to be queued, after commit 76dfa6082032 ("tcp: allow one skb to be received per socket under memory pressure") But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat is bigger than skb length. 2) Then, when next packet comes, it is dropped, and we directly call sk->sk_data_ready(). 3) If application is using poll(), tcp_poll() will then use tcp_stream_is_readable() and decide the socket receive queue is not yet filled, so nothing will happen. Even when sender retransmits packets, phases 2) & 3) repeat and flow is effectively frozen, until memory pressure is off. Fix is to consider tcp_under_memory_pressure() to take care of global memory pressure or memcg pressure. Fixes: 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Arjun Roy <arjunroy@google.com> Suggested-by: Wei Wang <weiwan@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2021-02-12 This series contains updates to i40e, ice, and ixgbe drivers. Maciej does cleanups on the following drivers. For i40e, removes redundant check for XDP prog, cleans up no longer relevant information, and removes an unused function argument. For ice, removes local variable use, instead returning values directly. Moves skb pointer from buffer to ring and removes an unneeded check for xdp_prog in zero copy path. Also removes a redundant MTU check when changing it. For i40e, ice, and ixgbe, stores the rx_offset in the Rx ring as the value is constant so there's no need for continual calls. Bjorn folds a decrement into a while statement. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12ibmvnic: change IBMVNIC_MAX_IND_DESCS to 16Dany Madden
The supported indirect subcrq entries on Power8 is 16. Power9 supports 128. Redefined this value to 16 to minimize the driver from having to reset when migrating between Power9 and Power8. In our rx/tx performance testing, we found no performance difference between 16 and 128 at this time. Fixes: f019fb6392e5 ("ibmvnic: Introduce indirect subordinate Command Response Queue buffer") Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12Merge branch 'tc-mpls-selftests'David S. Miller
Guillaume Nault says: ==================== selftests: tc: Test tc-flower's MPLS features A couple of patches for exercising the MPLS filters of tc-flower. Patch 1 tests basic MPLS matching features: those that only work on the first label stack entry (that is, the mpls_label, mpls_tc, mpls_bos and mpls_ttl options). Patch 2 tests the more generic "mpls" and "lse" options, which allow matching MPLS fields beyond the first stack entry. In both patches, special care is taken to skip these new tests for incompatible versions of tc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12selftests: tc: Add generic mpls matching support for tc-flowerGuillaume Nault
Add tests in tc_flower.sh for generic matching on MPLS Label Stack Entries. The label, tc, bos and ttl fields are tested for the first and second labels. For each field, the minimal and maximal values are tested (the former at depth 1 and the later at depth 2). There are also tests for matching the presence of a label stack entry at a given depth. In order to reduce the amount of code, all "lse" subcommands are tested in match_mpls_lse_test(). Action "continue" is used, so that test packets are evaluated by all filters. Then, we can verify if each filter matched the expected number of packets. Some versions of tc-flower produced invalid json output when dumping MPLS filters with depth > 1. Skip the test if tc isn't recent enough. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12selftests: tc: Add basic mpls_* matching support for tc-flowerGuillaume Nault
Add tests in tc_flower.sh for mpls_label, mpls_tc, mpls_bos and mpls_ttl. For each keyword, test the minimal and maximal values. Selectively skip these new mpls tests for tc versions that don't support them. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12Merge branch 'brport-flags'David S. Miller
Vladimir Oltean says: ==================== Cleanup in brport flags switchdev offload for DSA The initial goal of this series was to have better support for standalone ports mode on the DSA drivers like ocelot/felix and sja1105. This turned out to require some API adjustments in both directions: to the information presented to and by the switchdev notifier, and to the API presented to the switch drivers by the DSA layer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: dsa: sja1105: offload bridge port flags to deviceVladimir Oltean
The chip can configure unicast flooding, broadcast flooding and learning. Learning is per port, while flooding is per {ingress, egress} port pair and we need to configure the same value for all possible ingress ports towards the requested one. While multicast flooding is not officially supported, we can hack it by using a feature of the second generation (P/Q/R/S) devices, which is that FDB entries are maskable, and multicast addresses always have an odd first octet. So by putting a match-all for 00:01:00:00:00:00 addr and 00:01:00:00:00:00 mask at the end of the FDB, we make sure that it is always checked last, and does not take precedence in front of any other MDB. So it behaves effectively as an unknown multicast entry. For the first generation switches, this feature is not available, so unknown multicast will always be treated the same as unknown unicast. So the only thing we can do is request the user to offload the settings for these 2 flags in tandem, i.e. ip link set swp2 type bridge_slave flood off Error: sja1105: This chip cannot configure multicast flooding independently of unicast. ip link set swp2 type bridge_slave flood off mcast_flood off ip link set swp2 type bridge_slave mcast_flood on Error: sja1105: This chip cannot configure multicast flooding independently of unicast. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: mscc: ocelot: offload bridge port flags to deviceVladimir Oltean
We should not be unconditionally enabling address learning, since doing that is actively detrimential when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone and others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution of course is to not enable address learning unless the bridge asks for it. We need to set up the initial port flags for no learning and flooding everything, and also when the port joins and leaves the bridge. The flood configuration was already configured ok for standalone mode in ocelot_init, we just need to disable learning in ocelot_init_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: mscc: ocelot: use separate flooding PGID for broadcastVladimir Oltean
In preparation of offloading the bridge port flags which have independent settings for unknown multicast and for broadcast, we should also start reserving one destination Port Group ID for the flooding of broadcast packets, to allow configuring it individually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>