summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-04net: bna: remove trailing semicolon in macro definitionTom Rix
The macro use will already have a semicolon. Clean up escaped newlines. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20201202163622.3733506-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04tipc: support 128bit node identity for peer removingHoang Le
We add the support to remove a specific node down with 128bit node identifier, as an alternative to legacy 32-bit node address. example: $tipc peer remove identiy <1001002|16777777> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20201203035045.4564-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04mac80211: mesh: fix mesh_pathtbl_init() error pathEric Dumazet
If tbl_mpp can not be allocated, we call mesh_table_free(tbl_path) while tbl_path rhashtable has not yet been initialized, which causes panics. Simply factorize the rhashtable_init() call into mesh_table_alloc() WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __flush_work kernel/workqueue.c:3040 [inline] WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136 Modules linked in: CPU: 1 PID: 8474 Comm: syz-executor663 Not tainted 5.10.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__flush_work kernel/workqueue.c:3040 [inline] RIP: 0010:__cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136 Code: 5d c3 e8 bf ae 29 00 0f 0b e9 f0 fd ff ff e8 b3 ae 29 00 0f 0b 43 80 3c 3e 00 0f 85 31 ff ff ff e9 34 ff ff ff e8 9c ae 29 00 <0f> 0b e9 dc fe ff ff 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7d fd ff RSP: 0018:ffffc9000165f5a0 EFLAGS: 00010293 RAX: ffffffff814b7064 RBX: 0000000000000001 RCX: ffff888021c80000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888024039ca0 R08: dffffc0000000000 R09: fffffbfff1dd3e64 R10: fffffbfff1dd3e64 R11: 0000000000000000 R12: 1ffff920002cbebd R13: ffff888024039c88 R14: 1ffff11004807391 R15: dffffc0000000000 FS: 0000000001347880(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000002cc0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rhashtable_free_and_destroy+0x25/0x9c0 lib/rhashtable.c:1137 mesh_table_free net/mac80211/mesh_pathtbl.c:69 [inline] mesh_pathtbl_init+0x287/0x2e0 net/mac80211/mesh_pathtbl.c:785 ieee80211_mesh_init_sdata+0x2ee/0x530 net/mac80211/mesh.c:1591 ieee80211_setup_sdata+0x733/0xc40 net/mac80211/iface.c:1569 ieee80211_if_add+0xd5c/0x1cd0 net/mac80211/iface.c:1987 ieee80211_add_iface+0x59/0x130 net/mac80211/cfg.c:125 rdev_add_virtual_intf net/wireless/rdev-ops.h:45 [inline] nl80211_new_interface+0x563/0xb40 net/wireless/nl80211.c:3855 genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0xe4e/0x1280 net/netlink/genetlink.c:800 netlink_rcv_skb+0x190/0x3a0 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x780/0x930 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x9a8/0xd40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0x519/0x800 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x2b1/0x360 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20201204162428.2583119-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04[SECURITY] fix namespaced fscaps when !CONFIG_SECURITYSerge Hallyn
Namespaced file capabilities were introduced in 8db6c34f1dbc . When userspace reads an xattr for a namespaced capability, a virtualized representation of it is returned if the caller is in a user namespace owned by the capability's owning rootid. The function which performs this virtualization was not hooked up if CONFIG_SECURITY=n. Therefore in that case the original xattr was shown instead of the virtualized one. To test this using libcap-bin (*1), $ v=$(mktemp) $ unshare -Ur setcap cap_sys_admin-eip $v $ unshare -Ur setcap -v cap_sys_admin-eip $v /tmp/tmp.lSiIFRvt8Y: OK "setcap -v" verifies the values instead of setting them, and will check whether the rootid value is set. Therefore, with this bug un-fixed, and with CONFIG_SECURITY=n, setcap -v will fail: $ v=$(mktemp) $ unshare -Ur setcap cap_sys_admin=eip $v $ unshare -Ur setcap -v cap_sys_admin=eip $v nsowner[got=1000, want=0],/tmp/tmp.HHDiOOl9fY differs in [] Fix this bug by calling cap_inode_getsecurity() in security_inode_getsecurity() instead of returning -EOPNOTSUPP, when CONFIG_SECURITY=n. *1 - note, if libcap is too old for getcap to have the '-n' option, then use verify-caps instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209689 Cc: Hervé Guillemet <herve@guillemet.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Serge Hallyn <shallyn@cisco.com> Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2020-12-04nfp: Replace zero-length array with flexible-array memberSimon Horman
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members"[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Link: https://lore.kernel.org/r/20201204125601.24876-1-simon.horman@netronome.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04openvswitch: fix error return code in validate_and_copy_dec_ttl()Wang Hai
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Changing 'return start' to 'return action_start' can fix this bug. Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net: bridge: vlan: fix error return code in __vlan_add()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04ipv4: fix error return code in rtm_to_fib_config()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/1607071695-33740-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04nfc: s3fwrn5: skip the NFC bootloader modeBongsu Jeon
If there isn't a proper NFC firmware image, Bootloader mode will be skipped. Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20201203225257.2446-1-bongsu.jeon@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04ASoC: qcom: fix QDSP6 dependencies, attempt #3Arnd Bergmann
The previous fix left another warning in randconfig builds: WARNING: unmet direct dependencies detected for SND_SOC_QDSP6 Depends on [n]: SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_SOC_QCOM [=y] && QCOM_APR [=y] && COMMON_CLK [=n] Selected by [y]: - SND_SOC_MSM8996 [=y] && SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_SOC_QCOM [=y] && QCOM_APR [=y] Add one more dependency for this one. Fixes: 2bc8831b135c ("ASoC: qcom: fix SDM845 & QDSP6 dependencies more") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20201203231443.1483763-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-04ASoC: fsl_aud2htx: mark PM functions as __maybe_unusedArnd Bergmann
When CONFIG_PM is disabled, we get a warning for unused functions: sound/soc/fsl/fsl_aud2htx.c:261:12: error: unused function 'fsl_aud2htx_runtime_suspend' [-Werror,-Wunused-function] static int fsl_aud2htx_runtime_suspend(struct device *dev) sound/soc/fsl/fsl_aud2htx.c:271:12: error: unused function 'fsl_aud2htx_runtime_resume' [-Werror,-Wunused-function] static int fsl_aud2htx_runtime_resume(struct device *dev) Mark these as __maybe_unused to avoid the warning without adding an #ifdef. Fixes: 8a24c834c053 ("ASoC: fsl_aud2htx: Add aud2htx module driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Link: https://lore.kernel.org/r/20201203222900.1042578-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-04ASoC: intel: sof_rt5682: Add support for tgl_rt1011_rt5682Brent Lu
This patch adds the driver data for two rt1011 speaker amplifiers on SSP1 and rt5682 on SSP0 for TGL platform. DAI format for rt1011 is leveraged from cml_rt1011_rt5682 which is 4-slot tdm with 100fs bclk. Signed-off-by: Brent Lu <brent.lu@intel.com> Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20201203154010.29464-1-brent.lu@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-04ASoC: atmel: mchp-spdifrx needs COMMON_CLKArnd Bergmann
Compile-testing this driver on an older platform without CONFIG_COMMON_CLK fails with ERROR: modpost: "clk_set_min_rate" [sound/soc/atmel/snd-soc-mchp-spdifrx.ko] undefined! Make this is a strict dependency. Fixes: ef265c55c1ac ("ASoC: mchp-spdifrx: add driver for SPDIF RX") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Link: https://lore.kernel.org/r/20201203223815.1353451-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-04ASoC: cros_ec_codec: fix uninitialized memory readArnd Bergmann
gcc points out a memory area that is copied to a device but not initialized: sound/soc/codecs/cros_ec_codec.c: In function 'i2s_rx_event': arch/x86/include/asm/string_32.h:83:20: error: '*((void *)&p+4)' may be used uninitialized in this function [-Werror=maybe-uninitialized] 83 | *((int *)to + 1) = *((int *)from + 1); Initialize all the unused fields to zero. Fixes: 727f1c71c780 ("ASoC: cros_ec_codec: refactor I2S RX") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tzung-Bi Shih <tzungbi@google.com> Link: https://lore.kernel.org/r/20201203225458.1477830-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-04ASoC: SOF: control: fix cppcheck warning in snd_sof_volume_info()Pierre-Louis Bossart
Fix cppcheck warning: sound/soc/sof/control.c:117:82: style:inconclusive: Function 'snd_sof_volume_info' argument 2 names different: declaration 'ucontrol' definition 'uinfo'. [funcArgNamesDifferent] Fixes: fca18e62984a ("ASoC: SOF: control: override volume info callback") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20201204170313.2704499-1-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-12-04ethernet: select CONFIG_CRC32 as neededArnd Bergmann
A number of ethernet drivers require crc32 functionality to be avaialable in the kernel, causing a link error otherwise: arm-linux-gnueabi-ld: drivers/net/ethernet/agere/et131x.o: in function `et1310_setup_device_for_multicast': et131x.c:(.text+0x5918): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/cadence/macb_main.o: in function `macb_start_xmit': macb_main.c:(.text+0x4b88): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/faraday/ftgmac100.o: in function `ftgmac100_set_rx_mode': ftgmac100.c:(.text+0x2b38): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fec_main.o: in function `set_multicast_list': fec_main.c:(.text+0x6120): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fman/fman_dtsec.o: in function `dtsec_add_hash_mac_address': fman_dtsec.c:(.text+0x830): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/freescale/fman/fman_dtsec.o:fman_dtsec.c:(.text+0xb68): more undefined references to `crc32_le' follow arm-linux-gnueabi-ld: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_hwinfo.o: in function `nfp_hwinfo_read': nfp_hwinfo.c:(.text+0x250): undefined reference to `crc32_be' arm-linux-gnueabi-ld: nfp_hwinfo.c:(.text+0x288): undefined reference to `crc32_be' arm-linux-gnueabi-ld: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.o: in function `nfp_resource_acquire': nfp_resource.c:(.text+0x144): undefined reference to `crc32_be' arm-linux-gnueabi-ld: nfp_resource.c:(.text+0x158): undefined reference to `crc32_be' arm-linux-gnueabi-ld: drivers/net/ethernet/nxp/lpc_eth.o: in function `lpc_eth_set_multicast_list': lpc_eth.c:(.text+0x1934): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_flow_tbl_do': rocker_ofdpa.c:(.text+0x2e08): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_flow_tbl_del': rocker_ofdpa.c:(.text+0x3074): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/rocker/rocker_ofdpa.o: in function `ofdpa_port_fdb': arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.o: in function `mlx5dr_ste_calc_hash_index': dr_ste.c:(.text+0x354): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_netdev_set_multicast': lan743x_main.c:(.text+0x5dc4): undefined reference to `crc32_le' Add the missing 'select CRC32' entries in Kconfig for each of them. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Acked-by: Mark Einon <mark.einon@gmail.com> Acked-by: Simon Horman <simon.horman@netronome.com> Link: https://lore.kernel.org/r/20201203232114.1485603-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net: ipa: pass the correct size when freeing DMA memoryAlex Elder
When the coherent memory is freed in gsi_trans_pool_exit_dma(), we are mistakenly passing the size of a single element in the pool rather than the actual allocated size. Fix this bug. Fixes: 9dd441e4ed575 ("soc: qcom: ipa: GSI transactions") Reported-by: Stephen Boyd <swboyd@chromium.org> Tested-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20201203215106.17450-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04ARM: highmem: Fix cache_is_vivt() referenceArnd Bergmann
The reference to cache_is_vivt() was moved into a header file, which now causes a build failure in rare randconfig builds: arch/arm/include/asm/highmem.h:52:43: error: implicit declaration of function 'cache_is_vivt' [-Werror,-Wimplicit-function-declaration] Add an explicit include to make it build reliably. Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20201204165930.3877571-1-arnd@kernel.org
2020-12-04block: fix incorrect branching in blk_max_size_offset()Mike Snitzer
If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that override will be incorrectly ignored. Old blk_max_size_offset() branching, prior to commit 3ee16db390b4, must be used only if passed 'chunk_sectors' override is zero. Fixes: 3ee16db390b4 ("dm: fix IO splitting") Cc: stable@vger.kernel.org # 5.9 Reported-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04net/sched: fq_pie: initialize timer earlier in fq_pie_init()Davide Caratti
with the following tdc testcase: 83be: (qdisc, fq_pie) Create FQ-PIE with invalid number of flows as fq_pie_init() fails, fq_pie_destroy() is called to clean up. Since the timer is not yet initialized, it's possible to observe a splat like this: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 975 Comm: tc Not tainted 5.10.0-rc4+ #298 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x99/0xcb register_lock_class+0x12dd/0x1750 __lock_acquire+0xfe/0x3970 lock_acquire+0x1c8/0x7f0 del_timer_sync+0x49/0xd0 fq_pie_destroy+0x3f/0x80 [sch_fq_pie] qdisc_create+0x916/0x1160 tc_modify_qdisc+0x3c4/0x1630 rtnetlink_rcv_msg+0x346/0x8e0 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [...] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0 WARNING: CPU: 0 PID: 975 at lib/debugobjects.c:508 debug_print_object+0x162/0x210 [...] Call Trace: debug_object_assert_init+0x268/0x380 try_to_del_timer_sync+0x6a/0x100 del_timer_sync+0x9e/0xd0 fq_pie_destroy+0x3f/0x80 [sch_fq_pie] qdisc_create+0x916/0x1160 tc_modify_qdisc+0x3c4/0x1630 rtnetlink_rcv_msg+0x346/0x8e0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 fix it moving timer_setup() before any failure, like it was done on 'red' with former commit 608b4adab178 ("net_sched: initialize timer earlier in red_init()"). Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04Merge branch 'perf-optimizations-for-tcp-recv-zerocopy'Jakub Kicinski
Arjun Roy says: ==================== Perf. optimizations for TCP Recv. Zerocopy This patchset contains several optimizations for TCP Recv. Zerocopy. Summarized: 1. It is possible that a read payload is not exactly page aligned - that there may exist "straggler" bytes that we cannot map into the caller's address space cleanly. For this, we allow the caller to provide as argument a "hybrid copy buffer", turning getsockopt(TCP_ZEROCOPY_RECEIVE) into a "hybrid" operation that allows the caller to avoid a subsequent recvmsg() call to read the stragglers. 2. Similarly, for "small" read payloads that are either below the size of a page, or small enough that remapping pages is not a performance win - we allow the user to short-circuit the remapping operations entirely and simply copy into the buffer provided. Some of the patches in the middle of this set are refactors to support this "short-circuiting" optimization. 3. We allow the user to provide a hint that performing a page zap operation (and the accompanying TLB shootdown) may not be necessary, for the provided region that the kernel will attempt to map pages into. This allows us to avoid this expensive operation while holding the socket lock, which provides a significant performance advantage. With all of these changes combined, "medium" sized receive traffic (multiple tens to few hundreds of KB) see significant efficiency gains when using TCP receive zerocopy instead of regular recvmsg(). For example, with RPC-style traffic with 32KB messages, there is a roughly 15% efficiency improvement when using zerocopy. Without these changes, there is a roughly 60-70% efficiency reduction with such messages when employing zerocopy. ==================== Link: https://lore.kernel.org/r/20201202225349.935284-1-arjunroy.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Defer vm zap unless actually needed.Arjun Roy
Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be triggered from userspace with madvise(MADV_DONTNEED). If userspace is configured to call this before reusing a segment, or if there was nothing mapped at this virtual address to begin with, we can avoid calling zap_page_range() under the socket lock. That said, if userspace does not do that, then we are still responsible for calling zap_page_range(). This patch adds a flag that the user can use to hint to the kernel that a zap is not required. If the flag is not set, or if an older user application does not have a flags field at all, then the kernel calls zap_page_range as before. Also, if the flag is set but a zap is still required, the kernel performs that zap as necessary. Thus incorrectly indicating that a zap can be avoided does not change the correctness of operation. It also increases the batchsize for vm_insert_pages and prefetches the page struct for the batch since we're about to bump the refcount. An alternative mechanism could be to not have a flag, assume by default a zap is not needed, and fall back to zapping if needed. However, this would harm performance for older applications for which a zap is necessary, and thus we implement it with an explicit flag so newer applications can opt in. When using RPC-style traffic with medium sized (tens of KB) RPCs, this change yields an efficency improvement of about 30% for QPS/CPU usage. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Set zerocopy hint when data is copiedArjun Roy
Set zerocopy hint, event when falling back to copy, so that the pending data can be efficiently received using zerocopy when possible. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Introduce short-circuit small reads.Arjun Roy
Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, or inq is generally small enough that it is cheaper to copy rather than remap pages. In these cases, we may want to either return early (inq=0) or attempt to use the provided copy buffer to simply copy the received data. This allows us to save both system call overhead and the latency of acquiring mmap_sem in read mode for cases where it would be useless to do so. This patchset enables this behaviour by: 1. Returning quickly if inq is 0. 2. Attempting to perform a regular copy if a hybrid copybuffer is provided and it is large enough to absorb all available bytes. 3. Return quickly if no such buffer was provided and there are less than PAGE_SIZE bytes available. For small RPC ping-pong workloads, normally we would have 1 getsockopt(), 1 recvmsg() and 1 sendmsg() call per RPC. With this change, we remove the recvmsg() call entirely, reducing the syscall overhead by about 33%. In testing with small (hundreds of bytes) RPC traffic, this yields a syscall reduction of about 33% and an efficiency gain of about 3-5% when defined as QPS/CPU Util. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Fast return if inq < PAGE_SIZEArjun Roy
Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, in which case we cannot remap pages. In this case, simply return the appropriate hint for regular copying without taking mmap_sem. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Refactor frag-is-remappable test.Arjun Roy
Refactor frag-is-remappable test for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Refactor skb frag fast-forward op.Arjun Roy
Refactor skb frag fast-forwarding for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. skb_advance_to_frag(), given a skb and an offset into the skb, iterates from the first frag for the skb until we're at the frag specified by the offset. Assuming the offset provided refers to how many bytes in the skb are already read, the returned frag points to the next frag we may read from, while offset_frag is set to the number of bytes from this frag that we have already read. If frag is not null and offset_frag is equal to 0, then we may be able to map this frag's page into the process address space with vm_insert_page(). However, if offset_frag is not equal to 0, then we cannot do so. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-tcp: Introduce tcp_recvmsg_locked().Arjun Roy
Refactor tcp_recvmsg() by splitting it into locked and unlocked portions. Callers already holding the socket lock and not using ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked(). This is in preparation for a short-circuit copy performed by TCP receive zerocopy for small (< PAGE_SIZE, or otherwise requested by the user) reads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.Arjun Roy
When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will do so. This elides a recvmsg() call for received traffic that isn't exactly page-aligned in size. This was tested with RPC-style traffic of arbitrary sizes. Normally, each received message required at least one getsockopt() call, and one recvmsg() call for the remaining unaligned data. With this change, almost all of the recvmsg() calls are eliminated, leading to a savings of about 25%-50% in number of system calls for RPC-style workloads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04tracing: Fix userstacktrace option for instancesSteven Rostedt (VMware)
When the instances were able to use their own options, the userstacktrace option was left hardcoded for the top level. This made the instance userstacktrace option bascially into a nop, and will confuse users that set it, but nothing happens (I was confused when it happened to me!) Cc: stable@vger.kernel.org Fixes: 16270145ce6b ("tracing: Add trace options for core options to instances") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-12-04scripts: get_feat.pl: reduce table width for all features outputMauro Carvalho Chehab
Auto-adjust the table columns width to better fit under terminals, by breaking the description on multiple lines and auto-estimating the minimal size for the per-architecture status. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/9d39ac3fd51f1360aecc328c01558be88a1d6930.1607095090.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-12-04scripts: get_feat.pl: change the group by orderMauro Carvalho Chehab
Right now, arch compatibility is grouped by status at the alphabetical order from A to Z, and then from a to z, e. g:. --- TODO ok Revert the order, in order to print first the OK results, then TODO, and, finally, the not compatible ones. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/46d53d138eab8e4a55124323ceb5b212c6eedd08.1607095090.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-12-04scripts: get_feat.pl: make complete table more coinciseMauro Carvalho Chehab
Currently, there are too many white spaces at the tables, and the information is very sparsed on it. Make the format a lot more compact. Suggested-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/8165ff379313e63a69898db19d790e4436224ffd.1607095090.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-12-04selftests/bpf: Test bpf_sk_storage_get in tcp iteratorsFlorent Revest
This extends the existing bpf_sk_storage_get test where a socket is created and tagged with its creator's pid by a task_file iterator. A TCP iterator is now also used at the end of the test to negate the values already stored in the local storage. The test therefore expects -getpid() to be stored in the local storage. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-6-revest@google.com
2020-12-04selftests/bpf: Add an iterator selftest for bpf_sk_storage_getFlorent Revest
The eBPF program iterates over all files and tasks. For all socket files, it stores the tgid of the last task it encountered with a handle to that socket. This is a heuristic for finding the "owner" of a socket similar to what's done by lsof, ss, netstat or fuser. Potentially, this information could be used from a cgroup_skb/*gress hook to try to associate network traffic with processes. The test makes sure that a socket it created is tagged with prog_tests's pid. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-5-revest@google.com
2020-12-04selftests/bpf: Add an iterator selftest for bpf_sk_storage_deleteFlorent Revest
The eBPF program iterates over all entries (well, only one) of a socket local storage map and deletes them all. The test makes sure that the entry is indeed deleted. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-4-revest@google.com
2020-12-04bpf: Expose bpf_sk_storage_* to iterator programsFlorent Revest
Iterators are currently used to expose kernel information to userspace over fast procfs-like files but iterators could also be used to manipulate local storage. For example, the task_file iterator could be used to initialize a socket local storage with associations between processes and sockets or to selectively delete local storage values. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-3-revest@google.com
2020-12-04bpf: Add a bpf_sock_from_file helperFlorent Revest
While eBPF programs can check whether a file is a socket by file->f_op == &socket_file_ops, they cannot convert the void private_data pointer to a struct socket BTF pointer. In order to do this a new helper wrapping sock_from_file is added. This is useful to tracing programs but also other program types inheriting this set of helpers such as iterators or LSM programs. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-2-revest@google.com
2020-12-04net: Remove the err argument from sock_from_fileFlorent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is either not set or set to -ENOTSOCK IFF the returned socket is NULL. This makes the error redundant and it is ignored by a few callers. This patch simplifies the API by letting callers deduce the error based on whether the returned socket is NULL or not. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com
2020-12-04Merge branch 'seg6-add-support-for-srv6-end-dt4-dt6-behavior'Jakub Kicinski
Andrea Mayer says: ==================== seg6: add support for SRv6 End.DT4/DT6 behavior This patchset provides support for the SRv6 End.DT4 and End.DT6 (VRF mode) behaviors. The SRv6 End.DT4 behavior is used to implement multi-tenant IPv4 L3 VPNs. It decapsulates the received packets and performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. The SRv6 End.DT4 behavior is defined in the SRv6 Network Programming [1]. The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior which allows us to set up IPv6 L3 VPNs over SRv6 networks. This new implementation of DT6 is based on the same VRF infrastructure already exploited for implementing the SRv6 End.DT4 behavior. The aim of the new SRv6 End.DT6 in VRF mode consists in simplifying the construction of IPv6 L3 VPN services in the multi-tenant environment. Currently, the two SRv6 End.DT6 implementations (legacy and VRF mode) coexist seamlessly and can be chosen according to the context and the user preferences. - Patch 1 is needed to solve a pre-existing issue with tunneled packets when a sniffer is attached; - Patch 2 improves the management of the seg6local attributes used by the SRv6 behaviors; - Patch 3 adds support for optional attributes in SRv6 behaviors; - Patch 4 introduces two callbacks used for customizing the creation/destruction of a SRv6 behavior; - Patch 5 is the core patch that adds support for the SRv6 End.DT4 behavior; - Patch 6 introduces the VRF support for SRv6 End.DT6 behavior; - Patch 7 adds the selftest for SRv6 End.DT4 behavior; - Patch 8 adds the selftest for SRv6 End.DT6 (VRF mode) behavior. Regarding iproute2, the support for the new "vrftable" attribute, required by both SRv6 End.DT4 and End.DT6 (VRF mode) behaviors, is provided in a different patchset that will follow shortly. I would like to thank David Ahern for his support during the development of this patchset. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming ==================== Link: https://lore.kernel.org/r/20201202130517.4967-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04selftests: add selftest for the SRv6 End.DT6 (VRF) behaviorAndrea Mayer
this selftest is designed for evaluating the new SRv6 End.DT6 (VRF) behavior used, in this example, for implementing IPv6 L3 VPN use cases. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04selftests: add selftest for the SRv6 End.DT4 behaviorAndrea Mayer
this selftest is designed for evaluating the new SRv6 End.DT4 behavior used, in this example, for implementing IPv4 L3 VPN use cases. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04seg6: add VRF support for SRv6 End.DT6 behaviorAndrea Mayer
SRv6 End.DT6 is defined in the SRv6 Network Programming [1]. The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior which permits IPv6 L3 VPNs over SRv6 networks. This implementation is not particularly suitable in contexts where we need to deploy IPv6 L3 VPNs among different tenants which share the same network address schemes. The underlying problem lies in the fact that the current version of DT6 (called legacy DT6 from now on) needs a complex configuration to be applied on routers which requires ad-hoc routes and routing policy rules to ensure the correct isolation of tenants. Consequently, a new implementation of DT6 has been introduced with the aim of simplifying the construction of IPv6 L3 VPN services in the multi-tenant environment using SRv6 networks. To accomplish this task, we reused the same VRF infrastructure and SRv6 core components already exploited for implementing the SRv6 End.DT4 behavior. Currently the two End.DT6 implementations coexist seamlessly and can be used depending on the context and the user preferences. So, in order to support both versions of DT6 a new attribute (vrftable) has been introduced which allows us to differentiate the implementation of the behavior to be used. A SRv6 End.DT6 legacy behavior is still instantiated using a command like the following one: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table 100 dev eth0 While to instantiate the SRv6 End.DT6 in VRF mode, the command is still pretty straight forward: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 vrftable 100 dev eth0. Obviously as in the case of SRv6 End.DT4, the VRF strict_mode parameter must be set (net.vrf.strict_mode=1) and the VRF associated with table 100 must exist. Please note that the instances of SRv6 End.DT6 legacy and End.DT6 VRF mode can coexist in the same system/configuration without problems. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04seg6: add support for the SRv6 End.DT4 behaviorAndrea Mayer
SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenants environments. It decapsulates the received packets and it performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e: $ sysctl -wq net.vrf.strict_mode=1. At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing the support of SRv6 End.DT4 behavior. The SRv6 End.DT4 behavior can be instantiated using a command similar to the following: $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0 We introduce the "vrftable" extension in iproute2 in a following patch. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04seg6: add callbacks for customizing the creation/destruction of a behaviorAndrea Mayer
We introduce two callbacks used for customizing the creation/destruction of a SRv6 behavior. Such callbacks are defined in the new struct seg6_local_lwtunnel_ops and hereafter we provide a brief description of them: - build_state(...): used for calling the custom constructor of the behavior during its initialization phase and after all the attributes have been parsed successfully; - destroy_state(...): used for calling the custom destructor of the behavior before it is completely destroyed. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04seg6: add support for optional attributes in SRv6 behaviorsAndrea Mayer
Before this patch, each SRv6 behavior specifies a set of required attributes that must be provided by the userspace application when such behavior is going to be instantiated. If at least one of the required attributes is not provided, the creation of the behavior fails. The SRv6 behavior framework lacks a way to manage optional attributes. By definition, an optional attribute for a SRv6 behavior consists of an attribute which may or may not be provided by the userspace. Therefore, if an optional attribute is missing (and thus not supplied by the user) the creation of the behavior goes ahead without any issue. This patch explicitly differentiates the required attributes from the optional attributes. In particular, each behavior can declare a set of required attributes and a set of optional ones. The semantic of the required attributes remains *totally* unaffected by this patch. The introduction of the optional attributes does NOT impact on the backward compatibility of the existing SRv6 behaviors. It is essential to note that if an (optional or required) attribute is supplied to a SRv6 behavior which does not expect it, the behavior simply discards such attribute without generating any error or warning. This operating mode remained unchanged both before and after the introduction of the optional attributes extension. The optional attributes are one of the key components used to implement the SRv6 End.DT6 behavior based on the Virtual Routing and Forwarding (VRF) framework. The optional attributes make possible the coexistence of the already existing SRv6 End.DT6 implementation with the new SRv6 End.DT6 VRF-based implementation without breaking any backward compatibility. Further details on the SRv6 End.DT6 behavior (VRF mode) are reported in subsequent patches. From the userspace point of view, the support for optional attributes DO NOT require any changes to the userspace applications, i.e: iproute2 unless new attributes (required or optional) are needed. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04seg6: improve management of behavior attributesAndrea Mayer
Depending on the attribute (i.e.: SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc), the parse() callback performs some validity checks on the provided input and updates the tunnel state (slwt) with the result of the parsing operation. However, an attribute may also need to reserve some additional resources (i.e.: memory or setting up an eBPF program) in the parse() callback to complete the parsing operation. The parse() callbacks are invoked by the parse_nla_action() for each attribute belonging to a specific behavior. Given a behavior with N attributes, if the parsing of the i-th attribute fails, the parse_nla_action() returns immediately with an error. Nonetheless, the resources acquired during the parsing of the i-1 attributes are not freed by the parse_nla_action(). Attributes which acquire resources must release them *in an explicit way* in both the seg6_local_{build/destroy}_state(). However, adding a new attribute of this type requires changes to seg6_local_{build/destroy}_state() to release the resources correctly. The seg6local infrastructure still lacks a simple and structured way to release the resources acquired in the parse() operations. We introduced a new callback in the struct seg6_action_param named destroy(). This callback releases any resource which may have been acquired in the parse() counterpart. Each attribute may or may not implement the destroy() callback depending on whether it needs to free some acquired resources. The destroy() callback comes with several of advantages: 1) we can have many attributes as we want for a given behavior with no need to explicitly free the taken resources; 2) As in case of the seg6_local_build_state(), the seg6_local_destroy_state() does not need to handle the release of resources directly. Indeed, it calls the destroy_attrs() function which is in charge of calling the destroy() callback for every set attribute. We do not need to patch seg6_local_{build/destroy}_state() anymore as we add new attributes; 3) the code is more readable and better structured. Indeed, all the information needed to handle a given attribute are contained in only one place; 4) it facilitates the integration with new features introduced in further patches. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04vrf: add mac header for tunneled packets when sniffer is attachedAndrea Mayer
Before this patch, a sniffer attached to a VRF used as the receiving interface of L3 tunneled packets detects them as malformed packets and it complains about that (i.e.: tcpdump shows bogus packets). The reason is that a tunneled L3 packet does not carry any L2 information and when the VRF is set as the receiving interface of a decapsulated L3 packet, no mac header is currently set or valid. Therefore, the purpose of this patch consists of adding a MAC header to any packet which is directly received on the VRF interface ONLY IF: i) a sniffer is attached on the VRF and ii) the mac header is not set. In this case, the mac address of the VRF is copied in both the destination and the source address of the ethernet header. The protocol type is set either to IPv4 or IPv6, depending on which L3 packet is received. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04Merge tag 'for-5.10/dm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM's bio splitting changes that were made during v5.9. This restores splitting in terms of varied per-target ti->max_io_len rather than use block core's single stacked 'chunk_sectors' limit. - Like DM crypt, update DM integrity to not use crypto drivers that have CRYPTO_ALG_ALLOCATES_MEMORY set. - Fix DM writecache target's argument parsing and status display. - Remove needless BUG() from dm writecache's persistent_memory_claim() - Remove old gcc workaround in DM cache target's block_div() for ARM link errors now that gcc >= 4.9 is required. - Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range. - Remove old, and now frowned upon, BUG_ON(in_interrupt()) in dm_table_event(). - Remove invalid sparse annotations from dm_prepare_ioctl() and dm_unprepare_ioctl(). * tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: remove invalid sparse __acquires and __releases annotations dm: fix double RCU unlock in dm_dax_zero_page_range() error path dm: fix IO splitting dm writecache: remove BUG() and fail gracefully instead dm table: Remove BUG_ON(in_interrupt()) dm: fix bug with RCU locking in dm_blk_report_zones Revert "dm cache: fix arm link errors with inline" dm writecache: fix the maximum number of arguments dm writecache: advance the number of arguments when reporting max_age dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
2020-12-04mt76: mt7615: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by replacing a /* fall through */ comment with the new pseudo-keyword macro fallthrough; instead of letting the code fall through to the next case. Notice that Clang doesn't recognize /* fall through */ comments as implicit fall-through markings. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name>