summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-06usermodehelper: reset umask to default before executing user processLinus Torvalds
Kernel threads intentionally do CLONE_FS in order to follow any changes that 'init' does to set up the root directory (or cwd). It is admittedly a bit odd, but it avoids the situation where 'init' does some extensive setup to initialize the system environment, and then we execute a usermode helper program, and it uses the original FS setup from boot time that may be very limited and incomplete. [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will follow the root regardless, since it fixes up other users of root (see chroot_fs_refs() for details), but overmounting root and doing a chroot() would not. ] However, Vegard Nossum noticed that the CLONE_FS not only means that we follow the root and current working directories, it also means we share umask with whatever init changed it to. That wasn't intentional. Just reset umask to the original default (0022) before actually starting the usermode helper program. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-06splice: teach splice pipe reading about empty pipe buffersLinus Torvalds
Tetsuo Handa reports that splice() can return 0 before the real EOF, if the data in the splice source pipe is an empty pipe buffer. That empty pipe buffer case doesn't happen in any normal situation, but you can trigger it by doing a write to a pipe that fails due to a page fault. Tetsuo has a test-case to show the behavior: #define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char *argv[]) { const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600); int pipe_fd[2] = { -1, -1 }; pipe(pipe_fd); write(pipe_fd[1], NULL, 4096); /* This splice() should wait unless interrupted. */ return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0); } which results in write(5, NULL, 4096) = -1 EFAULT (Bad address) splice(4, NULL, 3, NULL, 65536, 0) = 0 and this can confuse splice() users into believing they have hit EOF prematurely. The issue was introduced when the pipe write code started pre-allocating the pipe buffers before copying data from user space. This is modified verion of Tetsuo's original patch. Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot") Link:https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/ Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-05Merge tag 'platform-drivers-x86-v5.9-2' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Andy Shevchenko: "We have some fixes for Tablet Mode reporting in particular, that users are complaining a lot about. Summary: - Attempt #3 of enabling Tablet Mode reporting w/o regressions - Improve battery recognition code in ASUS WMI driver - Fix Kconfig dependency warning for Fujitsu and LG laptop drivers - Add fixes in Thinkpad ACPI driver for _BCL method and NVRAM polling - Fix power supply extended topology in Mellanox driver - Fix memory leak in OLPC EC driver - Avoid static struct device in Intel PMC core driver - Add support for the touchscreen found in MPMAN Converter9 2-in-1 - Update MAINTAINERS to reflect the real state of affairs" * tag 'platform-drivers-x86-v5.9-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse MAINTAINERS: Add Mark Gross and Hans de Goede as x86 platform drivers maintainers platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting platform/x86: intel-vbtn: Revert "Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360" platform/x86: intel_pmc_core: do not create a static struct device platform/x86: mlx-platform: Fix extended topology configuration for power supply units platform/x86: pcengines-apuv2: Fix typo on define of AMD_FCH_GPIO_REG_GPIO55_DEVSLP0 platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP platform/x86: fix kconfig dependency warning for LG_LAPTOP platform/x86: thinkpad_acpi: initialize tp_nvram_state variable platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 platform/x86: asus-wmi: Add BATC battery name to the list of supported platform/x86: asus-nb-wmi: Revert "Do not load on Asus T100TA and T200TA" platform/x86: touchscreen_dmi: Add info for the MPMAN Converter9 2-in-1 Documentation: laptops: thinkpad-acpi: fix underline length build warning Platform: OLPC: Fix memleak in olpc_ec_probe
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Make sure SKB control block is in the proper state during IPSEC ESP-in-TCP encapsulation. From Sabrina Dubroca. 2) Various kinds of attributes were not being cloned properly when we build new xfrm_state objects from existing ones. Fix from Antony Antony. 3) Make sure to keep BTF sections, from Tony Ambardar. 4) TX DMA channels need proper locking in lantiq driver, from Hauke Mehrtens. 5) Honour route MTU during forwarding, always. From Maciej Żenczykowski. 6) Fix races in kTLS which can result in crashes, from Rohit Maheshwari. 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan Jha. 8) Use correct address family in xfrm state lookups, from Herbert Xu. 9) A bridge FDB flush should not clear out user managed fdb entries with the ext_learn flag set, from Nikolay Aleksandrov. 10) Fix nested locking of netdev address lists, from Taehee Yoo. 11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau. 12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit. 13) Don't free command entries in mlx5 while comp handler could still be running, from Eran Ben Elisha. 14) Error flow of request_irq() in mlx5 is busted, due to an off by one we try to free and IRQ never allocated. From Maor Gottlieb. 15) Fix leak when dumping netlink policies, from Johannes Berg. 16) Sendpage cannot be performed when a page is a slab page, or the page count is < 1. Some subsystems such as nvme were doing so. Create a "sendpage_ok()" helper and use it as needed, from Coly Li. 17) Don't leak request socket when using syncookes with mptcp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net/core: check length before updating Ethertype in skb_mpls_{push,pop} net: mvneta: fix double free of txq->buf net_sched: check error pointer in tcf_dump_walker() net: team: fix memory leak in __team_options_register net: typhoon: Fix a typo Typoon --> Typhoon net: hinic: fix DEVLINK build errors net: stmmac: Modify configuration method of EEE timers tcp: fix syn cookied MPTCP request socket leak libceph: use sendpage_ok() in ceph_tcp_sendpage() scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map() drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage() tcp: use sendpage_ok() to detect misused .sendpage nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send net: introduce helper sendpage_ok() in include/linux/net.h net: usb: pegasus: Proper error handing when setting pegasus' MAC address net: core: document two new elements of struct net_device netlink: fix policy dump leak net/mlx5e: Fix race condition on nhe->n pointer in neigh update net/mlx5e: Fix VLAN create flow ...
2020-10-05platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuseAaron Ma
Evaluating ACPI _BCL could fail, then ACPI buffer size will be set to 0. When reuse this ACPI buffer, AE_BUFFER_OVERFLOW will be triggered. Re-initialize buffer size will make ACPI evaluate successfully. Fixes: 46445b6b896fd ("thinkpad-acpi: fix handle locate for video and query of _BCL") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-10-04Linux 5.9-rc8Linus Torvalds
2020-10-04net/core: check length before updating Ethertype in skb_mpls_{push,pop}Guillaume Nault
Openvswitch allows to drop a packet's Ethernet header, therefore skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true and mac_len=0. In that case the pointer passed to skb_mod_eth_type() doesn't point to an Ethernet header and the new Ethertype is written at unexpected locations. Fix this by verifying that mac_len is big enough to contain an Ethernet header. Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions") Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net: mvneta: fix double free of txq->bufTom Rix
clang static analysis reports this problem: drivers/net/ethernet/marvell/mvneta.c:3465:2: warning: Attempt to free released memory kfree(txq->buf); ^~~~~~~~~~~~~~~ When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs, it frees without poisoning txq->buf. The error is caught in the mvneta_setup_txqs() caller which handles the error by cleaning up all of the txqs with a call to mvneta_txq_sw_deinit which also frees txq->buf. Since mvneta_txq_sw_deinit is a general cleaner, all of the partial cleaning in mvneta_txq_sw_deinit()'s error handling is not needed. Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net_sched: check error pointer in tcf_dump_walker()Cong Wang
Although we take RTNL on dump path, it is possible to skip RTNL on insertion path. So the following race condition is possible: rtnl_lock() // no rtnl lock mutex_lock(&idrinfo->lock); // insert ERR_PTR(-EBUSY) mutex_unlock(&idrinfo->lock); tc_dump_action() rtnl_unlock() So we have to skip those temporary -EBUSY entries on dump path too. Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net: team: fix memory leak in __team_options_registerAnant Thazhemadam
The variable "i" isn't initialized back correctly after the first loop under the label inst_rollback gets executed. The value of "i" is assigned to be option_count - 1, and the ensuing loop (under alloc_rollback) begins by initializing i--. Thus, the value of i when the loop begins execution will now become i = option_count - 2. Thus, when kfree(dst_opts[i]) is called in the second loop in this order, (i.e., inst_rollback followed by alloc_rollback), dst_optsp[option_count - 2] is the first element freed, and dst_opts[option_count - 1] does not get freed, and thus, a memory leak is caused. This memory leak can be fixed, by assigning i = option_count (instead of option_count - 1). Fixes: 80f7c6683fe0 ("team: add support for per-port options") Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03net: typhoon: Fix a typo Typoon --> TyphoonChristophe JAILLET
s/Typoon/Typhoon/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03net: hinic: fix DEVLINK build errorsRandy Dunlap
Fix many (lots deleted here) build errors in hinic by selecting NET_DEVLINK. ld: drivers/net/ethernet/huawei/hinic/hinic_hw_dev.o: in function `mgmt_watchdog_timeout_event_handler': hinic_hw_dev.c:(.text+0x30a): undefined reference to `devlink_health_report' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump': hinic_devlink.c:(.text+0x1c): undefined reference to `devlink_fmsg_u32_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump': hinic_devlink.c:(.text+0x126): undefined reference to `devlink_fmsg_binary_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_hw_reporter_dump': hinic_devlink.c:(.text+0x1ba): undefined reference to `devlink_fmsg_string_pair_put' ld: hinic_devlink.c:(.text+0x227): undefined reference to `devlink_fmsg_u8_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_alloc': hinic_devlink.c:(.text+0xaee): undefined reference to `devlink_alloc' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_free': hinic_devlink.c:(.text+0xb04): undefined reference to `devlink_free' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_register': hinic_devlink.c:(.text+0xb26): undefined reference to `devlink_register' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_unregister': hinic_devlink.c:(.text+0xb46): undefined reference to `devlink_unregister' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_create': hinic_devlink.c:(.text+0xb75): undefined reference to `devlink_health_reporter_create' ld: hinic_devlink.c:(.text+0xb95): undefined reference to `devlink_health_reporter_create' ld: hinic_devlink.c:(.text+0xbac): undefined reference to `devlink_health_reporter_destroy' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_destroy': Fixes: 51ba902a16e6 ("net-next/hinic: Initialize hw interface") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Bin Luo <luobin9@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com> Cc: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03net: stmmac: Modify configuration method of EEE timersVineetha G. Jaya Kumaran
Ethtool manual stated that the tx-timer is the "the amount of time the device should stay in idle mode prior to asserting its Tx LPI". The previous implementation for "ethtool --set-eee tx-timer" sets the LPI TW timer duration which is not correct. Hence, this patch fixes the "ethtool --set-eee tx-timer" to configure the EEE LPI timer. The LPI TW Timer will be using the defined default value instead of "ethtool --set-eee tx-timer" which follows the EEE LS timer implementation. Changelog V2 *Not removing/modifying the eee_timer. *EEE LPI timer can be configured through ethtool and also the eee_timer module param. *EEE TW Timer will be configured with default value only, not able to be configured through ethtool or module param. This follows the implementation of the EEE LS Timer. Fixes: d765955d2ae0 ("stmmac: add the Energy Efficient Ethernet support") Signed-off-by: Vineetha G. Jaya Kumaran <vineetha.g.jaya.kumaran@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Two bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guest
2020-10-03Merge tag 'for-linus-5.9b-rc8-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "Fix a regression introduced in 5.9-rc3 which caused a system running as fully virtualized guest under Xen to crash when using legacy devices like a floppy" * tag 'for-linus-5.9b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: don't use chip_data for legacy IRQs
2020-10-03Merge tag 'usb-5.9-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/PHY fixes from Greg KH: "Here are some small USB and PHY driver fixes for 5.9-rc8 The PHY driver fix resolves an issue found by Dan Carpenter for a memory leak. The USB fixes fall into two groups: - usb gadget fix from Bryan that is a fix for a previous security fix that showed up in in-the-wild testing - usb core driver matching bugfixes. This fixes a bug that has plagued the both the usbip driver and syzbot testing tools this -rc release cycle. All is now working properly so usbip connections will work, and syzbot can get back to fuzzing USB drivers properly. All have been in linux-next for a while with no reported issues" * tag 'usb-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usbcore/driver: Accommodate usbip usbcore/driver: Fix incorrect downcast usbcore/driver: Fix specific driver selection Revert "usbip: Implement a match function to fix usbip" USB: gadget: f_ncm: Fix NDP16 datagram validation phy: ti: am654: Fix a leak in serdes_am654_probe()
2020-10-03Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Some more driver fixes for i2c" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: npcm7xx: Clear LAST bit after a failed transaction. i2c: cpm: Fix i2c_ram structure i2c: i801: Exclude device from suspend direct complete optimization
2020-10-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "A couple more driver quirks, now enabling newer trackpoints from Synaptics for real" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: i8042 - add nopnp quirk for Acer Aspire 5 A515 Input: trackpoint - enable Synaptics trackpoints
2020-10-03scripts/spelling.txt: fix malformed entryEric Biggers
One of the entries has three fields "mistake||correction||correction" rather than the expected two fields "mistake||correction". Fix it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200930234359.255295-1-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-03mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIsJoonsoo Kim
memalloc_nocma_{save/restore} APIs can be used to skip page allocation on CMA area, but, there is a missing case and the page on CMA area could be allocated even if APIs are used. This patch handles this case to fix the potential issue. For now, these APIs are used to prevent long-term pinning on the CMA page. When the long-term pinning is requested on the CMA page, it is migrated to the non-CMA page before pinning. This non-CMA page is allocated by using memalloc_nocma_{save/restore} APIs. If APIs doesn't work as intended, the CMA page is allocated and it is pinned for a long time. This long-term pin for the CMA page causes cma_alloc() failure and it could result in wrong behaviour on the device driver who uses the cma_alloc(). Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't specified. Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs) Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Link: https://lkml.kernel.org/r/1601429472-12599-1-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-03mm, slub: restore initial kmem_cache flagsEric Farman
The routine that applies debug flags to the kmem_cache slabs inadvertantly prevents non-debug flags from being applied to those same objects. That is, if slub_debug=<flag>,<slab> is specified, non-debugged slabs will end up having flags of zero, and the slabs may be unusable. Fix this by including the input flags for non-matching slabs with the contents of slub_debug, so that the caches are created as expected alongside any debugging options that may be requested. With this, we can remove the check for a NULL slub_debug_string, since it's covered by the loop itself. Fixes: e17f1dfba37b ("mm, slub: extend slub_debug syntax for multiple blocks") Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: https://lkml.kernel.org/r/20200930161931.28575-1-farman@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-03Merge tag 'kvmarm-fixes-5.9-3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.9, take #3 - Fix synchronization of VTTBR update on TLB invalidation for nVHE systems
2020-10-03KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF interceptPaolo Bonzini
The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of the #PF intercept bit in the exception bitmap when they do not match. This means that, if PFEC_MASK and/or PFEC_MATCH are set, the hypervisor can get a vmexit for #PF exceptions even when the corresponding bit is clear in the exception bitmap. This is unexpected and is promptly detected by a WARN_ON_ONCE. To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr). Reported-by: Qian Cai <cai@redhat.com>> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-02Merge tag 'mlx5-fixes-2020-09-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux From: Saeed Mahameed <saeedm@nvidia.com> ==================== This series introduces some fixes to mlx5 driver. v1->v2: - Patch #1 Don't return while mutex is held. (Dave) v2->v3: - Drop patch #1, will consider a better approach (Jakub) - use cpu_relax() instead of cond_resched() (Jakub) - while(i--) to reveres a loop (Jakub) - Drop old mellanox email sign-off and change the committer email (Jakub) Please pull and let me know if there is any problem. For -stable v4.15 ('net/mlx5e: Fix VLAN cleanup flow') ('net/mlx5e: Fix VLAN create flow') For -stable v4.16 ('net/mlx5: Fix request_irqs error flow') For -stable v5.4 ('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU') ('net/mlx5: Avoid possible free of command entry while timeout comp handler') For -stable v5.7 ('net/mlx5e: Fix return status when setting unsupported FEC mode') For -stable v5.8 ('net/mlx5e: Fix race condition on nhe->n pointer in neigh update') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02tcp: fix syn cookied MPTCP request socket leakPaolo Abeni
If a syn-cookies request socket don't pass MPTCP-level validation done in syn_recv_sock(), we need to release it immediately, or it will be leaked. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/89 Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use") Reported-and-tested-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02Merge branch ↵David S. Miller
'Introduce-sendpage_ok-to-detect-misused-sendpage-in-network-related-drivers' Coly Li says: ==================== Introduce sendpage_ok() to detect misused sendpage in network related drivers As Sagi Grimberg suggested, the original fix is refind to a more common inline routine: static inline bool sendpage_ok(struct page *page) { return (!PageSlab(page) && page_count(page) >= 1); } If sendpage_ok() returns true, the checking page can be handled by the concrete zero-copy sendpage method in network layer. The v10 series has 7 patches, fixes a WARN_ONCE() usage from v9 series, - The 1st patch in this series introduces sendpage_ok() in header file include/linux/net.h. - The 2nd patch adds WARN_ONCE() for improper zero-copy send in kernel_sendpage(). - The 3rd patch fixes the page checking issue in nvme-over-tcp driver. - The 4th patch adds page_count check by using sendpage_ok() in do_tcp_sendpages() as Eric Dumazet suggested. - The 5th and 6th patches just replace existing open coded checks with the inline sendpage_ok() routine. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02libceph: use sendpage_ok() in ceph_tcp_sendpage()Coly Li
In libceph, ceph_tcp_sendpage() does the following checks before handle the page by network layer's zero copy sendpage method, if (page_count(page) >= 1 && !PageSlab(page)) This check is exactly what sendpage_ok() does. This patch replace the open coded checks by sendpage_ok() as a code cleanup. Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()Coly Li
In iscsci driver, iscsi_tcp_segment_map() uses the following code to check whether the page should or not be handled by sendpage: if (!recv && page_count(sg_page(sg)) >= 1 && !PageSlab(sg_page(sg))) The "page_count(sg_page(sg)) >= 1 && !PageSlab(sg_page(sg)" part is to make sure the page can be sent to network layer's zero copy path. This part is exactly what sendpage_ok() does. This patch uses use sendpage_ok() in iscsi_tcp_segment_map() to replace the original open coded checks. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Lee Duncan <lduncan@suse.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Cong Wang <amwang@redhat.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Chris Leech <cleech@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()Coly Li
In _drbd_send_page() a page is checked by following code before sending it by kernel_sendpage(), (page_count(page) < 1) || PageSlab(page) If the check is true, this page won't be send by kernel_sendpage() and handled by sock_no_sendpage(). This kind of check is exactly what macro sendpage_ok() does, which is introduced into include/linux/net.h to solve a similar send page issue in nvme-tcp code. This patch uses macro sendpage_ok() to replace the open coded checks to page type and refcount in _drbd_send_page(), as a code cleanup. Signed-off-by: Coly Li <colyli@suse.de> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02tcp: use sendpage_ok() to detect misused .sendpageColy Li
commit a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") adds the checks for Slab pages, but the pages don't have page_count are still missing from the check. Network layer's sendpage method is not designed to send page_count 0 pages neither, therefore both PageSlab() and page_count() should be both checked for the sending page. This is exactly what sendpage_ok() does. This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused .sendpage, to make the code more robust. Fixes: a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()Coly Li
Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to send slab pages. But for pages allocated by __get_free_pages() without __GFP_COMP, which also have refcount as 0, they are still sent by kernel_sendpage() to remote end, this is problematic. The new introduced helper sendpage_ok() checks both PageSlab tag and page_count counter, and returns true if the checking page is OK to be sent by kernel_sendpage(). This patch fixes the page checking issue of nvme_tcp_try_send_data() with sendpage_ok(). If sendpage_ok() returns true, send this page by kernel_sendpage(), otherwise use sock_no_sendpage to handle this page. Signed-off-by: Coly Li <colyli@suse.de> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Vlastimil Babka <vbabka@suse.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: add WARN_ONCE in kernel_sendpage() for improper zero-copy sendColy Li
If a page sent into kernel_sendpage() is a slab page or it doesn't have ref_count, this page is improper to send by the zero copy sendpage() method. Otherwise such page might be unexpected released in network code path and causes impredictable panic due to kernel memory management data structure corruption. This path adds a WARN_ON() on the sending page before sends it into the concrete zero-copy sendpage() method, if the page is improper for the zero-copy sendpage() method, a warning message can be observed before the consequential unpredictable kernel panic. This patch does not change existing kernel_sendpage() behavior for the improper page zero-copy send, it just provides hint warning message for following potential panic due the kernel memory heap corruption. Signed-off-by: Coly Li <colyli@suse.de> Cc: Cong Wang <amwang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David S. Miller <davem@davemloft.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: introduce helper sendpage_ok() in include/linux/net.hColy Li
The original problem was from nvme-over-tcp code, who mistakenly uses kernel_sendpage() to send pages allocated by __get_free_pages() without __GFP_COMP flag. Such pages don't have refcount (page_count is 0) on tail pages, sending them by kernel_sendpage() may trigger a kernel panic from a corrupted kernel heap, because these pages are incorrectly freed in network stack as page_count 0 pages. This patch introduces a helper sendpage_ok(), it returns true if the checking page, - is not slab page: PageSlab(page) is false. - has page refcount: page_count(page) is not zero All drivers who want to send page to remote end by kernel_sendpage() may use this helper to check whether the page is OK. If the helper does not return true, the driver should try other non sendpage method (e.g. sock_no_sendpage()) to handle the page. Signed-off-by: Coly Li <colyli@suse.de> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Vlastimil Babka <vbabka@suse.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: usb: pegasus: Proper error handing when setting pegasus' MAC addressPetko Manolov
v2: If reading the MAC address from eeprom fail don't throw an error, use randomly generated MAC instead. Either way the adapter will soldier on and the return type of set_ethernet_addr() can be reverted to void. v1: Fix a bug in set_ethernet_addr() which does not take into account possible errors (or partial reads) returned by its helpers. This can potentially lead to writing random data into device's MAC address registers. Signed-off-by: Petko Manolov <petko.manolov@konsulko.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: core: document two new elements of struct net_deviceMauro Carvalho Chehab
As warned by "make htmldocs", there are two new struct elements that aren't documented: ../include/linux/netdevice.h:2159: warning: Function parameter or member 'unlink_list' not described in 'net_device' ../include/linux/netdevice.h:2159: warning: Function parameter or member 'nested_level' not described in 'net_device' Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02Merge tag 'pinctrl-v5.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some pin control fixes here. All of them are driver fixes, the Intel Cherryview being the most interesting one. - Fix a mux problem for I2C in the MVEBU driver. - Fix a really hairy inversion problem in the Intel Cherryview driver. - Fix the register for the sdc2_clk in the Qualcomm SM8250 driver. - Check the virtual GPIO boot failur in the Mediatek driver" * tag 'pinctrl-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: mediatek: check mtk_is_virt_gpio input parameter pinctrl: qcom: sm8250: correct sdc2_clk pinctrl: cherryview: Preserve CHV_PADCTRL1_INVRXTX_TXDATA flag on GPIOs pinctrl: mvebu: Fix i2c sda definition for 98DX3236
2020-10-02Merge tag 'pci-v5.9-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Fix rockchip regression in rockchip_pcie_valid_device() (Lorenzo Pieralisi) - Add Pali Rohár as aardvark PCI maintainer (Pali Rohár) * tag 'pci-v5.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Add Pali Rohár as aardvark PCI maintainer PCI: rockchip: Fix bus checks in rockchip_pcie_valid_device()
2020-10-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two patches in driver frameworks. The iscsi one corrects a bug induced by a BPF change to network locking and the other is a regression we introduced" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername() scsi: target: Fix lun lookup for TARGET_SCF_LOOKUP_LUN_FROM_TAG case
2020-10-02Merge tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - fix for async buffered reads if read-ahead is fully disabled (Hao) - double poll match fix - ->show_fdinfo() potential ABBA deadlock complaint fix * tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-block: io_uring: fix async buffered reads when readahead is disabled io_uring: fix potential ABBA deadlock in ->show_fdinfo() io_uring: always delete double poll wait entry on match
2020-10-02Merge tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fix from Jens Axboe: "Single fix for a ->commit_rqs failure case" * tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-block: blk-mq: call commit_rqs while list empty but error happen
2020-10-02netlink: fix policy dump leakJohannes Berg
If userspace doesn't complete the policy dump, we leak the allocated state. Fix this. Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net/mlx5e: Fix race condition on nhe->n pointer in neigh updateVlad Buslov
Current neigh update event handler implementation takes reference to neighbour structure, assigns it to nhe->n, tries to schedule workqueue task and releases the reference if task was already enqueued. This results potentially overwriting existing nhe->n pointer with another neighbour instance, which causes double release of the instance (once in neigh update handler that failed to enqueue to workqueue and another one in neigh update workqueue task that processes updated nhe->n pointer instead of original one): [ 3376.512806] ------------[ cut here ]------------ [ 3376.513534] refcount_t: underflow; use-after-free. [ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c _core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw] [ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6 [ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0 [ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 [ 3376.536017] RSP: 0018:ffffc90002a97e30 EFLAGS: 00010286 [ 3376.536793] RAX: 0000000000000000 RBX: ffff8882de30d648 RCX: 0000000000000000 [ 3376.537718] RDX: ffff8882f5c28f20 RSI: ffff8882f5c18e40 RDI: ffff8882f5c18e40 [ 3376.538654] RBP: ffff8882cdf56c00 R08: 000000000000c580 R09: 0000000000001a4d [ 3376.539582] R10: 0000000000000731 R11: ffffc90002a97ccd R12: 0000000000000000 [ 3376.540519] R13: ffff8882de30d600 R14: ffff8882de30d640 R15: ffff88821e000900 [ 3376.541444] FS: 0000000000000000(0000) GS:ffff8882f5c00000(0000) knlGS:0000000000000000 [ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3376.543545] CR2: 0000556e5504b248 CR3: 00000002c6f10005 CR4: 0000000000770ee0 [ 3376.544483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3376.545419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3376.546344] PKRU: 55555554 [ 3376.546911] Call Trace: [ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core] [ 3376.548299] process_one_work+0x1d8/0x390 [ 3376.548977] worker_thread+0x4d/0x3e0 [ 3376.549631] ? rescuer_thread+0x3e0/0x3e0 [ 3376.550295] kthread+0x118/0x130 [ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3376.551675] ret_from_fork+0x1f/0x30 [ 3376.552312] ---[ end trace d84e8f46d2a77eec ]--- Fix the bug by moving work_struct to dedicated dynamically-allocated structure. This enabled every event handler to work on its own private neighbour pointer and removes the need for handling the case when task is already enqueued. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: Fix VLAN create flowAya Levin
When interface is attached while in promiscuous mode and with VLAN filtering turned off, both configurations are not respected and VLAN filtering is performed. There are 2 flows which add the any-vid rules during interface attach: VLAN creation table and set rx mode. Each is relaying on the other to add any-vid rules, eventually non of them does. Fix this by adding any-vid rules on VLAN creation regardless of promiscuous mode. Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: Fix VLAN cleanup flowAya Levin
Prior to this patch unloading an interface in promiscuous mode with RX VLAN filtering feature turned off - resulted in a warning. This is due to a wrong condition in the VLAN rules cleanup flow, which left the any-vid rules in the VLAN steering table. These rules prevented destroying the flow group and the flow table. The any-vid rules are removed in 2 flows, but none of them remove it in case both promiscuous is set and VLAN filtering is off. Fix the issue by changing the condition of the VLAN table cleanup flow to clean also in case of promiscuous mode. mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1 ... ... ------------[ cut here ]------------ FW pages counter is 11560 after reclaiming all pages WARNING: CPU: 1 PID: 28729 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660 mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_function_teardown+0x2f/0x90 [mlx5_core] mlx5_unload_one+0x71/0x110 [mlx5_core] remove_one+0x44/0x80 [mlx5_core] pci_device_remove+0x3e/0xc0 device_release_driver_internal+0xfb/0x1c0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x68/0x90 pci_stop_and_remove_bus_device+0x12/0x20 hv_eject_device_work+0x6f/0x170 [pci_hyperv] ? __schedule+0x349/0x790 process_one_work+0x206/0x400 worker_thread+0x34/0x3f0 ? process_one_work+0x400/0x400 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 ---[ end trace 6283bde8d26170dc ]--- Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: Fix return status when setting unsupported FEC modeAya Levin
Verify the configured FEC mode is supported by at least a single link mode before applying the command. Otherwise fail the command and return "Operation not supported". Prior to this patch, the command was successful, yet it falsely set all link modes to FEC auto mode - like configuring FEC mode to auto. Auto mode is the default configuration if a link mode doesn't support the configured FEC mode. Fixes: b5ede32d3329 ("net/mlx5e: Add support for FEC modes based on 50G per lane links") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: Fix driver's declaration to support GRE offloadAya Levin
Declare GRE offload support with respect to the inner protocol. Add a list of supported inner protocols on which the driver can offload checksum and GSO. For other protocols, inform the stack to do the needed operations. There is no noticeable impact on GRE performance. Fixes: 2729984149e6 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: CT, Fix coverity issueMaor Dickman
The cited commit introduced the following coverity issue at function mlx5_tc_ct_rule_to_tuple_nat: - Memory - corruptions (OVERRUN) Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte elements at element index 7 (byte offset 31) using index "ip6_offset" (which evaluates to 7). In case of IPv6 destination address rewrite, ip6_offset values are between 4 to 7, which will cause memory overrun of array "tuple->ip.src_v6.in6_u.u6_addr32" to array "tuple->ip.dst_v6.in6_u.u6_addr32". Fixed by writing the value directly to array "tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are between 4 to 7. Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTUAya Levin
Prior to this fix, in Striding RQ mode the driver was vulnerable when receiving packets in the range (stride size - headroom, stride size]. Where stride size is calculated by mtu+headroom+tailroom aligned to the closest power of 2. Usually, this filtering is performed by the HW, except for a few cases: - Between 2 VFs over the same PF with different MTUs - On bluefield, when the host physical function sets a larger MTU than the ARM has configured on its representor and uplink representor. When the HW filtering is not present, packets that are larger than MTU might be harmful for the RQ's integrity, in the following impacts: 1) Overflow from one WQE to the next, causing a memory corruption that in most cases is unharmful: as the write happens to the headroom of next packet, which will be overwritten by build_skb(). In very rare cases, high stress/load, this is harmful. When the next WQE is not yet reposted and points to existing SKB head. 2) Each oversize packet overflows to the headroom of the next WQE. On the last WQE of the WQ, where addresses wrap-around, the address of the remainder headroom does not belong to the next WQE, but it is out of the memory region range. This results in a HW CQE error that moves the RQ into an error state. Solution: Add a page buffer at the end of each WQE to absorb the leak. Actually the maximal overflow size is headroom but since all memory units must be of the same size, we use page size to comply with UMR WQEs. The increase in memory consumption is of a single page per RQ. Initialize the mkey with all MTTs pointing to a default page. When the channels are activated, UMR WQEs will redirect the RX WQEs to the actual memory from the RQ's pool, while the overflow MTTs remain mapped to the default page. Fixes: 73281b78a37a ("net/mlx5e: Derive Striding RQ size from MTU") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5e: Fix error path for RQ allocAya Levin
Increase granularity of the error path to avoid unneeded free/release. Fix the cleanup to be symmetric to the order of creation. Fixes: 0ddf543226ac ("xdp/mlx5: setup xdp_rxq_info") Fixes: 422d4c401edd ("net/mlx5e: RX, Split WQ objects for different RQ types") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02net/mlx5: Fix request_irqs error flowMaor Gottlieb
Fix error flow handling in request_irqs which try to free irq that we failed to request. It fixes the below trace. WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60 CPU: 1 PID: 7587 Comm: bash Tainted: G W OE 4.15.15-1.el7MELLANOXsmp-x86_64 #1 Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020 RIP: 0010:free_irq+0x4d/0x60 RSP: 0018:ffffc9000ef47af0 EFLAGS: 00010282 RAX: ffff88001476ae00 RBX: 0000000000000655 RCX: 0000000000000000 RDX: ffff88001476ae00 RSI: ffffc9000ef47ab8 RDI: ffff8800398bb478 RBP: ffff88001476a838 R08: ffff88001476ae00 R09: 000000000000156d R10: 0000000000000000 R11: 0000000000000004 R12: ffff88001476a838 R13: 0000000000000006 R14: ffff88001476a888 R15: 00000000ffffffe4 FS: 00007efeadd32740(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc9cc010008 CR3: 00000001a2380004 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mlx5_irq_table_create+0x38d/0x400 [mlx5_core] ? atomic_notifier_chain_register+0x50/0x60 mlx5_load_one+0x7ee/0x1130 [mlx5_core] init_one+0x4c9/0x650 [mlx5_core] pci_device_probe+0xb8/0x120 driver_probe_device+0x2a1/0x470 ? driver_allows_async_probing+0x30/0x30 bus_for_each_drv+0x54/0x80 __device_attach+0xa3/0x100 pci_bus_add_device+0x4a/0x90 pci_iov_add_virtfn+0x2dc/0x2f0 pci_enable_sriov+0x32e/0x420 mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core] ? kstrtoll+0x22/0x70 num_vf_store+0x4b/0x70 [mlx5_core] kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x140 ? rcu_all_qs+0x5/0x80 ? _cond_resched+0x15/0x30 ? __sb_start_write+0x41/0x80 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x60/0x110 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 24163189da48 ("net/mlx5: Separate IRQ request/free from EQ life cycle") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>