summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-06-24dsa: Allow forwarding of redirected IGMP trafficDaniel Mack
The driver for Marvell switches puts all ports in IGMP snooping mode which results in all IGMP/MLD frames that ingress on the ports to be forwarded to the CPU only. The bridge code in the kernel can then interpret these frames and act upon them, for instance by updating the mdb in the switch to reflect multicast memberships of stations connected to the ports. However, the IGMP/MLD frames must then also be forwarded to other ports of the bridge so external IGMP queriers can track membership reports, and external multicast clients can receive query reports from foreign IGMP queriers. Currently, this is impossible as the EDSA tagger sets offload_fwd_mark on the skb when it unwraps the tagged frames, and that will make the switchdev layer prevent the skb from egressing on any other port of the same switch. To fix that, look at the To_CPU code in the DSA header and make forwarding of the frame possible for trapped IGMP packets. Introduce some #defines for the frame types to make the code a bit more comprehensive. This was tested on a Marvell 88E6352 variant. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24Merge branch 'net-bridge-fdb-activity-tracking'David S. Miller
Nikolay Aleksandrov says: ==================== net: bridge: fdb activity tracking This set adds extensions needed for EVPN multi-homing proper and efficient mac sync. User-space (e.g. FRR) needs to be able to track non-dynamic entry activity on per-fdb basis depending if a tracked fdb is currently peer active or locally active and needs to be able to add new peer active fdb (static + track + inactive) without refreshing it to get real activity tracking. Patch 02 adds a new NDA attribute - NDA_FDB_EXT_ATTRS to avoid future pollution of NDA attributes by bridge or vxlan. New bridge/vxlan specific fdb attributes are embedded in NDA_FDB_EXT_ATTRS, which is used in patch 03 to pass the new NFEA_ACTIVITY_NOTIFY attribute which controls if an fdb should be tracked and also reflects its current state when dumping. It is treated as a bitfield, current valid bits are: 1 - mark an entry for activity tracking 2 - mark an entry as inactive to avoid multiple notifications and reflect state properly Patch 04 adds the ability to avoid refreshing an entry when changing it via the NFEA_DONT_REFRESH flag. That allows user-space to mark a static entry for tracking and keep its real activity unchanged. The set has been extensively tested with FRR and those changes will be upstreamed if/after it gets accepted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: bridge: add a flag to avoid refreshing fdb when changing/addingNikolay Aleksandrov
When we modify or create a new fdb entry sometimes we want to avoid refreshing its activity in order to track it properly. One example is when a mac is received from EVPN multi-homing peer by FRR, which doesn't want to change local activity accounting. It makes it static and sets a flag to track its activity. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: bridge: add option to allow activity notifications for any fdb entriesNikolay Aleksandrov
This patch adds the ability to notify about activity of any entries (static, permanent or ext_learn). EVPN multihoming peers need it to properly and efficiently handle mac sync (peer active/locally active). We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the current activity state and to control if static entries should be monitored at all. We use 2 bits - one to activate fdb entry tracking (disabled by default) and the second to denote that an entry is inactive. We need the second bit in order to avoid multiple notifications of inactivity. Obviously this makes no difference for dynamic entries since at the time of inactivity they get deleted, while the tracked non-dynamic entries get the inactive bit set and get a notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: neighbor: add fdb extended attributeNikolay Aleksandrov
Add an attribute to NDA which will contain all future fdb-specific attributes in order to avoid polluting the NDA namespace with e.g. bridge or vxlan specific attributes. The attribute is called NDA_FDB_EXT_ATTRS and the structure would look like: [NDA_FDB_EXT_ATTRS] = { [NFEA_xxx] } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: bridge: fdb_add_entry takes ndm as argumentNikolay Aleksandrov
We can just pass ndm as an argument instead of its fields separately. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24openvswitch: take into account de-fragmentation/gso_size in ↵Lorenzo Bianconi
execute_check_pkt_len ovs connection tracking module performs de-fragmentation on incoming fragmented traffic. Take info account if traffic has been de-fragmented in execute_check_pkt_len action otherwise we will perform the wrong nested action considering the original packet size. This issue typically occurs if ovs-vswitchd adds a rule in the pipeline that requires connection tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action. Moreover take into account GSO fragment size for GSO packet in execute_check_pkt_len routine Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24Merge branch 'net-phy-mscc-PHC-and-timestamping-support'David S. Miller
Antoine Tenart says: ==================== net: phy: mscc: PHC and timestamping support This series aims at adding support for PHC and timestamping operations in the MSCC PHY driver, for the VSC858x and VSC8575. Those PHYs are capable of timestamping in 1-step and 2-step for both L2 and L4 traffic. As of this series, only IPv4 support was implemented when using L4 mode. This is because of an hardware limitation which prevents us for supporting both IPv4 and IPv6 at the same time. Implementing support for IPv6 should be quite easy (I do have the modifications needed for the hardware configuration) but I did not see a way to retrieve this information in hwtstamp(). What would you suggest? Those PHYs are distributed in hardware packages containing multiple times the PHY. The VSC8584 for example is composed of 4 PHYs. With hardware packages, parts of the logic is usually common and one of the PHY has to be used for some parts of the initialization. Following this logic, the 1588 blocks of those PHYs are shared between two PHYs and accessing the registers has to be done using the "base" PHY of the group. This is handled thanks to helpers in the PTP code (and locks). We also need the MDIO bus lock while performing a single read or write to the 1588 registers as the read/write are composed of multiple MDIO transactions (and we don't want other threads updating the page). To get and set the PHC time, a GPIO has to be used and changes are only retrieved or committed when on a rising edge. The same GPIO is shared by all PHYs, so the granularity of the lock protecting it has to be different from the ones protecting the 1588 registers (the VSC8584 PHY has 2 1588 blocks, and a single load/save pin). Patch 1 extends the recently added helpers to share information between PHYs of the same hardware package; to allow having part of the probe to be shared (in addition to the already supported init part). This will be used when adding support for PHC/TS to initialize locks. Patches 2 and 3 are mostly cosmetic. Patch 4 takes into account the 1588 block in the MACsec initialization, to allow having both the MACsec and 1588 blocks initialized on a running system. Patches 5 and 6 add support for PHC and timestamping operations in the MSCC driver. An initialization of the 1588 block (plus all the registers definition; and helpers) is added first; and then comes a patch to implement the PHC and timestamping API. Patches 7 and 8 add the required hardware description for device trees, to be able to use the load/save GPIO pin on the PCB120 board. To use this on a PCB120 board, two other series are needed and have already been sent upstream (one is merged). There are no dependency between all those series. Since v3: - Fixed a SKB leak. - Removed ts_lock from the init, as TS and PHC operations aren't registered at this time. - Refectored the ts_base_addr/phy intialization. - Cleaned up the ingr/egr latencies definitons. - Fixed a comment about locking and the shared GPIO. - A few cosmetic fixes. Since v2: - Removed explicit inlines from .c files. - Fixed three warnings. Since v1: - Removed checks in rxtstamp/txtstamp as skb cannot be NULL here. - Reworked get_ptp_header_rx/get_ptp_header. - Reworked the locking logic between the PHC and timestamping operations. - Fixed a compilation issue on x86 reported by Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24MIPS: dts: ocelot: describe the load/save GPIOQuentin Schulz
This patch adds a description of the load/save GPIN pin, used in the VSC8584 PHY for timestamping operations. The related pinctrl description is also added. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24dt-bindings: net: phy: vsc8531: document the load/save GPIOAntoine Tenart
A new optional property can be used to reference the load/save GPIO, used for PTP hardware clock (PHC) operations. This patch documents it in the binding documentation. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: mscc: timestamping and PHC supportAntoine Tenart
This patch adds support for PHC and timestamping operations for the MSCC PHY. PTP 1-step and 2-step modes are supported, over Ethernet and UDP. To get and set the PHC time, a GPIO has to be used and changes are only retrieved or committed when on a rising edge. The same GPIO is shared by all PHYs, so the granularity of the lock protecting it has to be different from the ones protecting the 1588 registers (the VSC8584 PHY has 2 1588 blocks, and a single load/save pin). Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: mscc: 1588 block initializationQuentin Schulz
This patch adds the first parts of the 1588 support in the MSCC PHY, with registers definition and the 1588 block initialization. Those PHYs are distributed in hardware packages containing multiple times the PHY. The VSC8584 for example is composed of 4 PHYs. With hardware packages, parts of the logic is usually common and one of the PHY has to be used for some parts of the initialization. Following this logic, the 1588 blocks of those PHYs are shared between two PHYs and accessing the registers has to be done using the "base" PHY of the group. This is handled thanks to helpers in the PTP code (and locks). We also need the MDIO bus lock while performing a single read or write to the 1588 registers as the read/write are composed of multiple MDIO transactions (and we don't want other threads updating the page). Co-developed-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: mscc: take into account the 1588 block in MACsec initAntoine Tenart
This patch takes in account the use of the 1588 block in the MACsec initialization, as a conditional configuration has to be done (when the 1588 block is used). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: mscc: remove the TR CLK disable magic valueQuentin Schulz
This patch adds a define for the 0x8000 magic value used to perform enable/disable actions on the "token ring clock". The patch is only cosmetic. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: mscc: fix copyright and author information in MACsecAntoine Tenart
All headers in the MSCC PHY driver have been copied and pasted from the original mscc.c file. However the information is not necessarily correct, as in the MACsec support. Fix this. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: add support for a common probe between shared PHYsAntoine Tenart
Shared PHYs (PHYs in the same hardware package) may have shared registers and their drivers would usually need to share information. There is currently a way to have a shared (part of the) init, by using phy_package_init_once(). This patch extends the logic to share parts of the probe to allow sharing the initialization of locks or resources retrieval. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Fixes all over the place. This includes a couple of tests that I would normally defer, but since they have already been helpful in catching some bugs, don't build for any users at all, and having them upstream makes life easier for everyone, I think it's ok even at this late stage" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: tools/virtio: Use tools/include/list.h instead of stubs tools/virtio: Reset index in virtio_test --reset. tools/virtio: Extract virtqueue initialization in vq_reset tools/virtio: Use __vring_new_virtqueue in virtio_test.c tools/virtio: Add --reset tools/virtio: Add --batch=random option tools/virtio: Add --batch option virtio-mem: add memory via add_memory_driver_managed() virtio-mem: silence a static checker warning vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap() vdpa: fix typos in the comments for __vdpa_alloc_device()
2020-06-24Merge tag 'for-linus-2020-06-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread fix from Christian Brauner: "This fixes a regression introduced with 303cc571d107 ("nsproxy: attach to namespaces via pidfds"). The LTP testsuite reported a regression where users would now see EBADF returned instead of EINVAL when an fd was passed that referred to an open file but the file was not a namespace file. Fix this by continuing to report EINVAL and add a regression test" * tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: test for setns() EINVAL regression nsproxy: restore EINVAL for non-namespace file descriptor
2020-06-24drm/fb-helper: Fix vt restoreDaniel Vetter
In the past we had a pile of hacks to orchestrate access between fbdev emulation and native kms clients. We've tried to streamline this, by always preferring the kms side above fbdev calls when a drm master exists, because drm master controls access to the display resources. Unfortunately this breaks existing userspace, specifically Xorg. When exiting Xorg first restores the console to text mode using the KDSET ioctl on the vt. This does nothing, because a drm master is still around. Then it drops the drm master status, which again does nothing, because logind is keeping additional drm fd open to be able to orchestrate vt switches. In the past this is the point where fbdev was restored, as part of the ->lastclose hook on the drm side. Now to fix this regression we don't want to go back to letting fbdev restore things whenever it feels like, or to the pile of hacks we've had before. Instead try and go with a minimal exception to make the KDSET case work again, and nothing else. This means that if userspace does a KDSET call when switching between graphical compositors, there will be some flickering with fbcon showing up for a bit. But a) that's not a regression and b) userspace can fix it by improving the vt switching dance - logind should have all the information it needs. While pondering all this I'm also wondering wheter we should have a SWITCH_MASTER ioctl to allow race-free master status handover. But that's for another day. v2: Somehow forgot to cc all the fbdev people. v3: Fix typo Alex spotted. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208179 Cc: shlomo@fastmail.com Reported-and-Tested-by: shlomo@fastmail.com Cc: Michel Dänzer <michel@daenzer.net> Fixes: 64914da24ea9 ("drm/fbdev-helper: don't force restores") Cc: Noralf Trønnes <noralf@tronnes.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.7+ Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Qiujun Huang <hqjagain@gmail.com> Cc: Peter Rosin <peda@axentia.se> Cc: linux-fbdev@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200624092910.3280448-1-daniel.vetter@ffwll.ch
2020-06-24IB/hfi1: Add atomic triggered sleep/wakeupMike Marciniszyn
When running iperf in a two host configuration the following trace can occur: [ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out The issue happens because the current implementation relies on the netif txq being stopped to control the flushing of the tx list. There are two resources that the transmit logic can wait on and stop the txq: - SDMA descriptors - Ring space to hold completions The ring space is tested on the sending side and relieved when the ring is consumed in the napi tx reaping. Unfortunately, that reaping can run conncurrently with the workqueue flushing of the txlist. If the txq is started just before the workitem executes, the txlist will never be flushed, leading to the txq being stuck. Fix by: - Adding sleep/wakeup wrappers * Use an atomic to control the call to the netif routines inside the wrappers - Use another atomic to record ring space exhaustion * Only wakeup when the a ring space exhaustion has happened and it relieved Add additional wrappers to clarify the ring space resource handling. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Correct -EBUSY handling in tx codeMike Marciniszyn
The current code mishandles -EBUSY in two ways: - The flow change doesn't test the return from the flush and runs on to process the current packet racing with the wakeup processing - The -EBUSY handling for a single packet inserts the tx into the txlist after the submit call, racing with the same wakeup processing Fix the first by dropping the skb and returning NETDEV_TX_OK. Fix the second by insuring the the list entry within the txreq is inited when allocated. This enables the sleep routine to detect that the txreq has used the non-list api and queue the packet to the txlist. Both flaws can lead to having the flushing thread executing in causing two threads to manipulate the txlist. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Fix module use count flaw due to leftover module put callsDennis Dalessandro
When the try_module_get calls were removed from opening and closing of the i2c debugfs file, the corresponding module_put calls were missed. This results in an inaccurate module use count that requires a power cycle to fix. Fixes: 09fbca8e6240 ("IB/hfi1: No need to use try_module_get for debugfs") Link: https://lore.kernel.org/r/20200623203230.106975.76240.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/hfi1: Restore kfree in dummy_netdev cleanupDennis Dalessandro
We need to do some rework on the dummy netdev. Calling the free_netdev() would normally make sense, and that will be addressed in an upcoming patch. For now just revert the behavior to what it was before keeping the unused variable removal part of the patch. The dd->dumm_netdev is mainly used for packet receiving through alloc_netdev_mqs() for typical net devices. A a result, it should be freed with kfree instead of free_netdev() that leads to a crash when unloading the hfi1 module: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 RIP: 0010:__hw_addr_flush+0x12/0x80 Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84 RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298 RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549 R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298 R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000 FS: 00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dev_addr_flush+0x15/0x30 free_netdev+0x7e/0x130 hfi1_netdev_free+0x59/0x70 [hfi1] remove_one+0x65/0x110 [hfi1] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0xec/0x1b0 driver_detach+0x46/0x90 bus_remove_driver+0x58/0xd0 pci_unregister_driver+0x26/0xa0 hfi1_mod_cleanup+0xc/0xd54 [hfi1] __x64_sys_delete_module+0x16c/0x260 ? exit_to_usermode_loop+0xa4/0xc0 do_syscall_64+0x5b/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 193ba03141bb ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()") Link: https://lore.kernel.org/r/20200623203224.106975.16926.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24bpf: Add SO_KEEPALIVE and related options to bpf_setsockoptDmitry Yakunin
This patch adds support of SO_KEEPALIVE flag and TCP related options to bpf_setsockopt() routine. This is helpful if we want to enable or tune TCP keepalive for applications which don't do it in the userspace code. v3: - update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>) v4: - update kernel-doc in tools too (Alexei Starovoitov) - add test to selftests (Alexei Starovoitov) Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
2020-06-24tcp: Expose tcp_sock_set_keepidle_lockedDmitry Yakunin
This is preparation for usage in bpf_setsockopt. v2: - remove redundant EXPORT_SYMBOL (Alexei Starovoitov) Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200620153052.9439-2-zeil@yandex-team.ru
2020-06-24sock: Move sock_valbool_flag to headerDmitry Yakunin
This is preparation for usage in bpf_setsockopt. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200620153052.9439-1-zeil@yandex-team.ru
2020-06-24selftests/bpf: Workaround for get_stack_rawtp test.Alexei Starovoitov
./test_progs-no_alu32 -t get_stack_raw_tp fails due to: 52: (85) call bpf_get_stack#67 53: (bf) r8 = r0 54: (bf) r1 = r8 55: (67) r1 <<= 32 56: (c7) r1 s>>= 32 ; if (usize < 0) 57: (c5) if r1 s< 0x0 goto pc+26 R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800 ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); 58: (1f) r9 -= r8 ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); 59: (bf) r2 = r7 60: (0f) r2 += r1 regs=1 stack=0 before 52: (85) call bpf_get_stack#67 ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); 61: (bf) r1 = r6 62: (bf) r3 = r9 63: (b7) r4 = 0 64: (85) call bpf_get_stack#67 R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608) R3 unbounded memory access, use 'var &= const' or 'if (var < const)' In the C code: usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK); if (usize < 0) return 0; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); if (ksize < 0) return 0; We used to have problem with pointer arith in R2. Now it's a problem with two integers in R3. 'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800]. Both registers represent the same 'usize' variable. Then R9 -= R8 is doing 800 - [-inf, 800] so the result of "max_len - usize" looks unbounded to the verifier while it's obvious in C code that "max_len - usize" should be [0, 800]. To workaround the problem convert ksize and usize variables from int to long. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-24hwmon: (max6697) Make sure the OVERT mask is set correctlyChu Lin
Per the datasheet for max6697, OVERT mask and ALERT mask are different. For example, the 7th bit of OVERT is the local channel but for alert mask, the 6th bit is the local channel. Therefore, we can't apply the same mask for both registers. In addition to that, the max6697 driver is supposed to be compatibale with different models. I manually went over all the listed chips and made sure all chip types have the same layout. Testing; mask value of 0x9 should map to 0x44 for ALERT and 0x84 for OVERT. I used iotool to read the reg value back to verify. I only tested this change on max6581. Reference: https://datasheets.maximintegrated.com/en/ds/MAX6581.pdf https://datasheets.maximintegrated.com/en/ds/MAX6697.pdf https://datasheets.maximintegrated.com/en/ds/MAX6699.pdf Signed-off-by: Chu Lin <linchuyuan@google.com> Fixes: 5372d2d71c46e ("hwmon: Driver for Maxim MAX6697 and compatibles") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-06-24nvme-multipath: fix bogus request queue reference putSagi Grimberg
The mpath disk node takes a reference on the request mpath request queue when adding live path to the mpath gendisk. However if we connected to an inaccessible path device_add_disk is not called, so if we disconnect and remove the mpath gendisk we endup putting an reference on the request queue that was never taken [1]. Fix that to check if we ever added a live path (using NVME_NS_HEAD_HAS_DISK flag) and if not, clear the disk->queue reference. [1]: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xa6/0xf0 RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: disk_release+0xa2/0xc0 device_release+0x28/0x80 kobject_put+0xa5/0x1b0 nvme_put_ns_head+0x26/0x70 [nvme_core] nvme_put_ns+0x30/0x60 [nvme_core] nvme_remove_namespaces+0x9b/0xe0 [nvme_core] nvme_do_delete_ctrl+0x43/0x5c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write+0xc1/0x1a0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: Anton Eidelman <anton@lightbitslabs.com> Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-multipath: fix deadlock due to head->lockAnton Eidelman
In the following scenario scan_work and ana_work will deadlock: When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on ANY ctrl will not be able to complete nvme_mpath_set_live() on the same ns->head, which is required in order to update the new accessible path and remove NVME_NS_ANA_PENDING.. Therefore IO never completes: deadlock [1]. Fix: Move device_add_disk out of the head->lock and protect it with an atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit. [1]: kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:2 D 0 160 2 0x80004000 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core] kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_read_ana_log+0x76/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:4 D 0 439 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x256/0x390 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: don't protect ns mutation with ns->head->lockSagi Grimberg
Right now ns->head->lock is protecting namespace mutation which is wrong and unneeded. Move it to only protect against head mutations. While we're at it, remove unnecessary ns->head reference as we already have head pointer. The problem with this is that the head->lock spans mpath disk node I/O that may block under some conditions (if for example the controller is disconnecting or the path became inaccessible), The locking scheme does not allow any other path to enable itself, preventing blocked I/O to complete and forward-progress from there. This is a preparation patch for the fix in a subsequent patch where the disk I/O will also be done outside the head->lock. Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-multipath: fix deadlock between ana_work and scan_workAnton Eidelman
When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. In order to recover and complete the IO ana_work on the same ctrl should be able to update the path state and remove NVME_NS_ANA_PENDING. The deadlock occurs because scan_work keeps holding ana_lock, so ana_work hangs [1]. Fix: Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy of the ANA group desc, and then calls nvme_update_ns_ana_state() without holding ana_lock. [1]: kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? select_task_rq_fair+0x1aa/0x5c0 kernel: ? kvm_sched_clock_read+0x11/0x20 kernel: ? sched_clock+0x9/0x10 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: fix possible deadlock when I/O is blockedSagi Grimberg
Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") When adding a new namespace to the head disk (via nvme_mpath_set_live) we will see partition scan which triggers I/O on the mpath device node. This process will usually be triggered from the scan_work which holds the scan_lock. If I/O blocks (if we got ana change currently have only available paths but none are accessible) this can deadlock on the head disk bd_mutex as both partition scan I/O takes it, and head disk revalidation takes it to check for resize (also triggered from scan_work on a different path). See trace [1]. The mpath disk revalidation was originally added to detect online disk size change, but this is no longer needed since commit cb224c3af4df ("nvme: Convert to use set_capacity_revalidate_and_notify") which already updates resize info without unnecessarily revalidating the disk (the mpath disk doesn't even implement .revalidate_disk fop). [1]: -- kernel: INFO: task kworker/u65:9:494 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:9 D 0 494 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: revalidate_disk+0x63/0xa0 kernel: __nvme_revalidate_disk+0xfe/0x110 [nvme_core] kernel: nvme_revalidate_disk+0xa4/0x160 [nvme_core] kernel: ? evict+0x14c/0x1b0 kernel: revalidate_disk+0x2b/0xa0 kernel: nvme_validate_ns+0x49/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: ? __nvme_submit_sync_cmd+0xbe/0x1e0 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 ... kernel: INFO: task kworker/u65:1:2630 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:1 D 0 2630 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? file_fdatawait_range+0x30/0x30 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: ? kmem_cache_alloc_trace+0x19c/0x230 kernel: efi_partition+0x1e6/0x708 kernel: ? vsnprintf+0x39e/0x4e0 kernel: ? snprintf+0x49/0x60 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: ? nvme_update_ns_ana_state+0x60/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 -- Fixes: fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-rdma: assign completion vector correctlyMax Gurtovoy
The completion vector index that is given during CQ creation can't exceed the number of support vectors by the underlying RDMA device. This violation currently can accure, for example, in case one will try to connect with N regular read/write queues and M poll queues and the sum of N + M > num_supported_vectors. This will lead to failure in establish a connection to remote target. Instead, in that case, share a completion vector between queues. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-loop: initialize tagset numa value to the value of the ctrlMax Gurtovoy
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-tcp: initialize tagset numa value to the value of the ctrlMax Gurtovoy
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-pci: initialize tagset numa value to the value of the ctrlMax Gurtovoy
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-pci: override the value of the controller's numa nodeMax Gurtovoy
Set the node value according to the PCI device numa node. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: set initial value for controller's numa nodeMax Gurtovoy
Initialize the node to NUMA_NO_NODE value. Transports that are aware of numa node affinity can override it (e.g. RDMA transport set the affinity according to the RDMA HCA). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24usb: renesas_usbhs: getting residue from callback_resultYoshihiro Shimoda
This driver assumed that dmaengine_tx_status() could return the residue even if the transfer was completed. However, this was not correct usage [1] and this caused to break getting the residue after the commit 24461d9792c2 ("dmaengine: virt-dma: Fix access after free in vchan_complete()") actually. So, this is possible to get wrong received size if the usb controller gets a short packet. For example, g_zero driver causes "bad OUT byte" errors. The usb-dmac driver will support the callback_result, so this driver can use it to get residue correctly. Note that even if the usb-dmac driver has not supported the callback_result yet, this patch doesn't cause any side-effects. [1] https://lore.kernel.org/dmaengine/20200616165550.GP2324254@vkoul-mobl/ Reported-by: Hien Dang <hien.dang.eb@renesas.com> Fixes: 24461d9792c2 ("dmaengine: virt-dma: Fix access after free in vchan_complete()") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/1592482277-19563-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24block: release bip in a right way in error pathChengguang Xu
Release bip using kfree() in error path when that was allocated by kmalloc(). Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24drm/radeon: fix fb_div check in ni_init_smc_spll_table()Denis Efremov
clk_s is checked twice in a row in ni_init_smc_spll_table(). fb_div should be checked instead. Fixes: 69e0b57a91ad ("drm/radeon/kms: add dpm support for cayman (v5)") Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-06-24ASoC: rt5682: fix the pop noise while OMTP type headset pluginShuming Fan
To turn the headphone output switch off during jack type detection, it could avoid the pop noise when jack type switches to OMTP type. Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://lore.kernel.org/r/20200623125312.27896-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-24libbpf: Prevent loading vmlinux BTF twiceAndrii Nakryiko
Prevent loading/parsing vmlinux BTF twice in some cases: for CO-RE relocations and for BTF-aware hooks (tp_btf, fentry/fexit, etc). Fixes: a6ed02cac690 ("libbpf: Load btf_vmlinux only once per object.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200624043805.1794620-1-andriin@fb.com
2020-06-24Revert "usb: dwc3: exynos: Add support for Exynos5422 suspend clk"Anand Moon
This reverts commit 07f6842341abe978e6375078f84506ec3280ece5. Since SCLK_SCLK_USBD300 suspend clock need to be configured for phy module, I wrongly mapped this clock to DWC3 code. Cc: Felipe Balbi <balbi@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Anand Moon <linux.amoon@gmail.com> Cc: stable <stable@vger.kernel.org> Fixes: 07f6842341ab ("usb: dwc3: exynos: Add support for Exynos5422 suspend clk") Link: https://lore.kernel.org/r/20200623074637.756-1-linux.amoon@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24xhci: Poll for U0 after disabling USB2 LPMKai-Heng Feng
USB2 devices with LPM enabled may interrupt the system suspend: [ 932.510475] usb 1-7: usb suspend, wakeup 0 [ 932.510549] hub 1-0:1.0: hub_suspend [ 932.510581] usb usb1: bus suspend, wakeup 0 [ 932.510590] xhci_hcd 0000:00:14.0: port 9 not suspended [ 932.510593] xhci_hcd 0000:00:14.0: port 8 not suspended .. [ 932.520323] xhci_hcd 0000:00:14.0: Port change event, 1-7, id 7, portsc: 0x400e03 .. [ 932.591405] PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -16 [ 932.591414] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -16 [ 932.591418] PM: Device 0000:00:14.0 failed to suspend async: error -16 During system suspend, USB core will let HC suspends the device if it doesn't have remote wakeup enabled and doesn't have any children. However, from the log above we can see that the usb 1-7 doesn't get bus suspended due to not in U0. After a while the port finished U2 -> U0 transition, interrupts the suspend process. The observation is that after disabling LPM, port doesn't transit to U0 immediately and can linger in U2. xHCI spec 4.23.5.2 states that the maximum exit latency for USB2 LPM should be BESL + 10us. The BESL for the affected device is advertised as 400us, which is still not enough based on my testing result. So let's use the maximum permitted latency, 10000, to poll for U0 status to solve the issue. Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20200624135949.22611-6-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24xhci: Return if xHCI doesn't support LPMKai-Heng Feng
Just return if xHCI is quirked to disable LPM. We can save some time from reading registers and doing spinlocks. Add stable tag as we want this patch together with the next one, "Poll for U0 after disabling USB2 LPM" which fixes a suspend issue for some USB2 LPM devices Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20200624135949.22611-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24usb: host: xhci-mtk: avoid runtime suspend when removing hcdMacpaul Lin
When runtime suspend was enabled, runtime suspend might happen when xhci is removing hcd. This might cause kernel panic when hcd has been freed but runtime pm suspend related handle need to reference it. Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com> Reviewed-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20200624135949.22611-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24xhci: Fix enumeration issue when setting max packet size for FS devices.Al Cooper
Unable to complete the enumeration of a USB TV Tuner device. Per XHCI spec (4.6.5), the EP state field of the input context shall be cleared for a set address command. In the special case of an FS device that has "MaxPacketSize0 = 8", the Linux XHCI driver does not do this before evaluating the context. With an XHCI controller that checks the EP state field for parameter context error this causes a problem in cases such as the device getting reset again after enumeration. When that field is cleared, the problem does not occur. This was found and fixed by Sasi Kumar. Cc: stable@vger.kernel.org Signed-off-by: Al Cooper <alcooperx@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20200624135949.22611-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24xhci: Fix incorrect EP_STATE_MASKMathias Nyman
EP_STATE_MASK should be 0x7 instead of 0xf xhci spec 6.2.3 shows that the EP state field in the endpoint context data structure consist of bits [2:0]. The old value included a bit from the next field which fortunately is a RsvdZ region. So hopefully this hasn't caused too much harm Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20200624135949.22611-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>