summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-07-27tty: move tty_ldisc_receive_buf to tty_flip.hJiri Slaby
It's the only remaining tty_buffer.c prototype residing in tty.h. Move it along others to tty_flip.h. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723103147.18250-6-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27tty: include list & lockdep from tty_ldisc.hJiri Slaby
We use structs list_head and lockdep_map as non-pointers in tty_ldisc.h. So better have headers defining them explicitly included so that the structs are always defined. Not only implicitly via random include chains. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723103147.18250-5-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27tty: move ldisc prototypes to tty_ldisc.hJiri Slaby
We already have tty_ldisc.h, so cleanup tty.h a bit by moving out tty_ldisc-related function prototypes and a variable into tty_ldisc.h. They are implemented in tty_ldisc.c anyway. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723103147.18250-4-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27tty: include kref.h in tty_driver.hJiri Slaby
We use kref in tty_driver.h, but do not include kref.h. It is currently included by linux/cdev.h -> linux/kobject.h -> linux/kref.h chain, so everything is in order only implicitly. So make this dependency explicit. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723103147.18250-3-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27tty: move tty_driver related prototypes to tty_driver.hJiri Slaby
We already have tty_driver.h, so cleanup tty.h a bit by moving out tty_driver-related function prototypes into tty_driver.h. Note that tty.h already includes tty_driver.h. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723103147.18250-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27fs: remove generic_block_fiemapChristoph Hellwig
Remove the now unused generic_block_fiemap helper. Link: https://lore.kernel.org/r/20210720133341.405438-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2021-07-27kernfs: switch kernfs to use an rwsemIan Kent
The kernfs global lock restricts the ability to perform kernfs node lookup operations in parallel during path walks. Change the kernfs mutex to an rwsem so that, when opportunity arises, node searches can be done in parallel with path walk lookups. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642770946.63632.2218304587223241374.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: add a revision to identify directory node changesIan Kent
Add a revision counter to kernfs directory nodes so it can be used to detect if a directory node has changed during negative dentry revalidation. There's an assumption that sizeof(unsigned long) <= sizeof(pointer) on all architectures and as far as I know that assumption holds. So adding a revision counter to the struct kernfs_elem_dir variant of the kernfs_node type union won't increase the size of the kernfs_node struct. This is because struct kernfs_elem_dir is at least sizeof(pointer) smaller than the largest union variant. It's tempting to make the revision counter a u64 but that would increase the size of kernfs_node on archs where sizeof(pointer) is smaller than the revision counter. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642769895.63632.8356662784964509867.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27Merge 5.14-rc3 into driver-core-nextGreg Kroah-Hartman
We need the driver-core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-26net: dsa: sja1105: add bridge TX data plane offload based on tag_8021qVladimir Oltean
The main desire for having this feature in sja1105 is to support network stack termination for traffic coming from a VLAN-aware bridge. For sja1105, offloading the bridge data plane means sending packets as-is, with the proper VLAN tag, to the chip. The chip will look up its FDB and forward them to the correct destination port. But we support bridge data plane offload even for VLAN-unaware bridges, and the implementation there is different. In fact, VLAN-unaware bridging is governed by tag_8021q, so it makes sense to have the .bridge_fwd_offload_add() implementation fully within tag_8021q. The key difference is that we only support 1 VLAN-aware bridge, but we support multiple VLAN-unaware bridges. So we need to make sure that the forwarding domain is not crossed by packets injected from the stack. For this, we introduce the concept of a tag_8021q TX VLAN for bridge forwarding offload. As opposed to the regular TX VLANs which contain only 2 ports (the user port and the CPU port), a bridge data plane TX VLAN is "multicast" (or "imprecise"): it contains all the ports that are part of a certain bridge, and the hardware will select where the packet goes within this "imprecise" forwarding domain. Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use of the unique "bridge_num" provided by DSA for the data plane offload. We use the same 3 bits from the tag_8021q VLAN ID format to encode this bridge number. Note that these 3 bit positions have been used before for sub-VLANs in best-effort VLAN filtering mode. The difference is that for best-effort, the sub-VLANs were only valid on RX (and it was documented that the sub-VLAN field needed to be transmitted as zero). Whereas for the bridge data plane offload, these 3 bits are only valid on TX. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26net: bridge: add a helper for retrieving port VLANs from the data pathVladimir Oltean
Introduce a brother of br_vlan_get_info() which is protected by the RCU mechanism, as opposed to br_vlan_get_info() which relies on taking the write-side rtnl_mutex. This is needed for drivers which need to find out whether a bridge port has a VLAN configured or not. For example, certain DSA switches might not offer complete source port identification to the CPU on RX, just the VLAN in which the packet was received. Based on this VLAN, we cannot set an accurate skb->dev ingress port, but at least we can configure one that behaves the same as the correct one would (this is possible because DSA sets skb->offload_fwd_mark = 1). When we look at the bridge RX handler (br_handle_frame), we see that what matters regarding skb->dev is the VLAN ID and the port STP state. So we need to select an skb->dev that has the same bridge VLAN as the packet we're receiving, and is in the LEARNING or FORWARDING STP state. The latter is easy, but for the former, we should somehow keep a shadow list of the bridge VLANs on each port, and a lookup table between VLAN ID and the 'designated port for imprecise RX'. That is rather complicated to keep in sync properly (the designated port per VLAN needs to be updated on the addition and removal of a VLAN, as well as on the join/leave events of the bridge on that port). So, to avoid all that complexity, let's just iterate through our finite number of ports and ask the bridge, for each packet: "do you have this VLAN configured on this port?". Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Cc: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26Merge tag 'mlx5-updates-2021-07-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2021-07-24 This series aims to reduce coupling in mlx5e, particularly between RX resources (TIRs, RQTs) and numerous code units that use them. This refactoring is required for upcoming features, ADQ and TX lag hashing. The issue with the current code is that TIRs and RQTs are unmanaged, different places all over the driver create, destroy, track and configure them, often in an uncoordinated way. The responsibilities of different units become vague, leading to a lot of hidden dependencies between numerous units and tight coupling between them, which is prone to bugs and hard to maintain. The result of this refactoring is: 1. Creating a manager for RX resources, that controls their lifecycle and provides a clear API, which restricts the set of actions that other units can do. 2. Using object-oriented approach for TIRs, RQTs and RX resource manager (struct mlx5e_rx_res). 3. Fixing a few bugs and misbehaviors found during the refactoring. 4. Reducing the amount of dependencies, removing hidden dependencies, making them one-directional and organizing the code in clear abstraction layers. 5. Explicitly exposing the remaining weird dependencies. 6. Simplifying and organizing code that creates and modifies TIRs and RQTs. Saeed Mahameed says: ==================== mlx5 updates 2021-07-24 This series provides some refactoring to mlx5e RX resource management, it is required for upcoming ADQ and TX lag hashing features. The first two patches in this series : net/mlx5e: Prohibit inner indir TIRs in IPoIB net/mlx5e: Block LRO if firmware asks for tunneled LRO Were supposed to go to net, but due to dependency and timing they were included here. I would appreciate it if you'd apply them to net and mark for -stable. For more information please see tag log below. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26net/mlx5e: Block LRO if firmware asks for tunneled LROMaxim Mikityanskiy
This commit does a cleanup in LRO configuration. LRO is a parameter of an RQ, but its state is changed by modifying a TIR related to the RQ. The current status: LRO for tunneled packets is not supported in the driver, inner TIRs may enable LRO on creation, but LRO status of inner TIRs isn't changed in mlx5e_modify_tirs_lro(). This is inconsistent, but as long as the firmware doesn't declare support for tunneled LRO, it works, because the same RQs are shared between the inner and outer TIRs. This commit does two fixes: 1. If the firmware has the tunneled LRO capability, LRO is blocked altogether, because it's not possible to block it for inner TIRs only, when the same RQs are shared between inner and outer TIRs, and the driver won't be able to handle tunneled LRO traffic. 2. mlx5e_modify_tirs_lro() is patched to modify LRO state for all TIRs, including inner ones, because all TIRs related to an RQ should agree on their LRO state. Fixes: 7b3722fa9ef6 ("net/mlx5e: Support RSS for GRE tunneled packets") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26printk: remove NMI trackingJohn Ogness
All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough. There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing: arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective. For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk(). Signed-off-by: John Ogness <john.ogness@linutronix.de> [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
2021-07-26printk: remove safe buffersJohn Ogness
With @logbuf_lock removed, the high level printk functions for storing messages are lockless. Messages can be stored from any context, so there is no need for the NMI and safe buffers anymore. Remove the NMI and safe buffers. Although the safe buffers are removed, the NMI and safe context tracking is still in place. In these contexts, store the message immediately but still use irq_work to defer the console printing. Since printk recursion tracking is in place, safe context tracking for most of printk is not needed. Remove it. Only safe context tracking relating to the console and console_owner locks is left in place. This is because the console and console_owner locks are needed for the actual printing. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-4-john.ogness@linutronix.de
2021-07-26iommu: Remove mode argument from iommu_set_dma_strict()John Garry
We only ever now set strict mode enabled in iommu_set_dma_strict(), so just remove the argument. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26Merge 5.14-rc3 into char-misc-nextGreg Kroah-Hartman
We need the char-misc fixes from 5.14-rc3 into here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-26iommu: Add a map_pages() op for IOMMU driversIsaac J. Manjarres
Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to map a physically contiguous rnage of pages of the same size. For IOMMU drivers that do not specify a map_pages() callback, the existing logic of mapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-5-git-send-email-quic_c_gdjako@quicinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu/io-pgtable: Introduce map_pages() as a page table opIsaac J. Manjarres
Mapping memory into io-pgtables follows the same semantics that unmapping memory used to follow (i.e. a buffer will be mapped one page block per call to the io-pgtable code). This means that it can be optimized in the same way that unmapping memory was, so add a map_pages() callback to the io-pgtable ops structure, so that a range of pages of the same size can be mapped within the same call. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-4-git-send-email-quic_c_gdjako@quicinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu: Add an unmap_pages() op for IOMMU driversIsaac J. Manjarres
Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to unmap a virtually contiguous range of pages of the same size. For IOMMU drivers that do not specify an unmap_pages() callback, the existing logic of unmapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-3-git-send-email-quic_c_gdjako@quicinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26iommu/io-pgtable: Introduce unmap_pages() as a page table opIsaac J. Manjarres
The io-pgtable code expects to operate on a single block or granule of memory that is supported by the IOMMU hardware when unmapping memory. This means that when a large buffer that consists of multiple such blocks is unmapped, the io-pgtable code will walk the page tables to the correct level to unmap each block, even for blocks that are virtually contiguous and at the same level, which can incur an overhead in performance. Introduce the unmap_pages() page table op to express to the io-pgtable code that it should unmap a number of blocks of the same size, instead of a single block. Doing so allows multiple blocks to be unmapped in one call to the io-pgtable code, reducing the number of page table walks, and indirect calls. Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/1623850736-389584-2-git-send-email-quic_c_gdjako@quicinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26printk: Move the printk() kerneldoc comment to its new homeJonathan Corbet
Commit 337015573718 ("printk: Userspace format indexing support") turned printk() into a macro, but left the kerneldoc comment for it with the (now) _printk() function, resulting in this docs-build warning: kernel/printk/printk.c:1: warning: 'printk' not found Move the kerneldoc comment back next to the (now) macro it's meant to describe and have the docs build find it there. Fixes: 337015573718b161 ("printk: Userspace format indexing support") Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/87o8aqt7qn.fsf@meer.lwn.net
2021-07-26Merge v5.14-rc3 into usb-nextGreg Kroah-Hartman
We need the fixes in here, and this resolves a merge issue with drivers/usb/dwc3/gadget.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-25fscrypt: add fscrypt_symlink_getattr() for computing st_sizeEric Biggers
Add a helper function fscrypt_symlink_getattr() which will be called from the various filesystems' ->getattr() methods to read and decrypt the target of encrypted symlinks in order to report the correct st_size. Detailed explanation: As required by POSIX and as documented in various man pages, st_size for a symlink is supposed to be the length of the symlink target. Unfortunately, st_size has always been wrong for encrypted symlinks because st_size is populated from i_size from disk, which intentionally contains the length of the encrypted symlink target. That's slightly greater than the length of the decrypted symlink target (which is the symlink target that userspace usually sees), and usually won't match the length of the no-key encoded symlink target either. This hadn't been fixed yet because reporting the correct st_size would require reading the symlink target from disk and decrypting or encoding it, which historically has been considered too heavyweight to do in ->getattr(). Also historically, the wrong st_size had only broken a test (LTP lstat03) and there were no known complaints from real users. (This is probably because the st_size of symlinks isn't used too often, and when it is, typically it's for a hint for what buffer size to pass to readlink() -- which a slightly-too-large size still works for.) However, a couple things have changed now. First, there have recently been complaints about the current behavior from real users: - Breakage in rpmbuild: https://github.com/rpm-software-management/rpm/issues/1682 https://github.com/google/fscrypt/issues/305 - Breakage in toybox cpio: https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html - Breakage in libgit2: https://issuetracker.google.com/issues/189629152 (on Android public issue tracker, requires login) Second, we now cache decrypted symlink targets in ->i_link. Therefore, taking the performance hit of reading and decrypting the symlink target in ->getattr() wouldn't be as big a deal as it used to be, since usually it will just save having to do the same thing later. Also note that eCryptfs ended up having to read and decrypt symlink targets in ->getattr() as well, to fix this same issue; see commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size"). So, let's just bite the bullet, and read and decrypt the symlink target in ->getattr() in order to report the correct st_size. Add a function fscrypt_symlink_getattr() which the filesystems will call to do this. (Alternatively, we could store the decrypted size of symlinks on-disk. But there isn't a great place to do so, and encryption is meant to hide the original size to some extent; that property would be lost.) Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-07-26Backmerge tag 'v5.14-rc3' into drm-nextDave Airlie
Linux 5.14-rc3 Daniel said we should pull the nouveau fix from fixes in here, probably a good plan. Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-07-25can: flexcan: add platform data headerAngelo Dureghello
Add platform data header for flexcan. Link: https://lore.kernel.org/r/20210702094841.327679-1-angelo@kernel-space.org Signed-off-by: Angelo Dureghello <angelo@kernel-space.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: bittiming: fix documentation for struct can_tdcMarc Kleine-Budde
This patch fixes a typo in the documentation for struct can_tdc::tdcv. The number "0" refers to automatic mode not the letter "O". Further two grammar errors in the documentation for struct can_tdc are fixed. First grammar error: add a missing third person 's'. Second grammar error: replace "such as" by "such that". The intent is to give a condition, not an example. Fixes: 289ea9e4ae59 ("can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)") Link: https://lore.kernel.org/r/20210616095922.2430415-1-mkl@pengutronix.de Link: https://lore.kernel.org/r/20210616124057.60723-1-mailhol.vincent@wanadoo.fr Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to ↵Marc Kleine-Budde
be called from threaded interrupt After reading all CAN frames from the controller in the IRQ handler and storing them into a skb_queue, the driver calls napi_schedule(). In the napi poll function the skb from the skb_queue are then pushed into the networking stack. However if napi_schedule() is called from a threaded IRQ handler this triggers the following error: | NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!! To avoid this, create a new rx-offload function (can_rx_offload_threaded_irq_finish()) with a call to local_bh_disable()/local_bh_enable() around the napi_schedule() call. Convert all drivers that call can_rx_offload_irq_finish() from threaded IRQ context to can_rx_offload_threaded_irq_finish(). Link: https://lore.kernel.org/r/20210724204745.736053-4-mkl@pengutronix.de Suggested-by: Daniel Glöckner <dg@emlix.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: rx-offload: can_rx_offload_irq_finish(): directly call napi_schedule()Marc Kleine-Budde
Instead of calling can_rx_offload_schedule() call napi_schedule() directly. As this was the last use of can_rx_offload_schedule() remove this helper function. Link: https://lore.kernel.org/r/20210724204745.736053-3-mkl@pengutronix.de Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25can: rx-offload: add skb queue for use during ISRMarc Kleine-Budde
Adding a skb to the skb_queue in rx-offload requires to take a lock. This commit avoids this by adding an unlocked skb queue that is appended at the end of the ISR. Having one lock at the end of the ISR should be OK as the HW is empty, not about to overflow. Link: https://lore.kernel.org/r/20210724204745.736053-2-mkl@pengutronix.de Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Co-developed-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-07-25IB/mlx5: Rename is_apu_thread_cq function to is_apu_cqTal Gilboa
is_apu_thread_cq() used to detect CQs which are attached to APU threads. This was extended to support other elements as well, so the function was renamed to is_apu_cq(). c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't reflected when the APU support was first introduced. Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa Signed-off-by: Tal Gilboa <talgi@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-07-24Merge tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request (Christoph): - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng) - Increase max blk-cgroup policy size, now that mq-deadline uses it too (Oleksandr) * tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-block: nvme: set the PRACT bit when using Write Zeroes with T10 PI nvme: fix nvme_setup_command metadata trace event nvme: fix refcounting imbalance when all paths are down nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING block: increase BLKCG_MAX_POLS
2021-07-23memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regionsMike Rapoport
Commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") didn't take into account that when there is movable_node parameter in the kernel command line, for_each_mem_range() would skip ranges marked with MEMBLOCK_HOTPLUG. The page table setup code in POWER uses for_each_mem_range() to create the linear mapping of the physical memory and since the regions marked as MEMORY_HOTPLUG are skipped, they never make it to the linear map. A later access to the memory in those ranges will fail: BUG: Unable to handle kernel data access on write at 0xc000000400000000 Faulting instruction address: 0xc00000000008a3c0 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7 NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040 REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000 CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0 GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000 GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200 GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300 GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000 GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0 GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680 NIP clear_user_page+0x50/0x80 LR __handle_mm_fault+0xc88/0x1910 Call Trace: __handle_mm_fault+0xc44/0x1910 (unreliable) handle_mm_fault+0x130/0x2a0 __get_user_pages+0x248/0x610 __get_user_pages_remote+0x12c/0x3e0 get_arg_page+0x54/0xf0 copy_string_kernel+0x11c/0x210 kernel_execve+0x16c/0x220 call_usermodehelper_exec_async+0x1b0/0x2f0 ret_from_kernel_thread+0x5c/0x70 Instruction dump: 79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050 7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec ---[ end trace 490b8c67e6075e09 ]--- Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the traversal fixes this issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100 Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-23mm: use kmap_local_page in memzero_pageChristoph Hellwig
The commit message introducing the global memzero_page explicitly mentions switching to kmap_local_page in the commit log but doesn't actually do that. Link: https://lkml.kernel.org/r/20210713055231.137602-3-hch@lst.de Fixes: 28961998f858 ("iov_iter: lift memzero_page() to highmem.h") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-23mm: call flush_dcache_page() in memcpy_to_page() and memzero_page()Christoph Hellwig
memcpy_to_page and memzero_page can write to arbitrary pages, which could be in the page cache or in high memory, so call flush_kernel_dcache_pages to flush the dcache. This is a problem when using these helpers on dcache challeneged architectures. Right now there are just a few users, chances are no one used the PC floppy driver, the aha1542 driver for an ISA SCSI HBA, and a few advanced and optional btrfs and ext4 features on those platforms yet since the conversion. Link: https://lkml.kernel.org/r/20210713055231.137602-2-hch@lst.de Fixes: bb90d4bc7b6a ("mm/highmem: Lift memcpy_[to|from]_page to core") Fixes: 28961998f858 ("iov_iter: lift memzero_page() to highmem.h") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-23swiotlb: Convert io_default_tlb_mem to static allocationWill Deacon
Since commit 69031f500865 ("swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used"), 'struct device' may hold a copy of the global 'io_default_tlb_mem' pointer if the device is using swiotlb for DMA. A subsequent call to swiotlb_exit() will therefore leave dangling pointers behind in these device structures, resulting in KASAN splats such as: | BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0 | Read of size 8 at addr ffff8881d7830000 by task swapper/0/0 | | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1 | Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020 | Call Trace: | <IRQ> | dump_stack+0x9c/0xcf | print_address_description.constprop.0+0x18/0x130 | kasan_report.cold+0x7f/0x111 | __iommu_dma_unmap_swiotlb+0x64/0xb0 | nvme_pci_complete_rq+0x73/0x130 | blk_complete_reqs+0x6f/0x80 | __do_softirq+0xfc/0x3be Convert 'io_default_tlb_mem' to a static structure, so that the per-device pointers remain valid after swiotlb_exit() has been invoked. All users are updated to reference the static structure directly, using the 'nslabs' field to determine whether swiotlb has been initialised. The 'slots' array is still allocated dynamically and referenced via a pointer rather than a flexible array member. Cc: Claire Chang <tientzu@chromium.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Fixes: 69031f500865 ("swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Claire Chang <tientzu@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2021-07-23bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iterMartin KaFai Lau
This patch allows bpf tcp iter to call bpf_(get|set)sockopt. To allow a specific bpf iter (tcp here) to call a set of helpers, get_func_proto function pointer is added to bpf_iter_reg. The bpf iter is a tracing prog which currently requires CAP_PERFMON or CAP_SYS_ADMIN, so this patch does not impose other capability checks for bpf_(get|set)sockopt. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200619.1036715-1-kafai@fb.com
2021-07-23signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistencyEric W. Biederman
It helps to know which part of the siginfo structure the siginfo_layout value is talking about. v1: https://lkml.kernel.org/r/m18s4zs7nu.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-9-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87zgumw8cc.fsf_-_@disp2133 Acked-by: Marco Elver <elver@google.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-07-23signal: Remove the generic __ARCH_SI_TRAPNO supportEric W. Biederman
Now that __ARCH_SI_TRAPNO is no longer set by any architecture remove all of the code it enabled from the kernel. On alpha and sparc a more explict approach of using send_sig_fault_trapno or force_sig_fault_trapno in the very limited circumstances where si_trapno was set to a non-zero value. The generic support that is being removed always set si_trapno on all fault signals. With only SIGILL ILL_ILLTRAP on sparc and SIGFPE and SIGTRAP TRAP_UNK on alpla providing si_trapno values asking all senders of fault signals to provide an si_trapno value does not make sense. Making si_trapno an ordinary extension of the fault siginfo layout has enabled the architecture generic implementation of SIGTRAP TRAP_PERF, and enables other faulting signals to grow architecture generic senders as well. v1: https://lkml.kernel.org/r/m18s4zs7nu.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-8-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87bl73xx6x.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-07-23signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNKEric W. Biederman
While reviewing the signal handlers on alpha it became clear that si_trapno is only set to a non-zero value when sending SIGFPE and when sending SITGRAP with si_code TRAP_UNK. Add send_sig_fault_trapno and send SIGTRAP TRAP_UNK, and SIGFPE with it. Remove the define of __ARCH_SI_TRAPNO and remove the always zero si_trapno parameter from send_sig_fault and force_sig_fault. v1: https://lkml.kernel.org/r/m1eeers7q7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87h7gvxx7l.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-07-23signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRPEric W. Biederman
While reviewing the signal handlers on sparc it became clear that si_trapno is only set to a non-zero value when sending SIGILL with si_code ILL_ILLTRP. Add force_sig_fault_trapno and send SIGILL ILL_ILLTRP with it. Remove the define of __ARCH_SI_TRAPNO and remove the always zero si_trapno parameter from send_sig_fault and force_sig_fault. v1: https://lkml.kernel.org/r/m1eeers7q7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87mtqnxx89.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-07-23net: bridge: switchdev: allow the TX data plane forwarding to be offloadedTobias Waldekranz
Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done between bridge ports. This means that the bridge will only send a single skb towards one of the ports under the switchdev's control, and expects the driver to deliver the packet to all eligible ports in its domain. Primarily this improves the performance of multicast flows with multiple subscribers, as it allows the hardware to perform the frame replication. The basic flow between the driver and the bridge is as follows: - When joining a bridge port, the switchdev driver calls switchdev_bridge_port_offload() with tx_fwd_offload = true. - The bridge sends offloadable skbs to one of the ports under the switchdev's control using skb->offload_fwd_mark = true. - The switchdev driver checks the skb->offload_fwd_mark field and lets its FDB lookup select the destination port mask for this packet. v1->v2: - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long - introduce a static key "br_switchdev_fwd_offload_used" to minimize the impact of the newly introduced feature on all the setups which don't have hardware that can make use of it - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache line access - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in __br_forward() - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge is being used - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP v2->v3: - replace the solution based on .ndo_dfwd_add_station with a solution based on switchdev_bridge_port_offload - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD v3->v4: rebase v4->v5: - make sure the static key is decremented on bridge port unoffload - more function and variable renaming and comments for them: br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload fwd_accel to tx_fwd_offload Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflicts are simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23ima: Add digest and digest_len params to the functions to measure a bufferRoberto Sassu
This patch performs the final modification necessary to pass the buffer measurement to callers, so that they provide a functionality similar to ima_file_hash(). It adds the 'digest' and 'digest_len' parameters to ima_measure_critical_data() and process_buffer_measurement(). These functions calculate the digest even if there is no suitable rule in the IMA policy and, in this case, they simply return 1 before generating a new measurement entry. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-07-23ima: Return int in the functions to measure a bufferRoberto Sassu
ima_measure_critical_data() and process_buffer_measurement() currently don't return a result as, unlike appraisal-related functions, the result is not used by callers to deny an operation. Measurement-related functions instead rely on the audit subsystem to notify the system administrator when an error occurs. However, ima_measure_critical_data() and process_buffer_measurement() are a special case, as these are the only functions that can return a buffer measurement (for files, there is ima_file_hash()). In a subsequent patch, they will be modified to return the calculated digest. In preparation to return the result of the digest calculation, this patch modifies the return type from void to int, and returns 0 if the buffer has been successfully measured, a negative value otherwise. Given that the result of the measurement is still not necessary, this patch does not modify the behavior of existing callers by processing the returned value. For those, the return value is ignored. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Acked-by: Paul Moore <paul@paul-moore.com> (for the SELinux bits) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-07-23ima: Introduce ima_get_current_hash_algo()Roberto Sassu
Buffer measurements, unlike file measurements, are not accessible after the measurement is done, as buffers are not suitable for use with the integrity_iint_cache structure (there is no index, for files it is the inode number). In the subsequent patches, the measurement (digest) will be returned directly by the functions that perform the buffer measurement, ima_measure_critical_data() and process_buffer_measurement(). A caller of those functions also needs to know the algorithm used to calculate the digest. Instead of adding the algorithm as a new parameter to the functions, this patch provides it separately with the new function ima_get_current_hash_algo(). Since the hash algorithm does not change after the IMA setup phase, there is no risk of races (obtaining a digest calculated with a different algorithm than the one returned). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> [zohar@linux.ibm.com: annotate ima_hash_algo as __ro_after_init] Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-07-23net: socket: rework compat_ifreq_ioctl()Arnd Bergmann
compat_ifreq_ioctl() is one of the last users of copy_in_user() and compat_alloc_user_space(), as it attempts to convert the 'struct ifreq' arguments from 32-bit to 64-bit format as used by dev_ioctl() and a couple of socket family specific interpretations. The current implementation works correctly when calling dev_ioctl(), inet_ioctl(), ieee802154_sock_ioctl(), atalk_ioctl(), qrtr_ioctl() and packet_ioctl(). The ioctl handlers for x25, netrom, rose and x25 do not interpret the arguments and only block the corresponding commands, so they do not care. For af_inet6 and af_decnet however, the compat conversion is slightly incorrect, as it will copy more data than the native handler accesses, both of them use a structure that is shorter than ifreq. Replace the copy_in_user() conversion with a pair of accessor functions to read and write the ifreq data in place with the correct length where needed, while leaving the other ones to copy the (already compatible) structures directly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: socket: simplify dev_ifconf handlingArnd Bergmann
The dev_ifconf() calling conventions make compat handling more complicated than necessary, simplify this by moving the in_compat_syscall() check into the function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: socket: remove register_gifconfArnd Bergmann
Since dynamic registration of the gifconf() helper is only used for IPv4, and this can not be in a loadable module, this can be simplified noticeably by turning it into a direct function call as a preparation for cleaning up the compat handling. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23ethtool: improve compat ioctl handlingArnd Bergmann
The ethtool compat ioctl handling is hidden away in net/socket.c, which introduces a couple of minor oddities: - The implementation may end up diverging, as seen in the RXNFC extension in commit 84a1d9c48200 ("net: ethtool: extend RXNFC API to support RSS spreading of filter matches") that does not work in compat mode. - Most architectures do not need the compat handling at all because u64 and compat_u64 have the same alignment. - On x86, the conversion is done for both x32 and i386 user space, but it's actually wrong to do it for x32 and cannot work there. - On 32-bit Arm, it never worked for compat oabi user space, since that needs to do the same conversion but does not. - It would be nice to get rid of both compat_alloc_user_space() and copy_in_user() throughout the kernel. None of these actually seems to be a serious problem that real users are likely to encounter, but fixing all of them actually leads to code that is both shorter and more readable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>