summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-09selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookupJakub Sitnicki
Extend the context access tests for sk_lookup prog to cover the surprising case of a 4-byte load from the remote_port field, where the expected value is actually shifted by 16 bits. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220209184333.654927-3-jakub@cloudflare.com
2022-02-09bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wideJakub Sitnicki
remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order. First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide"). Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits. Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
2022-02-09Merge tag 'nfsd-5.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd fixes from Chuck Lever: "Ensure that NFS clients cannot send file size or offset values that can cause the NFS server to crash or to return incorrect or surprising results. In particular, fix how the NFS server handles values larger than OFFSET_MAX" * tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Deprecate NFS_OFFSET_MAX NFSD: Fix offset type in I/O trace points NFSD: COMMIT operations must not return NFS?ERR_INVAL NFSD: Clamp WRITE offsets NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes NFSD: Fix ia_size underflow NFSD: Fix the behavior of READ near OFFSET_MAX
2022-02-09Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix two regressions: - Potential boot failure due to missing cryptomgr on initramfs - Stack overflow in octeontx2" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: api - Move cryptomgr soft dependency into algapi crypto: octeontx2 - Avoid stack variable overflow
2022-02-09Fix regression due to "fs: move binfmt_misc sysctl to its own file"Domenico Andreoli
Commit 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") did not go unnoticed, binfmt-support stopped to work on my Debian system since v5.17-rc2 (did not check with -rc1). The existance of the /proc/sys/fs/binfmt_misc is a precondition for attempting to mount the binfmt_misc fs, which in turn triggers the autoload of the binfmt_misc module. Without it, no module is loaded and no binfmt is available at boot. Building as built-in or manually loading the module and mounting the fs works fine, it's therefore only a matter of interaction with user-space. I could try to improve the Debian systemd configuration but I can't say anything about the other distributions. This patch restores a working system right after boot. Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-09ice: Add ability for PF admin to enable VF VLAN pruningBrett Creeley
VFs by default are able to see all tagged traffic regardless of trust and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers expect VFs to receive all VLAN tagged traffic with a matching destination MAC. Add an ethtool private flag 'vf-vlan-pruning' and set the default to off so VFs will receive all VLAN traffic directed towards them. When the flag is turned on, VF will only be able to receive untagged traffic or traffic with VLAN tags it has created interfaces for. Also, the flag cannot be changed while any VFs are allocated. This was done to simplify the implementation. So, if this flag is needed, then the PF admin must enable it. If the user tries to enable the flag while VFs are active, then print an unsupported message with the vf-vlan-pruning flag included. In case multiple flags were specified, this makes it clear to the user which flag failed. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add support for 802.1ad port VLANs VFBrett Creeley
Currently there is only support for 802.1Q port VLANs on SR-IOV VFs. Add support to also allow 802.1ad port VLANs when double VLAN mode is enabled. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Advertise 802.1ad VLAN filtering and offloads for PF netdevBrett Creeley
In order for the driver to support 802.1ad VLAN filtering and offloads, it needs to advertise those VLAN features and also support modifying those VLAN features, so make the necessary changes to ice_set_netdev_features(). By default, enable CTAG insertion/stripping and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM). Also, in DVM, enable STAG filtering by default. This is done by setting the feature bits in netdev->features. Also, in DVM, support toggling of STAG insertion/stripping, but don't enable them by default. This is done by setting the feature bits in netdev->hw_features. Since 802.1ad VLAN filtering and offloads are only supported in DVM, make sure they are not enabled by default and that they cannot be enabled during runtime, when the device is in SVM. Add an implementation for the ndo_fix_features() callback. This is needed since the hardware cannot support multiple VLAN ethertypes for VLAN insertion/stripping simultaneously and all supported VLAN filtering must either be enabled or disabled together. Disable inner VLAN stripping by default when DVM is enabled. If a VSI supports stripping the inner VLAN in DVM, then it will have to configure that during runtime. For example if a VF is configured in a port VLAN while DVM is enabled it will be allowed to offload inner VLANs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Support configuring the device to Double VLAN ModeBrett Creeley
In order to support configuring the device in Double VLAN Mode (DVM), the DDP and FW have to support DVM. If both support DVM, the PF that downloads the package needs to update the default recipes, set the VLAN mode, and update boost TCAM entries. To support updating the default recipes in DVM, add support for updating an existing switch recipe's lkup_idx and mask. This is done by first calling the get recipe AQ (0x0292) with the desired recipe ID. Then, if that is successful update one of the lookup indices (lkup_idx) and its associated mask if the mask is valid otherwise the already existing mask will be used. The VLAN mode of the device has to be configured while the global configuration lock is held while downloading the DDP, specifically after the DDP has been downloaded. If supported, the device will default to DVM. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2Brett Creeley
Add support for the VF driver to be able to request VIRTCHNL_VF_OFFLOAD_VLAN_V2, negotiate its VLAN capabilities via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS, add/delete VLAN filters, and enable/disable VLAN offloads. VFs supporting VIRTCHNL_OFFLOAD_VLAN_V2 will be able to use the following virtchnl opcodes: VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS VIRTCHNL_OP_ADD_VLAN_V2 VIRTCHNL_OP_DEL_VLAN_V2 VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2 VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 Legacy VF drivers may expect the initial VLAN stripping settings to be configured by the PF, so the PF initializes VLAN stripping based on the VIRTCHNL_OP_GET_VF_RESOURCES opcode. However, with VLAN support via VIRTCHNL_VF_OFFLOAD_VLAN_V2, this function is only expected to be used for VFs that only support VIRTCHNL_VF_OFFLOAD_VLAN, which will only be supported when a port VLAN is configured. Update the function based on the new expectations. Also, change the message when the PF can't enable/disable VLAN stripping to a dev_dbg() as this isn't fatal. When a VF isn't in a port VLAN and it only supports VIRTCHNL_VF_OFFLOAD_VLAN when Double VLAN Mode (DVM) is enabled, then the PF needs to reject the VIRTCHNL_VF_OFFLOAD_VLAN capability and configure the VF in software only VLAN mode. To do this add the new function ice_vf_vsi_cfg_legacy_vlan_mode(), which updates the VF's inner and outer ice_vsi_vlan_ops functions and sets up software only VLAN mode. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add hot path support for 802.1Q and 802.1ad VLAN offloadsBrett Creeley
Currently the driver only supports 802.1Q VLAN insertion and stripping. However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q and 802.1ad VLAN insertion and stripping will be supported. Unfortunately the VSI context parameters only allow for one VLAN ethertype at a time for VLAN offloads so only one or the other VLAN ethertype offload can be supported at once. To support this, multiple changes are needed. Rx path changes: [1] In DVM, the Rx queue context l2tagsel field needs to be cleared so the outermost tag shows up in the l2tag2_2nd field of the Rx flex descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain 1 to support SVM configurations. [2] Modify the ice_test_staterr() function to take a __le16 instead of the ice_32b_rx_flex_desc union pointer so this function can be used for both rx_desc->wb.status_error0 and rx_desc->wb.status_error1. [3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that checks if there is a VLAN tag in l2tag1 or l2tag2_2nd. [4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX is enabled in netdev->features. If it is, then this is the VLAN ethertype that needs to be added to the stripping VLAN tag. Since ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad. Tx path changes: [1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was added to the list of tx_flags to handle this case. [2] When the stack requests the VLAN tag to be offloaded on Tx, the driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1 respectively. To determine which location to use, set a bit in the Tx ring flags field during ring allocation that can be used to determine which field to use in the Tx descriptor. In DVM, always use l2tag2, and in SVM, always use l2tag1. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add outer_vlan_ops and VSI specific VLAN ops implementationsBrett Creeley
Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN ops are only available when the device is in Double VLAN Mode (DVM). Depending on the VSI type, the requirements for what operations to use/allow differ. By default all VSI's have unsupported inner and outer VSI VLAN ops. This implementation was chosen to prevent unexpected crashes due to null pointer dereferences. Instead, if a VSI calls an unsupported op, it will just return -EOPNOTSUPP. Add implementations to support modifying outer VLAN fields for VSI context. This includes the ability to modify VLAN stripping, insertion, and the port VLAN based on the outer VLAN handling fields of the VSI context. These functions should only ever be used if DVM is enabled because that means the firmware supports the outer VLAN fields in the VSI context. If the device is in DVM, then always use the outer_vlan_ops, else use the vlan_ops since the device is in Single VLAN Mode (SVM). Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to ice_vsi_vlan_setup() as the latter function is specific to the PF and all other VSI types that need an untagged VLAN 0 filter already do this in their specific flows. Without this change, Flow Director is failing to initialize because it does not implement any VSI VLAN ops. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Adjust naming for inner VLAN operationsBrett Creeley
Current operations act on inner VLAN fields. To support double VLAN, outer VLAN operations and functions will be implemented. Add the "inner" naming to existing VLAN operations to distinguish them from the upcoming outer values and functions. Some spacing adjustments are made to align values. Note that the inner is not talking about a tunneled VLAN, but the second VLAN in the packet. For SVM the driver uses inner or single VLAN filtering and offloads and in Double VLAN Mode the driver uses the inner filtering and offloads for SR-IOV VFs in port VLANs in order to support offloading the guest VLAN while a port VLAN is configured. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Use the proto argument for VLAN opsBrett Creeley
Currently the proto argument is unused. This is because the driver only supports 802.1Q VLAN filtering. This policy is enforced via netdev features that the driver sets up when configuring the netdev, so the proto argument won't ever be anything other than 802.1Q. However, this will allow for future iterations of the driver to seemlessly support 802.1ad filtering. Begin using the proto argument and extend the related structures to support its use. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Refactor vf->port_vlan_info to use ice_vlanBrett Creeley
The current vf->port_vlan_info variable is a packed u16 that contains the port VLAN ID and QoS/prio value. This is fine, but changes are incoming that allow for an 802.1ad port VLAN. Add flexibility by changing the vf->port_vlan_info member to be an ice_vlan structure. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Introduce ice_vlan structBrett Creeley
Add a new struct for VLAN related information. Currently this holds VLAN ID and priority values, but will be expanded to hold TPID value. This reduces the changes necessary if any other values are added in future. Remove the action argument from these calls as it's always ICE_FWD_VSI. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add new VSI VLAN opsBrett Creeley
Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and offloads require more flexibility when configuring VLANs. The VSI VLAN interface will allow flexibility for configuring VLANs for all VSI types. Add new files to separate the VSI VLAN ops and move functions to make the code more organized. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add helper function for adding VLAN 0Brett Creeley
There are multiple places where VLAN 0 is being added. Create a function to be called in order to minimize changes as the implementation is expanded to support double VLAN and avoid duplicated code. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Refactor spoofcheck configuration functionsBrett Creeley
Add functions to configure Tx VLAN antispoof based on iproute configuration and/or VLAN mode and VF driver support. This is needed later so the driver can control when it can be configured. Also, add functions that can be used to enable and disable MAC and VLAN spoofcheck. Move spoofchk configuration during VSI setup into the SR-IOV initialization path and into the post VSI rebuild flow for VF VSIs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09Merge tag 'kvm-s390-kernel-access' from emailed bundleLinus Torvalds
Pull s390 kvm fix from Christian Borntraeger: "Add missing check for the MEMOP ioctl The SIDA MEMOPs must only be used for secure guests, otherwise userspace can do unwanted memory accesses" * tag 'kvm-s390-kernel-access' from emailed bundle: KVM: s390: Return error on SIDA memop on normal guest
2022-02-09NFSD: Deprecate NFS_OFFSET_MAXChuck Lever
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there was a kernel-wide OFFSET_MAX value. As a clean up, replace the last few uses of it with its generic equivalent, and get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix offset type in I/O trace pointsChuck Lever
NFSv3 and NFSv4 use u64 offset values on the wire. Record these values verbatim without the implicit type case to loff_t. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: COMMIT operations must not return NFS?ERR_INVALChuck Lever
Since, well, forever, the Linux NFS server's nfsd_commit() function has returned nfserr_inval when the passed-in byte range arguments were non-sensical. However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests are permitted to return only the following non-zero status codes: NFS3ERR_IO NFS3ERR_STALE NFS3ERR_BADHANDLE NFS3ERR_SERVERFAULT NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL is not listed in the COMMIT row of Table 6 in RFC 8881. RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not specify when it can or should be used. Instead of dropping or failing a COMMIT request in a byte range that is not supported, turn it into a valid request by treating one or both arguments as zero. Offset zero means start-of-file, count zero means until-end-of-file, so we only ever extend the commit range. NFS servers are always allowed to commit more and sooner than requested. The range check is no longer bounded by NFS_OFFSET_MAX, but rather by the value that is returned in the maxfilesize field of the NFSv3 FSINFO procedure or the NFSv4 maxfilesize file attribute. Note that this change results in a new pynfs failure: CMT4 st_commit.testCommitOverflow : RUNNING CMT4 st_commit.testCommitOverflow : FAILURE COMMIT with offset + count overflow should return NFS4ERR_INVAL, instead got NFS4_OK IMO the test is not correct as written: RFC 8881 does not allow the COMMIT operation to return NFS4ERR_INVAL. Reported-by: Dan Aloni <dan.aloni@vastdata.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Bruce Fields <bfields@fieldses.org>
2022-02-09NFSD: Clamp WRITE offsetsChuck Lever
Ensure that a client cannot specify a WRITE range that falls in a byte range outside what the kernel's internal types (such as loff_t, which is signed) can represent. The kiocb iterators, invoked in nfsd_vfs_write(), should properly limit write operations to within the underlying file system's s_maxbytes. Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizesChuck Lever
iattr::ia_size is a loff_t, so these NFSv3 procedures must be careful to deal with incoming client size values that are larger than s64_max without corrupting the value. Silently capping the value results in storing a different value than the client passed in which is unexpected behavior, so remove the min_t() check in decode_sattr3(). Note that RFC 1813 permits only the WRITE procedure to return NFS3ERR_FBIG. We believe that NFSv3 reference implementations also return NFS3ERR_FBIG when ia_size is too large. Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix ia_size underflowChuck Lever
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle. Currently decode_fattr4() dumps a full u64 value into ia_size. If that value happens to be larger than S64_MAX, then ia_size underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr(). Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix the behavior of READ near OFFSET_MAXChuck Lever
Dan Aloni reports: > Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to > the RPC read layers") on the client, a read of 0xfff is aligned up > to server rsize of 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server > and it returns an NFS code of EINVAL to the client. The client as > a result indefinitely retries the request. The Linux NFS client does not handle NFS?ERR_INVAL, even though all NFS specifications permit servers to return that status code for a READ. Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed and return a short result. Set the EOF flag in the result to prevent the client from retrying the READ request. This behavior appears to be consistent with Solaris NFS servers. Note that NFSv3 and NFSv4 use u64 offset values on the wire. These must be converted to loff_t internally before use -- an implicit type cast is not adequate for this purpose. Otherwise VFS checks against sb->s_maxbytes do not work properly. Reported-by: Dan Aloni <dan.aloni@vastdata.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09Merge branch 'vlan-QinQ-leak-fix'David S. Miller
Xin Long says: ==================== vlan: fix a netdev refcnt leak for QinQ This issue can be simply reproduced by: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 So as to fix it, adjust vlan_dev_uninit() in Patch 1/1 so that it won't be called twice for the same device, then do the fix in vlan_dev_uninit() in Patch 2/2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09vlan: move dev_put into vlan_dev_uninitXin Long
Shuang Li reported an QinQ issue by simply doing: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and release dummy0.1, as it delays dev_put until calling dev->priv_destructor, vlan_dev_free(). This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before netdev_wait_allrefs(). Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09vlan: introduce vlan_dev_free_egress_priorityXin Long
This patch is to introduce vlan_dev_free_egress_priority() to free egress priority for vlan dev, and keep vlan_dev_uninit() static as .ndo_uninit. It makes the code more clear and safer when adding new code in vlan_dev_uninit() in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09libbpf: Fix compilation warning due to mismatched printf formatAndrii Nakryiko
On ppc64le architecture __s64 is long int and requires %ld. Cast to ssize_t and use %zd to avoid architecture-specific specifiers. Fixes: 4172843ed4a3 ("libbpf: Fix signedness bug in btf_dump_array_data()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org
2022-02-09ax25: fix UAF bugs of net_device caused by rebinding operationDuoming Zhou
The ax25_kill_by_device() will set s->ax25_dev = NULL and call ax25_disconnect() to change states of ax25_cb and sock, if we call ax25_bind() before ax25_kill_by_device(). However, if we call ax25_bind() again between the window of ax25_kill_by_device() and ax25_dev_device_down(), the values and states changed by ax25_kill_by_device() will be reassigned. Finally, ax25_dev_device_down() will deallocate net_device. If we dereference net_device in syscall functions such as ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname() and ax25_info_show(), a UAF bug will occur. One of the possible race conditions is shown below: (USE) | (FREE) ax25_bind() | | ax25_kill_by_device() ax25_bind() | ax25_connect() | ... | ax25_dev_device_down() | ... | dev_put_track(dev, ...) //FREE ax25_release() | ... ax25_send_control() | alloc_skb() //USE | the corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210 ... Call Trace: ... ax25_send_control+0x43/0x210 ax25_release+0x2db/0x3b0 __sock_release+0x6d/0x120 sock_close+0xf/0x20 __fput+0x11f/0x420 ... Allocated by task 1283: ... __kasan_kmalloc+0x81/0xa0 alloc_netdev_mqs+0x5a/0x680 mkiss_open+0x6c/0x380 tty_ldisc_open+0x55/0x90 ... Freed by task 1969: ... kfree+0xa3/0x2c0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 tty_ldisc_kill+0x3e/0x80 ... In order to fix these UAF bugs caused by rebinding operation, this patch adds dev_hold_track() into ax25_bind() and corresponding dev_put_track() into ax25_kill_by_device(). Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: usb: smsc95xx: add generic selftest supportOleksij Rempel
Provide generic selftest support. Tested with LAN9500 and LAN9512. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: ethernet: cavium: use div64_u64() instead of do_div()Wang Qing
do_div() does a 64-by-32 division. When the divisor is u64, do_div() truncates it to 32 bits, this means it can test non-zero and be truncated to zero for division. fix do_div.cocci warning: do_div() does a 64-by-32 division, please consider using div64_u64 instead. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net:enetc: enetc qos using the CBDR dma alloc functionPo Liu
Now we can use the enetc_cbd_alloc_data_mem() to replace complicated DMA data alloc method and CBDR memory basic seting. Signed-off-by: Po Liu <po.liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net:enetc: command BD ring data memory alloc as one function alonePo Liu
Separate the CBDR data memory alloc standalone. It is convenient for other part loading, for example the ENETC QOS part. Reported-and-suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Po Liu <po.liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net:enetc: allocate CBD ring data memory using DMA coherent methodsPo Liu
To replace the dma_map_single() stream DMA mapping with DMA coherent method dma_alloc_coherent() which is more simple. dma_map_single() found by Tim Gardner not proper. Suggested by Claudiu Manoil and Jakub Kicinski to use dma_alloc_coherent(). Discussion at: https://lore.kernel.org/netdev/AM9PR04MB8397F300DECD3C44D2EBD07796BD9@AM9PR04MB8397.eurprd04.prod.outlook.com/t/ Fixes: 888ae5a3952ba ("net: enetc: add tc flower psfp offload driver") cc: Claudiu Manoil <claudiu.manoil@nxp.com> Reported-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Po Liu <po.liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: dsa: fix panic when DSA master device unbinds on shutdownVladimir Oltean
Rafael reports that on a system with LX2160A and Marvell DSA switches, if a reboot occurs while the DSA master (dpaa2-eth) is up, the following panic can be seen: systemd-shutdown[1]: Rebooting. Unable to handle kernel paging request at virtual address 00a0000800000041 [00a0000800000041] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32 pc : dsa_slave_netdevice_event+0x130/0x3e4 lr : raw_notifier_call_chain+0x50/0x6c Call trace: dsa_slave_netdevice_event+0x130/0x3e4 raw_notifier_call_chain+0x50/0x6c call_netdevice_notifiers_info+0x54/0xa0 __dev_close_many+0x50/0x130 dev_close_many+0x84/0x120 unregister_netdevice_many+0x130/0x710 unregister_netdevice_queue+0x8c/0xd0 unregister_netdev+0x20/0x30 dpaa2_eth_remove+0x68/0x190 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 __do_sys_reboot+0x1cc/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c It can be seen from the stack trace that the problem is that the deregistration of the master causes a dev_close(), which gets notified as NETDEV_GOING_DOWN to dsa_slave_netdevice_event(). But dsa_switch_shutdown() has already run, and this has unregistered the DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to call dev_close_many() on those slave interfaces, leading to the problem. The previous attempt to avoid the NETDEV_GOING_DOWN on the master after dsa_switch_shutdown() was called seems improper. Unregistering the slave interfaces is unnecessary and unhelpful. Instead, after the slaves have stopped being uppers of the DSA master, we can now reset to NULL the master->dsa_ptr pointer, which will make DSA start ignoring all future notifier events on the master. Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch 'dpaa2-eth-sw-TSO'David S. Miller
Ioana Ciornei says: ==================== dpaa2-eth: add support for software TSO This series adds support for driver level TSO in the dpaa2-eth driver. The first 5 patches lay the ground work for the actual feature: rearrange some variable declaration, cleaning up the interraction with the S/G Table buffer cache etc. The 6th patch adds the actual driver level software TSO support by using the usual tso_build_hdr()/tso_build_data() APIs and creates the S/G FDs. With this patch set we can see the following improvement in a TCP flow running on a single A72@2.2GHz of the LX2160A SoC: before: 6.38Gbit/s after: 8.48Gbit/s ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09soc: fsl: dpio: read the consumer index from the cache inhibited areaIoana Ciornei
Once we added support in the dpaa2-eth for driver level software TSO we observed the following situation: if the EQCR CI (consumer index) is read from the cache-enabled area we sometimes end up with a computed value of available enqueue entries bigger than the size of the ring. This eventually will lead to the multiple enqueue of the same FD which will determine the same FD to end up on the Tx confirmation path and the same skb being freed twice. Just read the consumer index from the cache inhibited area so that we avoid this situation. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: add support for software TSOIoana Ciornei
This patch adds support for driver level TSO in the enetc driver using the TSO API. There is not much to say about this specific implementation. We are using the usual tso_build_hdr(), tso_build_data() to create each data segment, we create an array of S/G FDs where the first S/G entry is referencing the header data and the remaining ones the data portion. For the S/G Table buffer we use the same cache of buffers used on the other non-GSO cases - dpaa2_eth_sgt_get() and dpaa2_eth_sgt_recycle(). We cannot keep a DMA coherent buffer for all the TSO headers because the DPAA2 architecture does not work in a ring based fashion so we just allocate a buffer each time. Even with these limitations we get the following improvement in TCP termination on the LX2160A SoC, on a single A72 core running at 2.2GHz. before: 6.38Gbit/s after: 8.48Gbit/s Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: work with an array of FDsIoana Ciornei
Up until now, the __dpaa2_eth_tx function used a single FD on the stack to construct the structure to be enqueued. Since we are now preparing the ground work to add support for TSO done in software at the driver level, the same function needs to work with an array of FDs and enqueue as many as the build_*_fd functions create. Make the necessary adjustments in order to do this. These include: keeping an array of FDs in a percpu structure, cleaning up the necessary FDs before populating it and then, retrying the enqueue process up till all the generated FDs were enqueued or until we reach the maximum number retries. This patch does not change the fact that only a single FD will result from a __dpaa2_eth_tx call but rather just creates the necessary changes for the next patch. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: use the S/G table cache also for the normal S/G pathIoana Ciornei
Instead of allocating memory for an S/G table each time a nonlinear skb is processed, and then freeing it on the Tx confirmation path, use the S/G table cache in order to reuse the memory. For this to work we have to change the size of the cached buffers so that it can hold the maximum number of scatterlist entries. Other than that, each allocate/free call is replaced by a call to the dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle functions, introduced in the previous patch. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: extract the S/G table buffer cache interaction into functionsIoana Ciornei
The dpaa2-eth driver uses in certain circumstances a buffer cache for the S/G tables needed in case of a S/G FD. At the moment, the interraction with the cache is open-coded and couldn't be reused easily. Add two new functions - dpaa2_eth_sgt_get and dpaa2_eth_sgt_recycle - which help with code reusability. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: allocate a fragment already alignedIoana Ciornei
Instead of allocating memory and then manually aligning it to the desired value use napi_alloc_frag_align() directly to streamline the process. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: rearrange variable declaration in __dpaa2_eth_txIoana Ciornei
In the next patches we'll be moving things arroung in the mentioned function and also add some new variable declarations. Before all this, cleanup the variable declaration order. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch 'octeontx2-af-priority-flow-control'David S. Miller
Hariprasad Kelam says: ==================== Priority flow control support for RVU netdev In network congestion, instead of pausing all traffic on link PFC allows user to selectively pause traffic according to its class. This series of patches add support of PFC for RVU netdev drivers. Patch1 adds support to disable pause frames by default as with PFC user can enable either PFC or 802.3 pause frames. Patch2&3 adds resource management support for flow control and configures necessary registers for PFC. Patch4 adds dcb ops registration for netdev drivers. V2 changes: Fix compilation error by exporting required symbols 'otx2_config_pause_frm' ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-pf: PFC config support with DCBxHariprasad Kelam
Data centric bridging designed to eliminate packet loss due to queue overflow by adding enhancements to ethernet network such as proprity flow control etc. This patch adds support for management of Priority flow control(PFC) on Octeontx2 and CN10K interfaces. To enable PFC for all priorities dcb pfc set dev eth0 prio-pfc all:on/off To enable PFC on selected priorites dcb pfc set dev eth0 prio-pfc 0:on/off 1:on/off ..7:on/off With the ntuple commands user can map Priority to receive queues. On queue overflow NIX will assert backpressure such that PFC pause frames are genarated with mapped priority. To map priority 7 to Queue 1 ethtool -U eth0 flow-type ether dst xx:xx:xx:xx:xx:xx vlan 0xe00a m 0x1fff queue 1 Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-af: Flow control resource managementHariprasad Kelam
CN10K MAC block (RPM) and Octeontx2 MAC block (CGX) both supports PFC flow control and 802.3X flow control pause frames. Each MAC block supports max 4 LMACS and AF driver assigns same (MAC,LMAC) to PF and its VFs. As PF and its share same (MAC,LMAC) pair we need resource management to address below scenarios 1. Maintain PFC and 8023X pause frames mutually exclusive. 2. Reject disable flow control request if other PF or Vfs enabled it. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-af: Priority flow control configuration supportSunil Kumar Kori
Prirority based flow control (802.1Qbb) mechanism is similar to ethernet pause frames (802.3x) instead pausing all traffic on a link, PFC allows user to selectively pause traffic according to its class. Oceteontx2 MAC block (CGX) and CN10K Mac block (RPM) both supports PFC. As upper layer mbox handler is same for both the MACs, this patch configures PFC by calling apporopritate callbacks. Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>