linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-12-27	net: phy: nxp-c45-tja11xx: implement mdo_insert_tx_tag	Radu Pirea (NXP OSS)
	Implement mdo_insert_tx_tag to insert the TLV header in the ethernet frame. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: phy: nxp-c45-tja11xx: add MACsec statistics	Radu Pirea (NXP OSS)
	Add MACsec statistics callbacks. The statistic registers must be set to 0 if the SC/SA is deleted to read relevant values next time when the SC/SA is used. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: phy: nxp-c45-tja11xx: add MACsec support	Radu Pirea (NXP OSS)
	Add MACsec support. The MACsec block has four TX SCs and four RX SCs. The driver supports up to four SecY. Each SecY with one TX SC and one RX SC. The RX SCs can have two keys, key A and key B, written in hardware and enabled at the same time. The TX SCs can have two keys written in hardware, but only one can be active at a given time. On TX, the SC is selected using the MAC source address. Due of this selection mechanism, each offloaded netdev must have a unique MAC address. On RX, the SC is selected by SCI(found in SecTAG or calculated using MAC SA), or using RX SC 0 as implicit. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: macsec: introduce mdo_insert_tx_tag	Radu Pirea (NXP OSS)
	Offloading MACsec in PHYs requires inserting the SecTAG and the ICV in the ethernet frame. This operation will increase the frame size with up to 32 bytes. If the frames are sent at line rate, the PHY will not have enough room to insert the SecTAG and the ICV. Some PHYs use a hardware buffer to store a number of ethernet frames and, if it fills up, a pause frame is sent to the MAC to control the flow. This HW implementation does not need any modification in the stack. Other PHYs might offer to use a specific ethertype with some padding bytes present in the ethernet frame. This ethertype and its associated bytes will be replaced by the SecTAG and ICV. mdo_insert_tx_tag allows the PHY drivers to add any specific tag in the skb. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: macsec: revert the MAC address if mdo_upd_secy fails	Radu Pirea (NXP OSS)
	Revert the MAC address if mdo_upd_secy fails. Offloaded MACsec device might be left in an inconsistent state. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: macsec: documentation for macsec_context and macsec_ops	Radu Pirea (NXP OSS)
	Add description for fields of struct macsec_context and struct macsec_ops. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: macsec: move sci_to_cpu to macsec header	Radu Pirea (NXP OSS)
	Move sci_to_cpu to the MACsec header to use it in drivers. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: macsec: use skb_ensure_writable_head_tail to expand the skb	Radu Pirea (NXP OSS)
	Use skb_ensure_writable_head_tail to expand the skb if needed instead of reimplementing a similar operation. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	net: rename dsa_realloc_skb to skb_ensure_writable_head_tail	Radu Pirea (NXP OSS)
	Rename dsa_realloc_skb to skb_ensure_writable_head_tail and move it to skbuff.c to use it as helper. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	gfs2: Fix freeze consistency check in log_write_header	Andreas Gruenbacher
	Functions gfs2_freeze_super() and gfs2_thaw_super() are using the SDF_FROZEN flag to indicate when the filesystem is frozen, synchronized by sd_freeze_mutex. However, this doesn't prevent writes from happening between the point of calling thaw_super() and the point where the SDF_FROZEN flag is cleared, so the following assert can trigger in log_write_header(): gfs2_assert_withdraw(sdp, !test_bit(SDF_FROZEN, &sdp->sd_flags)); Fix that by checking for sb->s_writers.frozen != SB_FREEZE_COMPLETE in log_write_header() instead. To make sure that the filesystem-specific part of freezing happens before sb->s_writers.frozen is set to SB_FREEZE_COMPLETE, move that code from gfs2_freeze_locally() into gfs2_freeze_fs() and hook that up to the .freeze_fs operation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-27	gfs2: Refcounting fix in gfs2_thaw_super	Andreas Gruenbacher
	It turns out that the .freeze_super and .thaw_super operations require the filesystem to manage the superblock refcount itself. We are using the freeze_super() and thaw_super() helpers to mostly take care of that for us, but this means that the superblock may no longer be around by when thaw_super() returns, and gfs2_thaw_super() will then access freed memory. Take an extra superblock reference in gfs2_thaw_super() to fix that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-27	gfs2: Minor gfs2_{freeze,thaw}_super cleanup	Andreas Gruenbacher
	This minor cleanup to gfs2_freeze_super() and gfs2_thaw_super() prepares for the following refcounting fix. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-27	drm/i915/perf: Update handling of MMIO triggered reports	Umesh Nerlige Ramappa
	On XEHP platforms user is not able to find MMIO triggered reports in the OA buffer since i915 squashes the context ID fields. These context ID fields hold the MMIO trigger markers. Update logic to not squash the context ID fields of MMIO triggered reports. Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231219000543.1087706-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 0c68132df6e66244acec1bb5b9e19b0751414389) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-12-27	drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern	Khaled Almahallawy
	Using link_status to get DPCD_REV fails when disabling/defaulting phy pattern. Use intel_dp->dpcd to access DPCD_REV correctly. Fixes: 8cdf72711928 ("drm/i915/dp: Program vswing, pre-emphasis, test-pattern") Cc: Jani Nikula <jani.nikula@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Lee Shawn C <shawn.c.lee@intel.com> Signed-off-by: Khaled Almahallawy <khaled.almahallawy@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231213211542.3585105-3-khaled.almahallawy@intel.com (cherry picked from commit 3ee302ec22d6e1d7d1e6d381b0d507ee80f2135c)
2023-12-27	OPP: The level field is always of unsigned int type	Viresh Kumar
	By mistake, dev_pm_opp_find_level_floor() used the level parameter as unsigned long instead of unsigned int. Fix it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-12-26	fscrypt: document that CephFS supports fscrypt now	Eric Biggers
	The help text for CONFIG_FS_ENCRYPTION and the fscrypt.rst documentation file both list the filesystems that support fscrypt. CephFS added support for fscrypt in v6.6, so add CephFS to the list. Link: https://lore.kernel.org/r/20231227045158.87276-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-12-26	bcachefs: Fix promotes	Kent Overstreet
	The recent work to fix data moves w.r.t. durability broke promotes, because the caused us to bail out when the extent minus pointers being dropped still has enough pointers to satisfy the current number of replicas. Disable this check when we're adding cached replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-26	bridge: cfm: fix enum typo in br_cc_ccm_tx_parse	Lin Ma
	It appears that there is a typo in the code where the nlattr array is being parsed with policy br_cfm_cc_ccm_tx_policy, but the instance is being accessed via IFLA_BRIDGE_CFM_CC_RDI_INSTANCE, which is associated with the policy br_cfm_cc_rdi_policy. This problem was introduced by commit 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface."). Though it seems like a harmless typo since these two enum owns the exact same value (1 here), it is quite misleading hence fix it by using the correct enum IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE here. Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	Merge branch 'mptcp-cleanups-ephemeral-port-sockopts'	David S. Miller
	Matthieu Baerts says: ==================== mptcp: cleanup and support more ephemeral ports sockopts Patch 1 is a cleanup one: mptcp_is_tcpsk() helper was modifying sock_ops in some cases which is unexpected with that name. Patch 2 to 4 add support for two socket options: IP_LOCAL_PORT_RANGE and IP_BIND_ADDRESS_NO_PORT. The first one is a preparation patch, the second one adds the support while the last one modifies an existing selftest to validate the new features. ==================== Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE	Maxim Galaganov
	Since previous commit, MPTCP has support for IP_BIND_ADDRESS_NO_PORT and IP_LOCAL_PORT_RANGE sockopts. Add ip4_mptcp and ip6_mptcp fixture variants to ip_local_port_range selftest to provide selftest coverage for these sockopts. Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	mptcp: sockopt: support IP_LOCAL_PORT_RANGE and IP_BIND_ADDRESS_NO_PORT	Maxim Galaganov
	Support for IP_BIND_ADDRESS_NO_PORT sockopt was introduced in [1]. Recently [2] allowed its value to be accessed without locking the socket. Support for (newer) IP_LOCAL_PORT_RANGE sockopt was introduced in [3]. In the same series a selftest was added in [4]. This selftest also covers the IP_BIND_ADDRESS_NO_PORT sockopt. This patch enables getsockopt()/setsockopt() on MPTCP sockets for these socket options, syncing set values to subflows in sync_socket_options(). Ephemeral port range is synced to subflows, enabling NAT usecase described in [3]. [1] commit 90c337da1524 ("inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations") [2] commit ca571e2eb7eb ("inet: move inet->bind_address_no_port to inet->inet_flags") [3] commit 91d0b78c5177 ("inet: Add IP_LOCAL_PORT_RANGE socket option") [4] commit ae5439658cce ("selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option") Signed-off-by: Maxim Galaganov <max@internet.ru> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	mptcp: rename mptcp_setsockopt_sol_ip_set_transparent()	Maxim Galaganov
	Next patch extends this function so that it's not specific to IP_TRANSPARENT. Change function name to mptcp_setsockopt_sol_ip_set(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	mptcp: don't overwrite sock_ops in mptcp_is_tcpsk()	Davide Caratti
	Eric Dumazet suggests: > The fact that mptcp_is_tcpsk() was able to write over sock->ops was a > bit strange to me. > mptcp_is_tcpsk() should answer a question, with a read-only argument. re-factor code to avoid overwriting sock_ops inside that function. Also, change the helper name to reflect the semantics and to disambiguate from its dual, sk_is_mptcp(). While at it, collapse mptcp_stream_accept() and mptcp_accept() into a single function, where fallback / non-fallback are separated into a single sk_is_mptcp() conditional. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/432 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net: phy: at803x: better align function varibles to open parenthesis	Christian Marangi
	Better align function variables to open parenthesis as suggested by checkpatch script for qca808x function to make code cleaner. For cable_test_get_status function some additional rework was needed to handle too long functions. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	Merge branch 'net-sched-tc-block-ports-tracking'	David S. Miller
	Victor Nogueira says: ==================== net/sched: Introduce tc block ports tracking and use __context__ The "tc block" is a collection of netdevs/ports which allow qdiscs to share match-action block instances (as opposed to the traditional tc filter per netdev/port)[1]. Up to this point in the implementation, the block is unaware of its ports. This patch makes the tc block ports available to the datapath. For the datapath we provide a use case of the tc block in a mirred action in patch 3. For users can levarage mirred to do something like the following: $ tc qdisc add dev ens7 ingress_block 22 clsact $ tc qdisc add dev ens8 ingress_block 22 clsact $ tc qdisc add dev ens9 ingress_block 22 clsact $ tc filter add block 22 protocol ip pref 25 \ flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22 In this case, if the packet arrives on ens8, it will be copied and sent to all ports in the block excluding ens8. Note that the packet is still in the pipeline at this point - meaning other actions could be added after the mirror because mirred copies/clones the skb. Example the following is valid: $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 \ action mirred egress mirror blockid 22 \ action vlan push id 123 \ action mirred egress redirect dev dummy0 redirect behavior always steals the packet from the pipeline and therefore the skb is no longer available for a subsequent action as illustrated above (in redirecting to dummy0). The behavior of redirecting to a tc block is therefore adapted to work in the same manner. So a setup as such: $ tc qdisc add dev ens7 ingress_block 22 $ tc qdisc add dev ens8 ingress_block 22 $ tc qdisc add dev ens9 ingress_block 22 $ tc filter add block 22 protocol ip pref 25 \ flower dst_ip 192.168.0.0/16 action mirred egress redirect blockid 22 for a matching packet arriving on ens7 will first send a copy/clone to ens8 (as in the "mirror" behavior) then to ens9 as in the redirect behavior above. Once this processing is done - no other actions are able to process this skb. i.e it is removed from the "pipeline". In this case, if the packet arrives on ens8, it will be copied and sent to all ports in the block excluding ens8. Patch 1 separates/exports mirror and redirect functions from act_mirred Patch 2 introduces the required infra. Patch 3 Allows mirred to blocks Subsequent patches will come with tdc test cases. __Acknowledgements__ Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset better. The idea of integrating the ports into the tc block was suggested by Jiri Pirko. [1] See commit ca46abd6f89f ("Merge branch'net-sched-allow-qdiscs-to-share-filter-block-instances'") Changes in v2: - Remove RFC tag - Add more details in patch 0(Jiri) - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org) - Fix bad dev dereference in printk of blockcast action (Simon) Changes in v3: - Add missing xa_destroy (pointed out by Vlad) - Remove bugfix pointed by Vlad (will send in separate patch) - Removed ports from subject in patch #2 and typos (suggested by Marcelo) - Remove net_notice_ratelimited debug messages in error cases (suggested by Marcelo) - Minor changes to appease sparse's lock context warning Changes in v4: - Avoid code repetition using gotos in cast_one (suggested by Paolo) - Fix typo in cover letter (pointed out by Paolo) - Create a module description for act_blockcast (reported by Paolo and CI) Changes in v5: - Add new patch which separated mirred into mirror and redirect functions (suggested by Jiri) - Instead of repeating the code to mirror in blockcast use mirror exported function by patch1 (tcf_mirror_act) - Make Block ID into act_blockcast's parameter passed by user space instead of always getting it from SKB (suggested by Jiri) - Add tx_type parameter which will specify what transmission behaviour we want (as described earlier) Changes in v6: - Remove blockcast and make it a part of mirred (suggestd by Jiri) - Block ID is now a mirred parameter - We now allow redirecting and mirroring to either ingress or egress Changes in v7: - Remove set but not used variable in tcf_mirred_act (pointed out by Jakub) Changes in v8: - Fix uapi issues (pointed out by Jiri) - Separate last patch into 3 - two as preparations for adding block ID to mirred and one allowing mirred to block (suggested by Jiri) - Remove declaration initialisation of eg_block and in_block in qdisc_block_add_dev (suggested by Jiri) - Avoid unnecessary if guards in qdisc_block_add_dev (suggested by Jiri) - Remove unncessary block_index retrieval in __qdisc_destroy (suggested by Jiri) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/sched: act_mirred: Allow mirred to block	Victor Nogueira
	So far the mirred action has dealt with syntax that handles mirror/redirection for netdev. A matching packet is redirected or mirrored to a target netdev. In this patch we enable mirred to mirror to a tc block as well. IOW, the new syntax looks as follows: ... mirred <ingress \| egress> <mirror \| redirect> [index INDEX] < <blockid BLOCKID> \| <dev <devname>> > Examples of mirroring or redirecting to a tc block: $ tc filter add block 22 protocol ip pref 25 \ flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22 $ tc filter add block 22 protocol ip pref 25 \ flower dst_ip 10.10.10.10/32 action mirred egress redirect blockid 22 Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/sched: act_mirred: Add helper function tcf_mirred_replace_dev	Victor Nogueira
	The act of replacing a device will be repeated by the init logic for the block ID in the patch that allows mirred to a block. Therefore we encapsulate this functionality in a function (tcf_mirred_replace_dev) so that we can reuse it and avoid code repetition. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/sched: act_mirred: Create function tcf_mirred_to_dev and improve readability	Victor Nogueira
	As a preparation for adding block ID to mirred, separate the part of mirred that redirect/mirrors to a dev into a specific function so that it can be called by blockcast for each dev. Also improve readability. Eg. rename use_reinsert to dont_clone and skb2 to skb_to_send. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/sched: cls_api: Expose tc block to the datapath	Victor Nogueira
	The datapath can now find the block of the port in which the packet arrived at. In the next patch we show a possible usage of this patch in a new version of mirred that multicasts to all ports except for the port in which the packet arrived on. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/sched: Introduce tc block netdev tracking infra	Victor Nogueira
	This commit makes tc blocks track which ports have been added to them. And, with that, we'll be able to use this new information to send packets to the block's ports. Which will be done in the patch #3 of this series. Suggested-by: Jiri Pirko <jiri@nvidia.com> Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	keys, dns: Fix missing size check of V1 server-list header	Edward Adam Davis
	The dns_resolver_preparse() function has a check on the size of the payload for the basic header of the binary-style payload, but is missing a check for the size of the V1 server-list payload header after determining that's what we've been given. Fix this by getting rid of the the pointer to the basic header and just assuming that we have a V1 server-list payload and moving the V1 server list pointer inside the if-statement. Dealing with other types and versions can be left for when such have been defined. This can be tested by doing the following with KASAN enabled: echo -n -e '\x0\x0\x1\x2' \| keyctl padd dns_resolver foo @p and produces an oops like the following: BUG: KASAN: slab-out-of-bounds in dns_resolver_preparse+0xc9f/0xd60 net/dns_resolver/dns_key.c:127 Read of size 1 at addr ffff888028894084 by task syz-executor265/5069 ... Call Trace: dns_resolver_preparse+0xc9f/0xd60 net/dns_resolver/dns_key.c:127 __key_create_or_update+0x453/0xdf0 security/keys/key.c:842 key_create_or_update+0x42/0x50 security/keys/key.c:1007 __do_sys_add_key+0x29c/0x450 security/keys/keyctl.c:134 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x62/0x6a This patch was originally by Edward Adam Davis, but was modified by Linus. Fixes: b946001d3bb1 ("keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry") Reported-and-tested-by: syzbot+94bbb75204a05da3d89f@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/0000000000009b39bc060c73e209@google.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Cc: Edward Adam Davis <eadavis@qq.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jeffrey E Altman <jaltman@auristor.com> Cc: Wang Lei <wang840925@gmail.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: Steve French <sfrench@us.ibm.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-26	net: remove SOCK_DEBUG macro	Denis Kirjanov
	Since there are no more users of the macro let's finally burn it Signed-off-by: Denis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net: remove SOCK_DEBUG leftovers	Denis Kirjanov
	SOCK_DEBUG comes from the old days. Let's move logging to standard net core ratelimited logging functions Signed-off-by: Denis Kirjanov <dkirjanov@suse.de> changes in v2: - remove SOCK_DEBUG macro altogether Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	octeontx2-af: Fix marking couple of structure as __packed	Suman Ghosh
	Couple of structures was not marked as __packed. This patch fixes the same and mark them as __packed. Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	Merge branch 'net-smcv2.1-ISM-device-support'	David S. Miller
	Wen Gu says: ==================== net/smc: implement SMCv2.1 virtual ISM device support The fourth edition of SMCv2 adds the SMC version 2.1 feature updates for SMC-Dv2 with virtual ISM. Virtual ISM are created and supported mainly by OS or hypervisor software, comparable to IBM ISM which is based on platform firmware or hardware. With the introduction of virtual ISM, SMCv2.1 makes some updates: - Introduce feature bitmask to indicate supplemental features. - Reserve a range of CHIDs for virtual ISM. - Support extended GIDs (128 bits) in CLC handshake. So this patch set aims to implement these updates in Linux kernel. And it acts as the first part of SMC-D virtual ISM extension & loopback-ism [1]. [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/ v8->v7: - Patch #7: v7 mistakenly changed the type of gid_ext in smc_clc_msg_accept_confirm to u64 instead of __be64 as previous versions when fixing the rebase conflicts. So fix this mistake. v7->v6: Link: https://lore.kernel.org/netdev/20231219084536.8158-1-guwen@linux.alibaba.com/ - Collect the Reviewed-by tag in v6; - Patch #3: redefine the struct smc_clc_msg_accept_confirm; - Patch #7: Because that the Patch #3 already adds '__packed' to smc_clc_msg_accept_confirm, so Patch #7 doesn't need to do the same thing. But this is a minor change, so I kept the 'Reviewed-by' tag. Other changes in previous versions but not yet acked: - Patch #1: Some minor changes in subject and fix the format issue (length exceeds 80 columns) compared to v3. - Patch #5: removes useless ini->feature_mask assignment in __smc_connect() and smc_listen_v2_check() compared to v4. - Patch #8: new added, compared to v3. v6->v5: Link: https://lore.kernel.org/netdev/1702371151-125258-1-git-send-email-guwen@linux.alibaba.com/ - Add 'Reviewed-by' label given in the previous versions: * Patch #4, #6, #9, #10 have nothing changed since v3; - Patch #2: * fix the format issue (Alignment should match open parenthesis) compared to v5; * remove useless clc->hdr.length assignment in smcr_clc_prep_confirm_accept() compared to v5; - Patch #3: new added compared to v5. - Patch #7: some minor changes like aclc_v2->aclc or clc_v2->clc compared to v5 due to the introduction of Patch #3. Since there were no major changes, I kept the 'Reviewed-by' label. Other changes in previous versions but not yet acked: - Patch #1: Some minor changes in subject and fix the format issue (length exceeds 80 columns) compared to v3. - Patch #5: removes useless ini->feature_mask assignment in __smc_connect() and smc_listen_v2_check() compared to v4. - Patch #8: new added, compared to v3. v5->v4: Link: https://lore.kernel.org/netdev/1702021259-41504-1-git-send-email-guwen@linux.alibaba.com/ - Patch #6: improve the comment of SMCD_CLC_MAX_V2_GID_ENTRIES; - Patch #4: remove useless ini->feature_mask assignment; v4->v3: https://lore.kernel.org/netdev/1701920994-73705-1-git-send-email-guwen@linux.alibaba.com/ - Patch #6: use SMCD_CLC_MAX_V2_GID_ENTRIES to indicate the max gid entries in CLC proposal and using SMC_MAX_V2_ISM_DEVS to indicate the max devices to propose; - Patch #6: use i and i+1 in smc_find_ism_v2_device_serv(); - Patch #2: replace the large if-else block in smc_clc_send_confirm_accept() with 2 subfunctions; - Fix missing byte order conversion of GID and token in CLC handshake, which is in a separate patch sending to net: https://lore.kernel.org/netdev/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com/ - Patch #7: add extended GID in SMC-D lgr netlink attribute; v3->v2: https://lore.kernel.org/netdev/1701343695-122657-1-git-send-email-guwen@linux.alibaba.com/ - Rename smc_clc_fill_fce as smc_clc_fill_fce_v2x; - Remove ISM_IDENT_MASK from drivers/s390/net/ism.h; - Add explicitly assigning 'false' to ism_v2_capable in ism_dev_init(); - Remove smc_ism_set_v2_capable() helper for now, and introduce it in later loopback-ism implementation; v2->v1: - Fix sparse complaint; - Rebase to the latest net-next; ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: manage system EID in SMC stack instead of ISM driver	Wen Gu
	The System EID (SEID) is an internal EID that is used by the SMCv2 software stack that has a predefined and constant value representing the s390 physical machine that the OS is executing on. So it should be managed by SMC stack instead of ISM driver and be consistent for all ISMv2 device (including virtual ISM devices) on s390 architecture. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: disable SEID on non-s390 archs where virtual ISM may be used	Wen Gu
	The system EID (SEID) is an internal EID used by SMC-D to represent the s390 physical machine that OS is executing on. On s390 architecture, it predefined by fixed string and part of cpuid and is enabled regardless of whether underlay device is virtual ISM or platform firmware ISM. However on non-s390 architectures where SMC-D can be used with virtual ISM devices, there is no similar information to identify physical machines, especially in virtualization scenarios. So in such cases, SEID is forcibly disabled and the user-defined UEID will be used to represent the communicable space. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: support extended GID in SMC-D lgr netlink attribute	Wen Gu
	Virtual ISM devices introduced in SMCv2.1 requires a 128 bit extended GID vs. the existing ISM 64bit GID. So the 2nd 64 bit of extended GID should be included in SMC-D linkgroup netlink attribute as well. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: compatible with 128-bits extended GID of virtual ISM device	Wen Gu
	According to virtual ISM support feature defined by SMCv2.1, GIDs of virtual ISM device are UUIDs defined by RFC4122, which are 128-bits long. So some adaptation work is required. And note that the GIDs of existing platform firmware ISM devices still remain 64-bits long. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: define a reserved CHID range for virtual ISM devices	Wen Gu
	According to virtual ISM support feature defined by SMCv2.1, CHIDs in the range 0xFF00 to 0xFFFF are reserved for use by virtual ISM devices. And two helpers are introduced to distinguish virtual ISM devices from the existing platform firmware ISM devices. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: introduce virtual ISM device support feature	Wen Gu
	This introduces virtual ISM device support feature to SMCv2.1 as the first supplemental feature. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: support SMCv2.x supplemental features negotiation	Wen Gu
	This patch adds SMCv2.x supplemental features negotiation. Supported SMCv2.x supplemental features are represented by feature_mask in FCE field. The negotiation process is as follows. Server Client Proposal(features(c-mask bits)) <----------------------------------------- Accept(features(s-mask bits)) -----------------------------------------> Confirm(features(s&c-mask bits)) <----------------------------------------- Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: unify the structs of accept or confirm message for v1 and v2	Wen Gu
	The structs of CLC accept and confirm messages for SMCv1 and SMCv2 are separately defined and often casted to each other in the code, which may increase the risk of errors caused by future divergence of them. So unify them into one struct for better maintainability. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: introduce sub-functions for smc_clc_send_confirm_accept()	Wen Gu
	There is a large if-else block in smc_clc_send_confirm_accept() and it is better to split it into two sub-functions. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	net/smc: rename some 'fce' to 'fce_v2x' for clarity	Wen Gu
	Rename some functions or variables with 'fce' in their name but used in SMCv2.1 as 'fce_v2x' for clarity. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	idpf: avoid compiler introduced padding in virtchnl2_rss_key struct	Pavan Kumar Linga
	Size of the virtchnl2_rss_key struct should be 7 bytes but the compiler introduces a padding byte for the structure alignment. This results in idpf sending an additional byte of memory to the device control plane than the expected buffer size. As the control plane enforces virtchnl message size checks to validate the message, set RSS key message fails resulting in the driver load failure. Remove implicit compiler padding by using "__packed" structure attribute for the virtchnl2_rss_key struct. Also there is no need to use __DECLARE_FLEX_ARRAY macro for the 'key_flex' struct field. So drop it. Fixes: 0d7502a9b4a7 ("virtchnl: add virtchnl version 2 ops") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26	idpf: fix corrupted frames and skb leaks in singleq mode	Alexander Lobakin
	idpf_ring::skb serves only for keeping an incomplete frame between several NAPI Rx polling cycles, as one cycle may end up before processing the end of packet descriptor. The pointer is taken from the ring onto the stack before entering the loop and gets written there after the loop exits. When inside the loop, only the onstack pointer is used. For some reason, the logics is broken in the singleq mode, where the pointer is taken from the ring each iteration. This means that if a frame got fragmented into several descriptors, each fragment will have its own skb, but only the last one will be passed up the stack (containing garbage), leaving the rest leaked. Then, on ifdown, rxq::skb is being freed only in the splitq mode, while it can point to a valid skb in singleq as well. This can lead to a yet another skb leak. Just don't touch the ring skb field inside the polling loop, letting the onstack skb pointer work as expected: build a new skb if it's the first frame descriptor and attach a frag otherwise. On ifdown, free rxq::skb unconditionally if the pointer is non-NULL. Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26	block: renumber QUEUE_FLAG_HW_WC	Christoph Hellwig
	For the QUEUE_FLAG_HW_WC to actually work, it needs to have a separate number from QUEUE_FLAG_FUA, doh. Fixes: 43c9835b144c ("block: don't allow enabling a cache on devices that don't support it") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231226081524.180289-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-25	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio fixes from Michael Tsirkin: "A couple of bugfixes: one for a regression" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_blk: fix snprintf truncation compiler warning virtio_ring: fix syncs DMA memory with different direction
2023-12-25	Merge branch 'nfc-refcounting'	David S. Miller
	@ 2023-12-19 17:49 Siddh Raman Pant 2023-12-19 17:49 ` [PATCH net-next v7 1/2] nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local Siddh Raman Pant 2023-12-19 17:49 ` [PATCH net-next v7 2/2] nfc: Do not send datagram if socket state isn't LLCP_BOUND Siddh Raman Pant 0 siblings, 2 replies; 4+ messages in thread Siddh Raman Pant says: ==================== [PATCH net-next v7 0/2] nfc: Fix UAF during datagram sending caused by missing refcounting Changes in v7: - Stupidly reverted ordering in recv() too, fix that. - Remove redundant call to nfc_llcp_sock_free(). Changes in v6: - Revert label introduction from v4, and thus also v5 entirely. Changes in v5: - Move reason = LLCP_DM_REJ under the fail_put_sock label. - Checkpatch now warns about == NULL check for new_sk, so fix that, and also at other similar places in the same function. Changes in v4: - Fix put ordering and comments. - Separate freeing in recv() into end labels. - Remove obvious comment and add reasoning. - Picked up r-bs by Suman. Changes in v3: - Fix missing freeing statements. Changes in v2: - Add net-next in patch subject. - Removed unnecessary extra lock and hold nfc_dev ref when holding llcp_sock. - Remove last formatting patch. - Picked up r-b from Krzysztof for LLCP_BOUND patch. --- For connectionless transmission, llcp_sock_sendmsg() codepath will eventually call nfc_alloc_send_skb() which takes in an nfc_dev as an argument for calculating the total size for skb allocation. virtual_ncidev_close() codepath eventually releases socket by calling nfc_llcp_socket_release() (which sets the sk->sk_state to LLCP_CLOSED) and afterwards the nfc_dev will be eventually freed. When an ndev gets freed, llcp_sock_sendmsg() will result in an use-after-free as it (1) doesn't have any checks in place for avoiding the datagram sending. (2) calls nfc_llcp_send_ui_frame(), which also has a do-while loop which can race with freeing. This loop contains the call to nfc_alloc_send_skb() where we dereference the nfc_dev pointer. nfc_dev is being freed because we do not hold a reference to it when we hold a reference to llcp_local. Thus, virtual_ncidev_close() eventually calls nfc_release() due to refcount going to 0. Since state has to be LLCP_BOUND for datagram sending, we can bail out early in llcp_sock_sendmsg(). Please review and let me know if any errors are there, and hopefully this gets accepted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>