summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-23bpf: Reshuffle some parts of bpf/offload.cStanislav Fomichev
To avoid adding forward declarations in the main patch, shuffle some code around. No functional changes. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-5-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23bpf: Move offload initialization into late_initcallStanislav Fomichev
So we don't have to initialize it manually from several paths. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-4-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloadedStanislav Fomichev
BPF offloading infra will be reused to implement bound-but-not-offloaded bpf programs. Rename existing helpers for clarity. No functional changes. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23bpf: Document XDP RX metadataStanislav Fomichev
Document all current use-cases and assumptions. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Acked-by: David Vernet <void@manifault.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23nvme-fc: fix initialization orderRoss Lagerwall
ctrl->ops is used by nvme_alloc_admin_tag_set() but set by nvme_init_ctrl() so reorder the calls to avoid a NULL pointer dereference. Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-23gpiolib-acpi: Don't set GPIOs for wakeup in S3 modeMario Limonciello
commit 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable") adjusted the policy to enable wakeup by default if the ACPI tables indicated that a device was wake capable. It was reported however that this broke suspend on at least two System76 systems in S3 mode and two Lenovo Gen2a systems, but only with S3. When the machines are set to s2idle, wakeup behaves properly. Configuring the GPIOs for wakeup with S3 doesn't work properly, so only set it when the system supports low power idle. Fixes: 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable") Fixes: b38f2d5d9615c ("i2c: acpi: Use ACPI wake capability bit to set wake_irq") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2357 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2162013 Reported-by: Nathan Smythe <ncsmythe@scruboak.org> Tested-by: Nathan Smythe <ncsmythe@scruboak.org> Suggested-by: Raul Rangel <rrangel@chromium.org> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-01-23nfsd: don't free files unconditionally in __nfsd_file_cache_purgeJeff Layton
nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed. Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-by: Torkil Svensgaard <torkil@drcmr.dk> Reported-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-23net: mdio: mux-meson-g12a: use devm_clk_get_enabled to simplify the codeHeiner Kallweit
Use devm_clk_get_enabled() to simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: mdiobus: Convert to use fwnode_device_is_compatible()Andy Shevchenko
Replace open coded fwnode_device_is_compatible() in the driver. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23ARM: 9287/1: Reduce __thumb2__ definition to crypto files that require itNathan Chancellor
Commit 1d2e9b67b001 ("ARM: 9265/1: pass -march= only to compiler") added a __thumb2__ define to ASFLAGS to avoid build errors in the crypto code, which relies on __thumb2__ for preprocessing. Commit 59e2cf8d21e0 ("ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA") followed up on this by removing -mthumb from AFLAGS so that __thumb2__ would not be defined when the default target was ARMv7 or newer. Unfortunately, the second commit's fix assumes that the toolchain defaults to -mno-thumb / -marm, which is not the case for Debian's arm-linux-gnueabihf target, which defaults to -mthumb: $ echo | arm-linux-gnueabihf-gcc -dM -E - | grep __thumb #define __thumb2__ 1 #define __thumb__ 1 This target is used by several CI systems, which will still see redefined macro warnings, despite '-mthumb' not being present in the flags: <command-line>: warning: "__thumb2__" redefined <built-in>: note: this is the location of the previous definition Remove the global AFLAGS __thumb2__ define and move it to the crypto folder where it is required by the imported OpenSSL algorithms; the rest of the kernel should use the internal CONFIG_THUMB2_KERNEL symbol to know whether or not Thumb2 is being used or not. Be sure that __thumb2__ is undefined first so that there are no macro redefinition warnings. Link: https://github.com/ClangBuiltLinux/linux/issues/1772 Reported-by: "kernelci.org bot" <bot@kernelci.org> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Fixes: 59e2cf8d21e0 ("ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA") Fixes: 1d2e9b67b001 ("ARM: 9265/1: pass -march= only to compiler") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2023-01-23io_uring/net: cache provided buffer group value for multishot receivesJens Axboe
If we're using ring provided buffers with multishot receive, and we end up doing an io-wq based issue at some points that also needs to select a buffer, we'll lose the initially assigned buffer group as io_ring_buffer_select() correctly clears the buffer group list as the issue isn't serialized by the ctx uring_lock. This is fine for normal receives as the request puts the buffer and finishes, but for multishot, we will re-arm and do further receives. On the next trigger for this multishot receive, the receive will try and pick from a buffer group whose value is the same as the buffer ID of the las receive. That is obviously incorrect, and will result in a premature -ENOUFS error for the receive even if we had available buffers in the correct group. Cache the buffer group value at prep time, so we can restore it for future receives. This only needs doing for the above mentioned case, but just do it by default to keep it easier to read. Cc: stable@vger.kernel.org Fixes: b3fdea6ecb55 ("io_uring: multishot recv") Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg") Cc: Dylan Yudaken <dylany@meta.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-23ethtool: Add and use ethnl_update_bool.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022 into DMI tablefengwk
This model requires an additional detection quirk to enable the internal microphone - BIOS doesn't seem to support AcpDmicConnected (nothing in acpidump output). Signed-off-by: fengwk <fengwk94@gmail.com> Link: https://lore.kernel.org/r/Y8wmCutc74j/tyHP@arch Signed-off-by: Mark Brown <broonie@kernel.org>
2023-01-23Merge branch 'enetc-mac-merge-prep'David S. Miller
Vladimir Oltean says: ==================== ENETC MAC Merge cleanup This is a preparatory patch set for MAC Merge layer support in enetc via ethtool. It does the following: - consolidates a software lockstep register write procedure for the pMAC - detects per-port frame preemption capability and only writes pMAC registers if a pMAC exists - stops enabling the pMAC by default Additionally, I noticed some build warnings in the driver which are new in this kernel version, so patch 1/6 fixes those. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: enetc: stop auto-configuring the port pMACVladimir Oltean
The pMAC (ENETC_PFPMR_PMACE) is probably unconditionally enabled in the enetc driver to allow RX of preemptible packets and not see them as error frames. I don't know why TX preemption (ENETC_MMCSR_ME) is enabled though. With no way to say which traffic classes are preemptible (all are express by default), no preemptible frames would be transmitted anyway. Lastly, it may have been believed that the register write lock-step mode (now deleted) needed the pMAC to be enabled at all times. I don't know if that's true. However, I've checked that driver writes to PM1 registers do propagate through to the ENETC IP even when the pMAC is disabled. With such incomplete support for frame preemption, it's best to just remove whatever exists right now and come with something more coherent later. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: enetc: implement software lockstep for port MAC registersVladimir Oltean
Currently the enetc driver duplicates its writes to the PM0 registers also to PM1, but it doesn't do this consistently - for example we write to ENETC_PM0_MAXFRM but not to ENETC_PM1_MAXFRM. Create enetc_port_mac_wr() which writes both the PM0 and PM1 register with the same value (if frame preemption is supported on this port). Also create enetc_port_mac_rd() which reads from PM0 - the assumption being that PM1 contains just the same value. This will be necessary when we enable the MAC Merge layer properly, and the pMAC becomes operational. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: enetc: stop configuring pMAC in lockstep with eMACVladimir Oltean
The MWLM bit (MAC write lock-step mode) allows register writes to the pMAC to be auto-performed whenever the corresponding eMAC register is written by the driver. This allows their configuration to remain in sync. The driver has set this bit since the initial commit, but it doesn't do anything, since the hardware feature doesn't work (and the bit has been removed from more recent versions of the documentation). The driver does attempt, more or less, to keep those MAC registers in sync by writing the same value once to e.g. ENETC_PM0_CMD_CFG (eMAC) and once to ENETC_PM1_CMD_CFG (pMAC). Because the lockstep feature doesn't work, that's what it will stick to. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: enetc: add definition for offset between eMAC and pMAC regsVladimir Oltean
This is a preliminary patch which replaces the hardcoded 0x1000 present in other PM1 (port MAC 1, aka pMAC) register definitions, which is an offset to the PM0 (port MAC 0, aka eMAC) equivalent register. This definition will be used in more places by future code. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: enetc: detect frame preemption hardware capabilityVladimir Oltean
Similar to other TSN features, query the Station Interface capability register to see whether preemption is supported on this port or not. On LS1028A, preemption is available on ports 0 and 2, but not on 1 and 3. This will allow us in the future to write the pMAC registers only on the ENETC ports where a pMAC actually exists. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: enetc: build common object files into a separate moduleVladimir Oltean
The build system is complaining about the following: enetc.o is added to multiple modules: fsl-enetc fsl-enetc-vf enetc_cbdr.o is added to multiple modules: fsl-enetc fsl-enetc-vf enetc_ethtool.o is added to multiple modules: fsl-enetc fsl-enetc-vf Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23Merge branch 'ethtool-mac-merge'David S. Miller
Vladimir Oltean says: ==================== ethtool support for IEEE 802.3 MAC Merge layer Change log ---------- v3->v4: - add missing opening bracket in ocelot_port_mm_irq() - moved cfg.verify_time range checking so that it actually takes place for the updated rather than old value v3 at: https://patchwork.kernel.org/project/netdevbpf/cover/20230117085947.2176464-1-vladimir.oltean@nxp.com/ v2->v3: - made get_mm return int instead of void - deleted ETHTOOL_A_MM_SUPPORTED - renamed ETHTOOL_A_MM_ADD_FRAG_SIZE to ETHTOOL_A_MM_TX_MIN_FRAG_SIZE - introduced ETHTOOL_A_MM_RX_MIN_FRAG_SIZE - cleaned up documentation - rebased on top of PLCA changes - renamed ETHTOOL_STATS_SRC_* to ETHTOOL_MAC_STATS_SRC_* v2 at: https://patchwork.kernel.org/project/netdevbpf/cover/20230111161706.1465242-1-vladimir.oltean@nxp.com/ v1->v2: I've decided to focus just on the MAC Merge layer for now, which is why I am able to submit this patch set as non-RFC. v1 (RFC) at: https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/ What is being introduced ------------------------ TL;DR: a MAC Merge layer as defined by IEEE 802.3-2018, clause 99 (interspersing of express traffic). This is controlled through ethtool netlink (ETHTOOL_MSG_MM_GET, ETHTOOL_MSG_MM_SET). The raw ethtool commands are posted here: https://patchwork.kernel.org/project/netdevbpf/cover/20230111153638.1454687-1-vladimir.oltean@nxp.com/ The MAC Merge layer has its own statistics counters (ethtool --include-statistics --show-mm swp0) as well as two member MACs, the statistics of which can be queried individually, through a new ethtool netlink attribute, corresponding to: $ ethtool -I --show-pause eno2 --src aggregate $ ethtool -S eno2 --groups eth-mac eth-phy eth-ctrl rmon -- --src pmac The core properties of the MAC Merge layer are described in great detail in patches 02/12 and 03/12. They can be viewed in "make htmldocs" format. Devices for which the API is supported -------------------------------------- I decided to start with the Ethernet switch on NXP LS1028A (Felix) because of the smaller patch set. I also have support for the ENETC controller pending. I would like to get confirmation that the UAPI being proposed here will not restrict any use cases known by other hardware vendors. Why is support for preemptible traffic classes not here? -------------------------------------------------------- There is legitimate concern whether the 802.1Q portion of the standard (which traffic classes go to the eMAC and which to the pMAC) should be modeled in Linux using tc or using another UAPI. I think that is stalling the entire series, but should be discussed separately instead. Removing FP adminStatus support makes me confident enough to submit this patch set without an RFC tag (meaning: I wouldn't mind if it was merged as is). What is submitted here is sufficient for an LLDP daemon to do its job. I've patched openlldp to advertise and configure frame preemption: https://github.com/vladimiroltean/openlldp/tree/frame-preemption-v3 In case someone wants to try it out, here are some commands I've used. # Configure the interfaces to receive and transmit LLDP Data Units lldptool -L -i eno0 adminStatus=rxtx lldptool -L -i swp0 adminStatus=rxtx # Enable the transmission of certain TLVs on switch's interface lldptool -T -i eno0 -V addEthCap enableTx=yes lldptool -T -i swp0 -V addEthCap enableTx=yes # Query LLDP statistics on switch's interface lldptool -S -i swp0 # Query the received neighbor TLVs lldptool -i swp0 -t -n -V addEthCap Additional Ethernet Capabilities TLV Preemption capability supported Preemption capability enabled Preemption capability active Additional fragment size: 60 octets So using this patch set, lldpad will be able to advertise and configure frame preemption, but still, no data packet will be sent as preemptible over the link, because there is no UAPI to control which traffic classes are sent as preemptible and which as express. Preemptable or preemptible? --------------------------- IEEE 802.3 uses "preemptable" throughout. IEEE 802.1Q uses "preemptible" throughout. Because the definition of "preemptible" falls under 802.1Q's jurisdiction and 802.3 just references it, I went with the 802.1Q naming even where supporting an 802.3 feature. Also, checkpatch agrees with this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: mscc: ocelot: add MAC Merge layer support for VSC9959Vladimir Oltean
Felix (VSC9959) has a DEV_GMII:MM_CONFIG block composed of 2 registers (ENABLE_CONFIG and VERIF_CONFIG). Because the MAC Merge statistics and pMAC statistics are already in the Ocelot switch lib even if just Felix supports them, I'm adding support for the whole MAC Merge layer in the common Ocelot library too. There is an interrupt (shared with the PTP interrupt) which signals changes to the MM verification state. This is done because the preemptible traffic classes should be committed to hardware only once the verification procedure has declared the link partner of being capable of receiving preemptible frames. We implement ethtool getters and setters for the MAC Merge layer state. The "TX enabled" and "verify status" are taken from the IRQ handler, using a mutex to ensure serialized access. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959Vladimir Oltean
The Felix VSC9959 switch supports frame preemption and has a MAC Merge layer. In addition to the structured stats that exist for the eMAC, export the counters associated with its pMAC (pause, RMON, MAC, PHY, control) plus the high-level MAC Merge layer stats. The unstructured ethtool counters, as well as the rtnl_link_stats64 were left to report only the eMAC counters. Because statistics processing is quite self-contained in ocelot_stats.c now, I've opted for introducing an ocelot->mm_supported bool, based on which the common switch lib does everything, rather than pushing the TSN-specific code in felix_vsc9959.c, as happens for other TSN stuff. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: mscc: ocelot: hide access to ocelot_stats_layout behind a helperVladimir Oltean
Some hardware instances of the ocelot driver support the MAC Merge layer, which gives access to an extra preemptible MAC. This has implications upon the statistics. There will be a stats layout when MM isn't supported, and a different one when it is. The ocelot_stats_layout() helper will return the correct one. In preparation of that, refactor the existing code to use this helper. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: mscc: ocelot: allow ocelot_stat_layout elements with no nameVladimir Oltean
We will add support for pMAC counters and MAC merge layer counters, which are only reported via the structured stats, and the current ocelot_get_strings() stands in our way, because it expects that the statistics should be placed in the data array at the same index as found in the ocelot_stats_layout array. That is not true. Statistics which don't have a name should not be exported to the unstructured ethtool -S, so we need to have different indices into the ocelot_stats_layout array (i) and into the data array (data itself). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: dsa: add plumbing for changing and getting MAC merge layer stateVladimir Oltean
The DSA core is in charge of the ethtool_ops of the net devices associated with switch ports, so in case a hardware driver supports the MAC merge layer, DSA must pass the callbacks through to the driver. Add support for precisely that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: add helpers for MM fragment size translationVladimir Oltean
We deliberately make the Linux UAPI pass the minimum fragment size in octets, even though IEEE 802.3 defines it as discrete values, and addFragSize is just the multiplier. This is because there is nothing impossible in operating with an in-between value for the fragment size of non-final preempted fragments, and there may even appear hardware which supports the in-between sizes. For the hardware which just understands the addFragSize multiplier, create two helpers which translate back and forth the values passed in octets. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: add helpers for aggregate statisticsVladimir Oltean
When a pMAC exists but the driver is unable to atomically query the aggregate eMAC+pMAC statistics, the user should be given back at least the sum of eMAC and pMAC counters queried separately. This is a generic problem, so add helpers in ethtool to do this operation, if the driver doesn't have a better way to report aggregate stats. Do this in a way that does not require changes to these functions when new stats are added (basically treat the structures as an array of u64 values, except for the first element which is the stats source). In include/linux/ethtool.h, there is already a section where helper function prototypes should be placed. The trouble is, this section is too early, before the definitions of struct ethtool_eth_mac_stats et.al. Move that section at the end and append these new helpers to it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23docs: ethtool: document ETHTOOL_A_STATS_SRC and ETHTOOL_A_PAUSE_STATS_SRCVladimir Oltean
Two new netlink attributes were added to PAUSE_GET and STATS_GET and their replies. Document them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)Vladimir Oltean
IEEE 802.3-2018 clause 99 defines a MAC Merge sublayer which contains an Express MAC and a Preemptible MAC. Both MACs are hidden to higher and lower layers and visible as a single MAC (packet classification to eMAC or pMAC on TX is done based on priority; classification on RX is done based on SFD). For devices which support a MAC Merge sublayer, it is desirable to retrieve individual packet counters from the eMAC and the pMAC, as well as aggregate statistics (their sum). Introduce a new ETHTOOL_A_STATS_SRC attribute which is part of the policy of ETHTOOL_MSG_STATS_GET and, and an ETHTOOL_A_PAUSE_STATS_SRC which is part of the policy of ETHTOOL_MSG_PAUSE_GET (accepted when ETHTOOL_FLAG_STATS is set in the common ethtool header). Both of these take values from enum ethtool_mac_stats_src, defaulting to "aggregate" in the absence of the attribute. Existing drivers do not need to pay attention to this enum which was added to all driver-facing structures, just the ones which report the MAC merge layer as supported. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23docs: ethtool-netlink: document interface for MAC Merge layerVladimir Oltean
Show details about the structures passed back and forth related to MAC Merge layer configuration, state and statistics. The rendered htmldocs will be much more verbose due to the kerneldoc references. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: add support for MAC Merge layerVladimir Oltean
The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2 specifications (the other being Frame Preemption; IEEE 802.1Q-2018 clause 6.7.2), which work together to minimize latency caused by frame interference at TX. The overall goal of TSN is for normal traffic and traffic with a bounded deadline to be able to cohabitate on the same L2 network and not bother each other too much. The standards achieve this (partly) by introducing the concept of preemptible traffic, i.e. Ethernet frames that have a custom value for the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented and reassembled at L2 on a link-local basis. The non-preemptible frames are called express traffic, they are transmitted using a normal SFD, and they can preempt preemptible frames, therefore having lower latency, which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo frames around 9K). Preemption is not recursive, i.e. a P frame cannot preempt another P frame. Preemption also does not depend upon priority, or otherwise said, an E frame with prio 0 will still preempt a P frame with prio 7. In terms of implementation, the standards talk about the presence of an express MAC (eMAC) which handles express traffic, and a preemptible MAC (pMAC) which handles preemptible traffic, and these MACs are multiplexed on the same MII by a MAC merge layer. To support frame preemption, the definition of the SFD was generalized to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an Ethernet frame fragment, or a complete frame. Stations unaware of an SMD value different from the standard SFD will treat P frames as error frames. To prevent that from happening, a negotiation process is defined. On RX, packets are dispatched to the eMAC or pMAC after being filtered by their SMD. On TX, the eMAC/pMAC classification decision is taken by the 802.1Q spec, based on packet priority (each of the 8 user priority values may have an admin-status of preemptible or express). The MAC Merge layer and the Frame Preemption parameters have some degree of independence in terms of how software stacks are supposed to deal with them. The activation of the MM layer is supposed to be controlled by an LLDP daemon (after it has been communicated that the link partner also supports it), after which a (hardware-based or not) verification handshake takes place, before actually enabling the feature. So the process is intended to be relatively plug-and-play. Whereas FP settings are supposed to be coordinated across a network using something approximating NETCONF. The support contained here is exclusively for the 802.3 (MAC Merge) portions and not for the 802.1Q (Frame Preemption) parts. This API is sufficient for an LLDP daemon to do its job. The FP adminStatus variable from 802.1Q is outside the scope of an LLDP daemon. I have taken a few creative licenses and augmented the Linux kernel UAPI compared to the standard managed objects recommended by IEEE 802.3. These are: - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive Processing state diagram, a MAC Merge layer is always supposed to be able to receive P frames. However, this implies keeping the pMAC powered on, which will consume needless power in applications where FP will never be used. If LLDP is used, the reception of an Additional Ethernet Capabilities TLV from the link partner is sufficient indication that the pMAC should be enabled. So my proposal is that in Linux, we keep the pMAC turned off by default and that user space turns it on when needed. - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in the boolean netlink attributes offered, so this is also positive here. Other than the meaning being reversed, they correspond to the same thing. - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP daemon to maximize the verifyTime variable (delay between SMD-V transmissions), to maximize its chances that the LP replies. IEEE says that the verifyTime can range between 1 and 128 ms, but the NXP ENETC stupidly keeps this variable in a 7 bit register, so the maximum supported value is 127 ms. I could have chosen to hardcode this in the LLDP daemon to a lower value, but why not let the kernel expose its supported range directly. - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called aMACMergeAddFragSize, and expresses the "additional" fragment size (on top of ETH_ZLEN), whereas this expresses the absolute value of the fragment size. - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed object mandated by the standard, but user space clearly needs to know what is the minimum supported fragment size of our local receiver, since LLDP must advertise a value no lower than that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net/sock: Introduce trace_sk_data_ready()Peilin Ye
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23ipv6: fix reachability confirmation with proxy_ndpGergely Risko
When proxying IPv6 NDP requests, the adverts to the initial multicast solicits are correct and working. On the other hand, when later a reachability confirmation is requested (on unicast), no reply is sent. This causes the neighbor entry expiring on the sending node, which is mostly a non-issue, as a new multicast request is sent. There are routers, where the multicast requests are intentionally delayed, and in these environments the current implementation causes periodic packet loss for the proxied endpoints. The root cause is the erroneous decrease of the hop limit, as this is checked in ndisc.c and no answer is generated when it's 254 instead of the correct 255. Cc: stable@vger.kernel.org Fixes: 46c7655f0b56 ("ipv6: decrease hop limit counter in ip6_forward()") Signed-off-by: Gergely Risko <gergely.risko@gmail.com> Tested-by: Gergely Risko <gergely.risko@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23Merge branch 'ethtool-mac-merge'David S. Miller
Vladimir Oltean say: ==================== ethtool support for IEEE 802.3 MAC Merge layer Change log ---------- v3->v4: - add missing opening bracket in ocelot_port_mm_irq() - moved cfg.verify_time range checking so that it actually takes place for the updated rather than old value v3 at: https://patchwork.kernel.org/project/netdevbpf/cover/20230117085947.2176464-1-vladimir.oltean@nxp.com/ v2->v3: - made get_mm return int instead of void - deleted ETHTOOL_A_MM_SUPPORTED - renamed ETHTOOL_A_MM_ADD_FRAG_SIZE to ETHTOOL_A_MM_TX_MIN_FRAG_SIZE - introduced ETHTOOL_A_MM_RX_MIN_FRAG_SIZE - cleaned up documentation - rebased on top of PLCA changes - renamed ETHTOOL_STATS_SRC_* to ETHTOOL_MAC_STATS_SRC_* v2 at: https://patchwork.kernel.org/project/netdevbpf/cover/20230111161706.1465242-1-vladimir.oltean@nxp.com/ v1->v2: I've decided to focus just on the MAC Merge layer for now, which is why I am able to submit this patch set as non-RFC. v1 (RFC) at: https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/ What is being introduced ------------------------ TL;DR: a MAC Merge layer as defined by IEEE 802.3-2018, clause 99 (interspersing of express traffic). This is controlled through ethtool netlink (ETHTOOL_MSG_MM_GET, ETHTOOL_MSG_MM_SET). The raw ethtool commands are posted here: https://patchwork.kernel.org/project/netdevbpf/cover/20230111153638.1454687-1-vladimir.oltean@nxp.com/ The MAC Merge layer has its own statistics counters (ethtool --include-statistics --show-mm swp0) as well as two member MACs, the statistics of which can be queried individually, through a new ethtool netlink attribute, corresponding to: $ ethtool -I --show-pause eno2 --src aggregate $ ethtool -S eno2 --groups eth-mac eth-phy eth-ctrl rmon -- --src pmac The core properties of the MAC Merge layer are described in great detail in patches 02/12 and 03/12. They can be viewed in "make htmldocs" format. Devices for which the API is supported -------------------------------------- I decided to start with the Ethernet switch on NXP LS1028A (Felix) because of the smaller patch set. I also have support for the ENETC controller pending. I would like to get confirmation that the UAPI being proposed here will not restrict any use cases known by other hardware vendors. Why is support for preemptible traffic classes not here? -------------------------------------------------------- There is legitimate concern whether the 802.1Q portion of the standard (which traffic classes go to the eMAC and which to the pMAC) should be modeled in Linux using tc or using another UAPI. I think that is stalling the entire series, but should be discussed separately instead. Removing FP adminStatus support makes me confident enough to submit this patch set without an RFC tag (meaning: I wouldn't mind if it was merged as is). What is submitted here is sufficient for an LLDP daemon to do its job. I've patched openlldp to advertise and configure frame preemption: https://github.com/vladimiroltean/openlldp/tree/frame-preemption-v3 In case someone wants to try it out, here are some commands I've used. # Configure the interfaces to receive and transmit LLDP Data Units lldptool -L -i eno0 adminStatus=rxtx lldptool -L -i swp0 adminStatus=rxtx # Enable the transmission of certain TLVs on switch's interface lldptool -T -i eno0 -V addEthCap enableTx=yes lldptool -T -i swp0 -V addEthCap enableTx=yes # Query LLDP statistics on switch's interface lldptool -S -i swp0 # Query the received neighbor TLVs lldptool -i swp0 -t -n -V addEthCap Additional Ethernet Capabilities TLV Preemption capability supported Preemption capability enabled Preemption capability active Additional fragment size: 60 octets So using this patch set, lldpad will be able to advertise and configure frame preemption, but still, no data packet will be sent as preemptible over the link, because there is no UAPI to control which traffic classes are sent as preemptible and which as express. Preemptable or preemptible? --------------------------- IEEE 802.3 uses "preemptable" throughout. IEEE 802.1Q uses "preemptible" throughout. Because the definition of "preemptible" falls under 802.1Q's jurisdiction and 802.3 just references it, I went with the 802.1Q naming even where supporting an 802.3 feature. Also, checkpatch agrees with this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: netlink: introduce ethnl_update_bool()Vladimir Oltean
Due to the fact that the kernel-side data structures have been carried over from the ioctl-based ethtool, we are now in the situation where we have an ethnl_update_bool32() function, but the plain function that operates on a boolean value kept in an actual u8 netlink attribute doesn't exist. With new ethtool features that are exposed solely over netlink, the kernel data structures will use the "bool" type, so we will need this kind of helper. Introduce it now; it's needed for things like verify-disabled for the MAC merge configuration. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: fec: Use page_pool_put_full_page when freeing rx buffersWei Fang
The page_pool_release_page was used when freeing rx buffers, and this function just unmaps the page (if mapped) and does not recycle the page. So after hundreds of down/up the eth0, the system will out of memory. For more details, please refer to the following reproduce steps and bug logs. To solve this issue and refer to the doc of page pool, the page_pool_put_full_page should be used to replace page_pool_release_page. Because this API will try to recycle the page if the page refcnt equal to 1. After testing 20000 times, the issue can not be reproduced anymore (about testing 391 times the issue will occur on i.MX8MN-EVK before). Reproduce steps: Create the test script and run the script. The script content is as follows: LOOPS=20000 i=1 while [ $i -le $LOOPS ] do echo "TINFO:ENET $curface up and down test $i times" org_macaddr=$(cat /sys/class/net/eth0/address) ifconfig eth0 down ifconfig eth0 hw ether $org_macaddr up i=$(expr $i + 1) done sleep 5 if cat /sys/class/net/eth0/operstate | grep 'up';then echo "TEST PASS" else echo "TEST FAIL" fi Bug detail logs: TINFO:ENET up and down test 391 times [ 850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL) [ 853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec [ 931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec TINFO:ENET up and down test 392 times [ 991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec [ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec [ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL) [ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec [ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec [ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec [ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec [ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec TINFO:ENET up and down test 393 times [ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec [ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec [ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec [ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec [ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec [ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL) [ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec [ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec [ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec [ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec [ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec [ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec TINFO:ENET up and down test 394 times [ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec [ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec [ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec [ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0 [ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G C 6.1.1+g56321e101aca #1 [ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT) [ 1537.115561] Call trace: [ 1537.118005] dump_backtrace.part.0+0xe0/0xf0 [ 1537.122289] show_stack+0x18/0x40 [ 1537.125608] dump_stack_lvl+0x64/0x80 [ 1537.129276] dump_stack+0x18/0x34 [ 1537.132592] dump_header+0x44/0x208 [ 1537.136083] oom_kill_process+0x2b4/0x2c0 [ 1537.140097] out_of_memory+0xe4/0x594 [ 1537.143766] __alloc_pages+0xb68/0xd00 [ 1537.147521] alloc_pages+0xac/0x160 [ 1537.151013] __get_free_pages+0x14/0x40 [ 1537.154851] pgd_alloc+0x1c/0x30 [ 1537.158082] mm_init+0xf8/0x1d0 [ 1537.161228] mm_alloc+0x48/0x60 [ 1537.164368] alloc_bprm+0x7c/0x240 [ 1537.167777] do_execveat_common.isra.0+0x70/0x240 [ 1537.172486] __arm64_sys_execve+0x40/0x54 [ 1537.176502] invoke_syscall+0x48/0x114 [ 1537.180255] el0_svc_common.constprop.0+0xcc/0xec [ 1537.184964] do_el0_svc+0x2c/0xd0 [ 1537.188280] el0_svc+0x2c/0x84 [ 1537.191340] el0t_64_sync_handler+0xf4/0x120 [ 1537.195613] el0t_64_sync+0x18c/0x190 [ 1537.199334] Mem-Info: [ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0 [ 1537.201620] active_file:54 inactive_file:112 isolated_file:0 [ 1537.201620] unevictable:0 dirty:0 writeback:0 [ 1537.201620] slab_reclaimable:2620 slab_unreclaimable:7076 [ 1537.201620] mapped:1489 shmem:2473 pagetables:466 [ 1537.201620] sec_pagetables:0 bounce:0 [ 1537.201620] kernel_misc_reclaimable:0 [ 1537.201620] free:136672 free_pcp:96 free_cma:129241 [ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s [ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB [ 1537.300219] lowmem_reserve[]: 0 0 0 0 [ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB [ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB [ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB [ 1537.358087] 2939 total pagecache pages [ 1537.361876] 0 pages in swap cache [ 1537.365229] Free swap = 0kB [ 1537.368147] Total swap = 0kB [ 1537.371065] 516096 pages RAM [ 1537.373959] 0 pages HighMem/MovableOnly [ 1537.377834] 17302 pages reserved [ 1537.381103] 163840 pages cma reserved [ 1537.384809] 0 pages hwpoisoned [ 1537.387902] Tasks state (memory values in pages): [ 1537.392652] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 1537.401356] [ 201] 993 201 1130 72 45056 0 0 rpcbind [ 1537.409772] [ 202] 0 202 4529 1640 77824 0 -250 systemd-journal [ 1537.418861] [ 222] 0 222 4691 801 69632 0 -1000 systemd-udevd [ 1537.427787] [ 248] 994 248 20914 130 65536 0 0 systemd-timesyn [ 1537.436884] [ 497] 0 497 620 31 49152 0 0 atd [ 1537.444938] [ 500] 0 500 854 77 53248 0 0 crond [ 1537.453165] [ 503] 997 503 1470 160 49152 0 -900 dbus-daemon [ 1537.461908] [ 505] 0 505 633 24 40960 0 0 firmwared [ 1537.470491] [ 513] 0 513 2507 180 61440 0 0 ofonod [ 1537.478800] [ 514] 990 514 69640 137 81920 0 0 parsec [ 1537.487120] [ 533] 0 533 599 39 40960 0 0 syslogd [ 1537.495518] [ 534] 0 534 4546 148 65536 0 0 systemd-logind [ 1537.504560] [ 535] 0 535 690 24 45056 0 0 tee-supplicant [ 1537.513564] [ 540] 996 540 2769 168 61440 0 0 systemd-network [ 1537.522680] [ 566] 0 566 3878 228 77824 0 0 connmand [ 1537.531168] [ 645] 998 645 1538 133 57344 0 0 avahi-daemon [ 1537.540004] [ 646] 998 646 1461 64 57344 0 0 avahi-daemon [ 1537.548846] [ 648] 992 648 781 41 45056 0 0 rpc.statd [ 1537.557415] [ 650] 64371 650 590 23 45056 0 0 ninfod [ 1537.565754] [ 653] 61563 653 555 24 45056 0 0 rdisc [ 1537.573971] [ 655] 0 655 374569 2999 290816 0 -999 containerd [ 1537.582621] [ 658] 0 658 1311 20 49152 0 0 agetty [ 1537.590922] [ 663] 0 663 1529 97 49152 0 0 login [ 1537.599138] [ 666] 0 666 3430 202 69632 0 0 wpa_supplicant [ 1537.608147] [ 667] 0 667 2344 96 61440 0 0 systemd-userdbd [ 1537.617240] [ 677] 0 677 2964 314 65536 0 100 systemd [ 1537.625651] [ 679] 0 679 3720 646 73728 0 100 (sd-pam) [ 1537.634138] [ 687] 0 687 1289 403 45056 0 0 sh [ 1537.642108] [ 789] 0 789 970 93 45056 0 0 eth_test2.sh [ 1537.650955] [ 2355] 0 2355 2346 94 61440 0 0 systemd-userwor [ 1537.660046] [ 2356] 0 2356 2346 94 61440 0 0 systemd-userwor [ 1537.669137] [ 2358] 0 2358 2346 95 57344 0 0 systemd-userwor [ 1537.678258] [ 2379] 0 2379 970 93 45056 0 0 eth_test2.sh [ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0 [ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0 [ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec Fixes: 95698ff6177b ("net: fec: using page pool to manage RX buffers") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: shenwei wang <Shenwei.wang@nxp.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23ALSA: hda: Do not unset preset when cleaning up codecCezary Rojewski
Several functions that take part in codec's initialization and removal are re-used by ASoC codec drivers implementations. Drivers mimic the behavior of hda_codec_driver_probe/remove() found in sound/pci/hda/hda_bind.c with their component->probe/remove() instead. One of the reasons for that is the expectation of snd_hda_codec_device_new() to receive a valid pointer to an instance of struct snd_card. This expectation can be met only once sound card components probing commences. As ASoC sound card may be unbound without codec device being actually removed from the system, unsetting ->preset in snd_hda_codec_cleanup_for_unbind() interferes with module unload -> load scenario causing null-ptr-deref. Preset is assigned only once, during device/driver matching whereas ASoC codec driver's module reloading may occur several times throughout the lifetime of an audio stack. Suggested-by: Takashi Iwai <tiwai@suse.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20230119143235.1159814-1-cezary.rojewski@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-01-22Merge tag 'sched_urgent_for_v6.2_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Make sure the scheduler doesn't use stale frequency scaling values when latter get disabled due to a value error - Fix a NULL pointer access on UP configs - Use the proper locking when updating CPU capacity * tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs sched/fair: Fixes for capacity inversion detection sched/uclamp: Fix a uninitialized variable warnings
2023-01-22Merge tag 'edac_urgent_for_v6.2_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Respect user-supplied polling value in the EDAC device code - Fix a use-after-free issue in qcom_edac * tag 'edac_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info EDAC/device: Respect any driver-supplied workqueue polling value
2023-01-22Merge tag 'perf_urgent_for_v6.2_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Add Emerald Rapids model support to more perf machinery * tag 'perf_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cstate: Add Emerald Rapids perf/x86/intel: Add Emerald Rapids
2023-01-22Merge tag 'gfs2-v6.2-rc4-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 writepage fix from Andreas Gruenbacher: - Fix a regression introduced by commit "gfs2: stop using generic_writepages in gfs2_ail1_start_one". * tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
2023-01-22x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only blockNathan Chancellor
LLVM 16 will have support for this flag so move it out of the GCC-only block to allow LLVM builds to take advantage of it. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1665 Link: https://github.com/llvm/llvm-project/commit/6f867f9102838ebe314c1f3661fdf95700386e5a Link: https://lore.kernel.org/r/20230120165826.2469302-1-nathan@kernel.org
2023-01-22KVM: selftests: Make reclaim_period_ms input always be positiveVipin Sharma
reclaim_period_ms used to be positive only but the commit 0001725d0f9b ("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input validation") incorrectly changed it to non-negative validation. Change validation to allow only positive input. Fixes: 0001725d0f9b ("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input validation") Signed-off-by: Vipin Sharma <vipinsh@google.com> Reported-by: Ben Gardon <bgardon@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230111183408.104491-1-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-22KVM: x86/vmx: Do not skip segment attributes if unusable bit is setHendrik Borghorst
When serializing and deserializing kvm_sregs, attributes of the segment descriptors are stored by user space. For unusable segments, vmx_segment_access_rights skips all attributes and sets them to 0. This means we zero out the DPL (Descriptor Privilege Level) for unusable entries. Unusable segments are - contrary to their name - usable in 64bit mode and are used by guests to for example create a linear map through the NULL selector. VMENTER checks if SS.DPL is correct depending on the CS segment type. For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to SS.DPL [1]. We have seen real world guests setting CS to a usable segment with DPL=3 and SS to an unusable segment with DPL=3. Once we go through an sregs get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash reproducibly. This commit changes the attribute logic to always preserve attributes for unusable segments. According to [2] SS.DPL is always saved on VM exits, regardless of the unusable bit so user space applications should have saved the information on serialization correctly. [3] specifies that besides SS.DPL the rest of the attributes of the descriptors are undefined after VM entry if unusable bit is set. So, there should be no harm in setting them all to the previous state. [1] Intel SDM Vol 3C 26.3.1.2 Checks on Guest Segment Registers [2] Intel SDM Vol 3C 27.3.2 Saving Segment Registers and Descriptor-Table Registers [3] Intel SDM Vol 3C 26.3.2.2 Loading Guest Segment Registers and Descriptor-Table Registers Cc: Alexander Graf <graf@amazon.de> Cc: stable@vger.kernel.org Signed-off-by: Hendrik Borghorst <hborghor@amazon.de> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Alexander Graf <graf@amazon.com> Message-Id: <20221114164823.69555-1-hborghor@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-22selftests: kvm: move declaration at the beginning of main()Paolo Bonzini
Placing a declaration of evt_reset is pedantically invalid according to the C standard. While GCC does not really care and only warns with -Wpedantic, clang ignores the declaration altogether with an error: x86_64/xen_shinfo_test.c:965:2: error: expected expression struct kvm_xen_hvm_attr evt_reset = { ^ x86_64/xen_shinfo_test.c:969:38: error: use of undeclared identifier evt_reset vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset); ^ Reported-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reported-by: Sean Christopherson <seanjc@google.com> Fixes: a79b53aaaab5 ("KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET", 2022-12-28) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-22Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"Andreas Gruenbacher
Commit b2b0a5e97855 switched from generic_writepages() to filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to replacing ->writepage() with ->writepages() and eventually eliminating the former. Function gfs2_ail1_start_one() is called from gfs2_log_flush(), our main function for flushing the filesystem log. Unfortunately, at least as implemented today, ->writepage() and ->writepages() are entirely different operations for journaled data inodes: while the former creates and submits transactions covering the data to be written, the latter flushes dirty buffers out to disk. With gfs2_ail1_start_one() now calling ->writepages(), we end up creating filesystem transactions while we are in the course of a log flush, which immediately deadlocks on the sdp->sd_log_flush_lock semaphore. Work around that by going back to how things used to work before commit b2b0a5e97855 for now; figuring out a superior solution will take time we don't have available right now. However ... Since the removal of generic_writepages() is imminent, open-code it here. We're already inside a blk_start_plug() ... blk_finish_plug() section here, so skip that part of the original generic_writepages(). This reverts commit b2b0a5e978552e348f85ad9c7568b630a5ede659. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de>
2023-01-22Merge tag 'kvmarm-fixes-6.2-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.2, take #2 - Pass the correct address to mte_clear_page_tags() on initialising a tagged page - Plug a race against a GICv4.1 doorbell interrupt while saving the vgic-v3 pending state.
2023-01-22media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct lineHans Verkuil
The patch that fixed string control support somehow got mangled when it was merged in mainline: the added line ended up in the wrong place. Fix this. Fixes: 73278d483378 ("media: v4l2-ctrls-api.c: add back dropped ctrl->is_new = 1") Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-01-21Linux 6.2-rc5v6.2-rc5Linus Torvalds