Age | Commit message (Collapse) | Author |
|
commit 8eb060e10185 ("arch/riscv: add Zihintpause support") broke
building with CONFIG_CC_OPTIMIZE_FOR_SIZE enabled (gcc 11.1.0):
CC arch/riscv/kernel/vdso/vgettimeofday.o
In file included from <command-line>:
./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax':
././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0 probably does not match constraints
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
41 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
././include/linux/compiler_types.h:285:33: error: impossible constraint in 'asm'
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
41 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
make[1]: *** [scripts/Makefile.build:249: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
make: *** [arch/riscv/Makefile:128: vdso_prepare] Error 2
Having a static branch in cpu_relax() is problematic because that
function is widely inlined, including in some quite complex functions
like in the VDSO. A quick measurement shows this static branch is
responsible by itself for around 40% of the jump table.
Drop the static branch, which ends up being the same number of
instructions anyway. If Zihintpause is supported, we trade the nop from
the static branch for a div. If Zihintpause is unsupported, we trade the
jump from the static branch for (what gets interpreted as) a nop.
Fixes: 8eb060e10185 ("arch/riscv: add Zihintpause support")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Alex Elder says:
====================
net: ipa: remaining IPA v5.0 support
This series includes almost all remaining IPA code changes required
to support IPA v5.0. IPA register definitions and configuration
data for IPA v5.0 will be sent later (soon). Note that the GSI
register definitions still require work. GSI for IPA v5.0 supports
up to 256 (rather than 32) channels, and this changes the way GSI
register offsets are calculated. A few GSI register fields also
change.
The first patch in this series increases the number of IPA endpoints
supported by the driver, from 32 to 36. The next updates the width
of the destination field for the IP_PACKET_INIT immediate command so
it can represent up to 256 endpoints rather than just 32. The next
adds a few definitions of some IPA registers and fields that are
first available in IPA v5.0.
The next two patches update the code that handles router and filter
table caches. Previously these were referred to as "hashed" tables,
and the IPv4 and IPv6 tables are now combined into one "unified"
table. The sixth and seventh patches add support for a new pulse
generator, which allows time periods to be specified with a wider
range of clock resolution. And the last patch just defines two new
memory regions that were not previously used.
====================
Link: https://lore.kernel.org/r/20230130210158.4126129-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IPA v5.0 uses two memory regions not previously used. Define them
and treat them as valid only for IPA v5.0.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The AP has third pulse generator available starting with IPA v5.0.
Redefine ipa_qtime_val() to support that possibility. Pass the IPA
pointer as an argument so the version can be determined. And stop
using the sign of the returned tick count to indicate which of two
pulse generators to use.
Instead, have the caller provide the address of a variable that will
hold the selected pulse generator for the Qtime value. And for
version 5.0, check whether the third pulse generator best represents
the time period.
Add code in ipa_qtime_config() to configure the fourth pulse
generator for IPA v5.0+; in that case configure both the third and
fourth pulse generators to use 10 msec granularity.
Consistently use "ticks" for local variables that represent a tick
count.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Starting with IPA v5.0, the head-of-line blocking timer has more
than two pulse generators available to define timer granularity.
To prepare for that, change the way the field value is encoded
to use ipa_reg_encode() rather than ipa_reg_bit().
The aggregation granularity selection could (in principle) also use
an additional pulse generator starting with IPA v5.0. Encode the
AGGR_GRAN_SEL field differently to allow that as well.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IPA v5.0+ separates the configuration of entries in the cached
(previously "hashed") routing and filtering tables into distinct
registers. Previously a single "filter and router" register updated
entries in both tables at once; now the routing and filter table
caches have separate registers that define their content.
This patch updates the code that zeroes entries in the cached filter
and router tables to support IPA versions including v5.0+.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update the code that causes filter and router table caches to be
flushed so that it supports IPA versions 5.0+. It adds a comment in
ipa_hardware_config_hashing() that explains that cacheing does not
need to be enabled, just as before, because it's enabled by default.
(For the record, the FILT_ROUT_CACHE_CFG register would have been
used if we wanted to explicitly enable these.)
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define some new registers that appear starting with IPA v5.0, along
with enumerated types identifying their fields. Code that uses
these will be added by upcoming patches.
Most of the new registers are related to filter and routing tables,
and in particular, their "hashed" variant. These tables are better
described as "cached", where a hash value determines which entries
are cached. From now on, naming related to this functionality will
use "cache" instead of "hash", and that is reflected in these new
register names. Some registers for managing these caches and their
contents have changed as well.
A few other new field definitions for registers (unrelated to table
caches) are also defined.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The IP_PACKET_INIT immediate command defines the destination
endpoint to which a packet should be sent. Prior to IPA v5.0, a
5 bit field in that command represents the endpoint, but starting
with IPA v5.0, the field is extended to 8 bits to support more than
32 endpoints.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Increase the number of endpoints supported by the driver to 36,
which IPA v5.0 supports. This makes it impossible to check at build
time whether the supported number is too big to fit within the
(5-bit) PACKET_INIT destination endpoint field. Instead, convert
the build time check to compare against what fits in 8 bits.
Add a check in ipa_endpoint_config() to also ensure the hardware
reports an endpoint count that's in the expected range. Just
open-code 32 as the limit (the PACKET_INIT field mask is not
available where we'd want to use it).
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-01-30
Add fast update encryption key
Jianbo Liu Says:
================
Data encryption keys (DEKs) are the keys used for data encryption and
decryption operations. Starting from version 22.33.0783, firmware is
optimized to accelerate the update of user keys into DEK object in
hardware. The support for bulk allocation and destruction of DEK
objects is added, and the bulk allocated DEKs are uninitialized, as
the bulk creation requires no input key. When offload
encryption/decryption, user gets one object from a bulk, and updates
key by a new "modify DEK" command. This command is the same as create
DEK object, but requires no heavy context memory allocation in
firmware, which consumes most cpu cycles of the create DEK command.
DEKs are cached internally by the NIC, so invalidating internal NIC
caches is required before reusing DEKs. The SYNC_CRYPTO command is
added to support it. DEK object can be reused, the keys in it can be
updated after this command is executed.
This patchset enhances the key creation and destruction flow, to get
use of this new feature. Any user, for example, ktls, ipsec and
macsec, can use it to offload keys. But, only ktls uses it, as others
don't need many keys, and caching two many DEKs in pool is wasteful.
There are two new data struts added:
a. DEK pool. One pool is created for each key type. The bulks by
the type, are placed in the pool's different bulk lists, according to
the number of available and in_used DEKs in the bulk.
b. DEK bulk. All DEKs in one bulk allocation are store here. There
are two bitmaps to indicate the state of each DEK.
New APIs are then added. When user need a DEK object,
a. Fetch one bulk with avail DEKs, from the partial_list or
avail_list, otherwise create new one.
b. Pick one DEK, and set its need_sync and in_used bits to 1.
Move the bulk to full_list if no more available keys, or put it to
partial_list if the bulk is newly created.
c. Update DEK object's key with user key, by the "modify DEK"
command.
d. Return DEK struct to user, then it gets the object id and fills
it into the offload commands.
When user free a DEK,
a. Set in_use bit to 0. If all need_sync bits are 1 and all in_use
bits of this bulk are 0, move it to sync_list.
b. If the number of DEKs, which are freed by users, is over the
threshold (128), schedule a workqueue to do the sync process.
For the sync process, the SYNC_CRYPTO command is executed first. Then,
for each bulks in partial_list, full_list and sync_list, reset
need_sync bits of the freed DEK objects. If all need_sync bits in one
bulk are zero, move it to avail_list.
We already supported TIS pool to recycle the TISes. With this series
and TIS pool, TLS CPS performance is improved greatly.
And we tested https on the system:
CPU: dual AMD EPYC 7763 64-Core processors
RAM: 512G
DEV: ConnectX-6 DX, with FW ver 22.33.0838 and TLS_OPTIMISE=true
TLS CPS performance numbers are:
Before: 11k connections/sec
After: 101 connections/sec
================
* tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: kTLS, Improve connection rate by using fast update encryption key
net/mlx5: Keep only one bulk of full available DEKs
net/mlx5: Add async garbage collector for DEK bulk
net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command
net/mlx5: Use bulk allocation for fast update encryption key
net/mlx5: Add bulk allocation and modify_dek operation
net/mlx5: Add support SYNC_CRYPTO command
net/mlx5: Add new APIs for fast update encryption key
net/mlx5: Refactor the encryption key creation
net/mlx5: Add const to the key pointer of encryption key creation
net/mlx5: Prepare for fast crypto key update if hardware supports it
net/mlx5: Change key type to key purpose
net/mlx5: Add IFC bits and enums for crypto key
net/mlx5: Add IFC bits for general obj create param
net/mlx5: Header file for crypto
====================
Link: https://lore.kernel.org/r/20230131031201.35336-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Release bridge info once packet escapes the br_netfilter path,
from Florian Westphal.
2) Revert incorrect fix for the SCTP connection tracking chunk
iterator, also from Florian.
First path fixes a long standing issue, the second path addresses
a mistake in the previous pull request for net.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
Revert "netfilter: conntrack: fix bug in for_each_sctp_chunk"
netfilter: br_netfilter: disable sabotage_in hook after first suppression
====================
Link: https://lore.kernel.org/r/20230131133158.4052-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Meson G12A Internal PHY does not support standard IEEE MMD extended
register access, therefore add generic dummy stubs to fail the read and
write MMD calls. This is necessary to prevent the core PHY code from
erroneously believing that EEE is supported by this PHY even though this
PHY does not support EEE, as MMD register access returns all FFFFs.
Fixes: 5c3407abb338 ("net: phy: meson-gxl: add g12a support")
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Chris Healy <healych@amazon.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230130231402.471493-1-cphealy@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
introduced UDP listifyed GRO. The segmentation relies on frag_list being
untouched when passing through the network stack. This assumption can be
broken sometimes, where frag_list itself gets pulled into linear area,
leaving frag_list being NULL. When this happens it can trigger
following NULL pointer dereference, and panic the kernel. Reverse the
test condition should fix it.
[19185.577801][ C1] BUG: kernel NULL pointer dereference, address:
...
[19185.663775][ C1] RIP: 0010:skb_segment_list+0x1cc/0x390
...
[19185.834644][ C1] Call Trace:
[19185.841730][ C1] <TASK>
[19185.848563][ C1] __udp_gso_segment+0x33e/0x510
[19185.857370][ C1] inet_gso_segment+0x15b/0x3e0
[19185.866059][ C1] skb_mac_gso_segment+0x97/0x110
[19185.874939][ C1] __skb_gso_segment+0xb2/0x160
[19185.883646][ C1] udp_queue_rcv_skb+0xc3/0x1d0
[19185.892319][ C1] udp_unicast_rcv_skb+0x75/0x90
[19185.900979][ C1] ip_protocol_deliver_rcu+0xd2/0x200
[19185.910003][ C1] ip_local_deliver_finish+0x44/0x60
[19185.918757][ C1] __netif_receive_skb_one_core+0x8b/0xa0
[19185.927834][ C1] process_backlog+0x88/0x130
[19185.935840][ C1] __napi_poll+0x27/0x150
[19185.943447][ C1] net_rx_action+0x27e/0x5f0
[19185.951331][ C1] ? mlx5_cq_tasklet_cb+0x70/0x160 [mlx5_core]
[19185.960848][ C1] __do_softirq+0xbc/0x25d
[19185.968607][ C1] irq_exit_rcu+0x83/0xb0
[19185.976247][ C1] common_interrupt+0x43/0xa0
[19185.984235][ C1] asm_common_interrupt+0x22/0x40
...
[19186.094106][ C1] </TASK>
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/Y9gt5EUizK1UImEP@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When memory allocation fails in lynx_pcs_create() and it returns NULL,
there remains a dangling reference to the mdiodev returned by
of_mdio_find_device() which is leaked as soon as memac_pcs_create()
returns empty-handed.
Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://lore.kernel.org/r/20230130193051.563315-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN: Remove redundant Device Control Error Reporting Enable
Bjorn Helgaas says:
Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
the PCI core sets the Device Control bits that enable error reporting for
PCIe devices.
This series removes redundant calls to pci_enable_pcie_error_reporting()
that do the same thing from several NIC drivers.
There are several more drivers where this should be removed; I started with
just the Intel drivers here.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: Remove redundant pci_enable_pcie_error_reporting()
igc: Remove redundant pci_enable_pcie_error_reporting()
igb: Remove redundant pci_enable_pcie_error_reporting()
ice: Remove redundant pci_enable_pcie_error_reporting()
iavf: Remove redundant pci_enable_pcie_error_reporting()
i40e: Remove redundant pci_enable_pcie_error_reporting()
fm10k: Remove redundant pci_enable_pcie_error_reporting()
e1000e: Remove redundant pci_enable_pcie_error_reporting()
====================
Link: https://lore.kernel.org/r/20230130192519.686446-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Petr Machata says:
====================
selftests: mlxsw: Convert to iproute2 dcb
There is a dedicated tool for configuration of DCB in iproute2. Use it
in the selftests instead of lldpad.
Patches #1-#3 convert three tests. Patch #4 drops the now-unnecessary
lldpad helpers.
====================
Link: https://lore.kernel.org/r/cover.1675096231.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The existing users of these helpers have been converted to iproute2 dcb.
Drop the helpers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up default port priority through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It tries to avoid the frequently hb_timer refresh in commit ba6f5e33bdbb
("sctp: avoid refreshing heartbeat timer too often"), and it only allows
mod_timer when the new expires is after hb_timer.expires. It means even
a much shorter interval for hb timer gets applied, it will have to wait
until the current hb timer to time out.
In sctp_do_8_2_transport_strike(), when a transport enters PF state, it
expects to update the hb timer to resend a heartbeat every rto after
calling sctp_transport_reset_hb_timer(), which will not work as the
change mentioned above.
The frequently hb_timer refresh was caused by sctp_transport_reset_timers()
called in sctp_outq_flush() and it was already removed in the commit above.
So we don't have to check hb_timer.expires when resetting hb_timer as it is
now not called very often.
Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jerome Brunet says:
====================
net: mdio: add amlogic gxl mdio mux support
Add support for the MDIO multiplexer found in the Amlogic GXL SoC family.
This multiplexer allows to choose between the external (SoC pins) MDIO bus,
or the internal one leading to the integrated 10/100M PHY.
This multiplexer has been handled with the mdio-mux-mmioreg generic driver
so far. When it was added, it was thought the logic was handled by a
single register.
It turns out more than a single register need to be properly set.
As long as the device is using the Amlogic vendor bootloader, or upstream
u-boot with net support, it is working fine since the kernel is inheriting
the bootloader settings. Without net support in the bootloader, this glue
comes unset in the kernel and only the external path may operate properly.
With this driver (and the associated change in
arch/arm64/boot/dts/amlogic/meson-gxl.dtsi), the kernel no longer relies
on the bootloader to set things up, fixing the problem.
====================
Link: https://lore.kernel.org/r/20230130151616.375168-1-jbrunet@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for the mdio mux and internal phy glue of the GXL SoC
family
Reported-by: Da Xue <da@lessconfused.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add documentation for the MDIO bus multiplexer found on the Amlogic GXL
SoC family
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
tools: ynl: more docs and basic ethtool support
I got discouraged from supporting ethtool in specs, because
generating the user space C code seems a little tricky.
The messages are ID'ed in a "directional" way (to and from
kernel are separate ID "spaces"). There is value, however,
in having the spec and being able to for example use it
in Python.
After paying off some technical debt - add a partial
ethtool spec. Partial because the header for ethtool is almost
a 1000 LoC, so converting in one sitting is tough. But adding
new commands should be trivial now.
Last but not least I add more docs, I realized that I've been
sending a similar "instructions" email to people working on
new families. It's now intro-specs.rst.
====================
Link: https://lore.kernel.org/r/20230131023354.1732677-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The scripts require Python 3 and some distros are dropping
Python 2 support.
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have a bit of documentation about the internals of Netlink
and the specs, but really the goal is for most people to not
worry about those. Add a practical guide for beginners who
want to poke at the specs.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool is one of the most actively developed families.
With the changes to the CLI it should be possible to use
the YNL based code for easy prototyping and development.
Add a partial family definition. I've tested the string
set and rings. I don't have any MAC Merge implementation
to test with, but I added the definition for it, anyway,
because it's last. New commands can simply be added at
the end without having to worry about manually providing
IDs / values.
Set (with notification support - None is the response,
the data is from the notification):
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--do rings-set \
--json '{"header":{"dev-name":"enp0s31f6"}, "rx":129}' \
--subscribe monitor
None
[{'msg': {'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0},
'name': 'rings-ntf'}]
Do / dump (yes, the kernel requires that even for dump and even
if empty - the "header" nest must be there):
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--do rings-get \
--json '{"header":{"dev-index": 2}}'
{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0}
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--dump rings-get \
--json '{"header":{}}'
[{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0},
{'header': {'dev-index': 3, 'dev-name': 'wlp0s20f3'}, 'tx-push': 0},
{'header': {'dev-index': 19, 'dev-name': 'enp58s0u1u1'},
'rx': 100,
'rx-max': 4096,
'tx-push': 0}]
And error reporting:
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--dump rings-get \
--json '{"header":{"flags":5}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr-offs': 24,
'bad-attr': '.header.flags'}
None
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I had a (bright?) idea of introducing the concept of enum-models
to account for all the weird ways families enumerate their messages.
I've never finished it because generating C code for each of them
is pretty daunting. But for languages which can use ID values directly
the support is simple enough, so clean this up a bit.
"unified" model is what I recommend going forward.
"directional" model is what ethtool uses.
"notify-split" is used by the proposed DPLL code, but we can just
make them use "unified", it hasn't been merged :)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The CLI script tries to validate jsonschema by default.
It's seems better to validate too many times than too few.
However, when copying the scripts to random servers having
to install jsonschema is tedious. Load jsonschema via
importlib, and let the user opt out.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When I wrote the first version of the Python code I was quite
excited that we can generate class methods directly from the
spec. Unfortunately we need to use valid identifiers for method
names (specifically no dashes are allowed). Don't reuse those
names on the CLI, it's much more natural to use the operation
names exactly as listed in the spec.
Instead of:
./cli --do rings_get
use:
./cli --do rings-get
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
One of my favorite features of the Netlink specs is that they
make decoding structured extack a ton easier.
Implement pretty printing bad attribute names in YNL.
For example it will now say:
'bad-attr': '.header.flags'
rather than the useless:
'bad-attr-offs': 32
Proof:
$ ./cli.py --spec ethtool.yaml --do rings_get \
--json '{"header":{"dev-index":1, "flags":4}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr': '.header.flags'}
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool uses mutli-attr, add the support to YNL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support families which use different IDs for messages
to and from the kernel.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool needs support for handful of extra types.
It doesn't have the definitions section yet.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adapt the common object hierarchy in code gen and CLI.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a lot of copy and pasting going on between the "cli"
and code gen when it comes to representing the parsed spec.
Create a library which both can use.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the CLI code out of samples/ and the library part
of it into tools/net/ynl/lib/. This way we can start
sharing some code with the code gen.
Initially I thought that code gen is too C-specific to
share anything but basic stuff like calculating values
for enums can easily be shared.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An earlier fix tried to address generated code jumping around
one code-gen run to another. Turns out dict()s are already
ordered since Python 3.7, the problem is that we iterate over
operation modes using a set(). Sets are unordered in Python.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On Systems where online memory is lesser compared to max memory, the
kexec_file_load system call may fail to load the kdump kernel with the
below errors:
"Failed to update fdt with linux,drconf-usable-memory property"
"Error setting up usable-memory property for kdump kernel"
This happens because the size estimation for usable memory properties
for the kdump kernel's FDT is based on the online memory whereas the
usable memory properties include max memory. In short, the hot-pluggable
memory is not accounted for while estimating the size of the usable
memory properties.
The issue is addressed by calculating usable memory property size using
max hotplug address instead of the last online memory address.
Fixes: 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump kernel")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131030615.729894-1-sourabhjain@linux.ibm.com
|
|
As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
occurs a NULL pointer dereference, let's do not record the foreign
writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty() to
fix it.
Link: https://lkml.kernel.org/r/20230129040945.180629-1-wangkefeng.wang@huawei.com
Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Ma Wupeng <mawupeng1@huawei.com>
Tested-by: Miko Larsson <mikoxyzzz@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ma Wupeng <mawupeng1@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched.
Link: https://lkml.kernel.org/r/202301291013573466558@zte.com.cn
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The softlockup still occurs in get_swap_pages() under memory pressure. 64
CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram
device is 50MB with same priority as si. Use the stress-ng tool to
increase memory pressure, causing the system to oom frequently.
The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens
of thousands of times to find available space (extreme case:
cond_resched() is not called in scan_swap_map_slots()). Let's add
cond_resched() into get_swap_pages() when failed to find available space
to avoid softlockup.
Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.com
Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Mirsad report the below error which is caused by stack_depot_init()
failure in kvcalloc. Solve this by having stackdepot use
stack_depot_early_init().
On 1/4/23 17:08, Mirsad Goran Todorovac wrote:
I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak:
[root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff951c118568b0 (size 16):
comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s)
hex dump (first 16 bytes):
6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
backtrace:
[root@pc-mtodorov ~]#
Apparently, backtrace of called functions on the stack is no longer
printed with the list of memory leaks. This appeared on Lenovo desktop
10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and
6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same
CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from
Mr. Torvalds' tree. I don't know if this is deliberate feature for some
reason or a bug. Please find attached the config, lshw and kmemleak
output.
[vbabka@suse.cz: remove stack_depot_init() call]
Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/
Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com
Fixes: 56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: ke.wang <ke.wang@unisoc.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and
sanity checking of the xattr_ids count in the filesystem. Both of these
flaws cause computation overflow due to incorrect typing.
In the corrupted filesystem the xattr_ids value is 4294967071, which
stored in a signed variable becomes the negative number -225.
Flaw 1 (64-bit systems only):
The signed integer xattr_ids variable causes sign extension.
This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The
variable is first multiplied by sizeof(struct squashfs_xattr_id) where the
type of the sizeof operator is "unsigned long".
On a 64-bit system this is 64-bits in size, and causes the negative number
to be sign extended and widened to 64-bits and then become unsigned. This
produces the very large number 18446744073709548016 or 2^64 - 3600. This
number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and
divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0
(stored in len).
Flaw 2 (32-bit systems only):
On a 32-bit system the integer variable is not widened by the unsigned
long type of the sizeof operator (32-bits), and the signedness of the
variable has no effect due it always being treated as unsigned.
The above corrupted xattr_ids value of 4294967071, when multiplied
overflows and produces the number 4294963696 or 2^32 - 3400. This number
when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by
SQUASHFS_METADATA_SIZE overflows again and produces a length of 0.
The effect of the 0 length computation:
In conjunction with the corrupted xattr_ids field, the filesystem also has
a corrupted xattr_table_start value, where it matches the end of
filesystem value of 850.
This causes the following sanity check code to fail because the
incorrectly computed len of 0 matches the incorrect size of the table
reported by the superblock (0 bytes).
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
/*
* The computed size of the index table (len bytes) should exactly
* match the table start and end points
*/
start = table_start + sizeof(*id_table);
end = msblk->bytes_used;
if (len != (end - start))
return ERR_PTR(-EINVAL);
Changing the xattr_ids variable to be "usigned int" fixes the flaw on a
64-bit system. This relies on the fact the computation is widened by the
unsigned long type of the sizeof operator.
Casting the variable to u64 in the above macro fixes this flaw on a 32-bit
system.
It also means 64-bit systems do not implicitly rely on the type of the
sizeof operator to widen the computation.
[1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/
Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk
Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Fedor Pchelkin <pchelkin@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since
commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv").
This is similar to fixes for powerpc and s390:
commit 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT").
commit a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error
with GNU ld < 2.36").
$ sh4-linux-gnu-ld --version | head -n1
GNU ld (GNU Binutils for Debian) 2.35.2
$ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig
$ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu-
`.exit.text' referenced in section `__bug_table' of crypto/algboss.o:
defined in discarded section `.exit.text' of crypto/algboss.o
`.exit.text' referenced in section `__bug_table' of
drivers/char/hw_random/core.o: defined in discarded section
`.exit.text' of drivers/char/hw_random/core.o
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [Makefile:1252: vmlinux] Error 2
arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT:
/*
* .exit.text is discarded at runtime, not link time, to deal with
* references from __bug_table
*/
.exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT }
However, EXIT_TEXT is thrown away by
DISCARD(include/asm-generic/vmlinux.lds.h) because
sh does not define RUNTIME_DISCARD_EXIT.
GNU ld 2.40 does not have this issue and builds fine.
This corresponds with Masahiro's comments in a494398bde27:
"Nathan [Chancellor] also found that binutils
commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this
issue, so we cannot reproduce it with binutils 2.36+, but it is better
to not rely on it."
Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com
Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/all/20230123194218.47ssfzhrpnv3xfez@oracle.com/
Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dennis Gilmore <dennis@ausil.us>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We already round down the address in kunmap_local_indexed() which is the
other implementation of __kunmap_local(). The only implementation of
kunmap_flush_on_unmap() is PA-RISC which is expecting a page-aligned
address. This may be causing PA-RISC to be flushing the wrong addresses
currently.
Link: https://lkml.kernel.org/r/20230126200727.1680362-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: 298fa1ad5571 ("highmem: Provide generic variant of kmap_atomic*")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to
move pages shared with another process to a different node. page_mapcount
> 1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via
a shared PMD. As a result, hugetlb pages shared by multiple processes and
mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
consider the page shared.
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com
Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".
This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1]. The following two patches address user visible behavior
caused by this issue.
[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
This patch (of 2):
A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD. This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to their
page table.
page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps. Pages referenced via a shared PMD were
incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
count the hugetlb page as shared. A new helper to check for a shared PMD
is added.
[akpm@linux-foundation.org: simplification, per David]
[akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()]
Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com
Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|