Age | Commit message (Collapse) | Author |
|
10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport
specification. It uses the same signaling as USXGMII, but it multiplexes
4 ports over the link, resulting in a maximum speed of 2.5G per port.
Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean
either the single-port USXGMII or the quad-port 10G-QXGMII variant, and
they could get away just fine with that thus far. But there is a need to
distinguish between the 2 as far as SerDes drivers are concerned.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This declaration was added to the header to be called from ethtool.
ethtool is separated from core for code organization but it is not really
a separate entity, it controls very core things.
As ethtool is an internal stuff it is not wise to have it in netdevice.h.
Move the declaration to net/core/dev.h instead.
Remove the EXPORT_SYMBOL_GPL call as ethtool can not be built as a module.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240612-feature_ptp_netnext-v15-2-b2a086257b63@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I find the behavior of xa_for_each_start() slightly counter-intuitive.
It doesn't end the iteration by making the index point after the last
element. IOW calling xa_for_each_start() again after it "finished"
will run the body of the loop for the last valid element, instead
of doing nothing.
This works fine for netlink dumps if they terminate correctly
(i.e. coalesce or carefully handle NLM_DONE), but as we keep getting
reminded legacy dumps are unlikely to go away.
Fixing this generically at the xa_for_each_start() level seems hard -
there is no index reserved for "end of iteration".
ifindexes are 31b wide, tho, and iterator is ulong so for
for_each_netdev_dump() it's safe to go to the next element.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow platform drivers to provide their logic to select an appropriate
PCS.
Tested-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1sHhoM-00Fesu-8E@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Calculate the pseudo-header checksum for both IPSec transport mode and
IPSec tunnel mode for mlx5 devices that do not implement a pure hardware
checksum offload for L4 checksum calculation. Introduce a capability bit
that identifies such mlx5 devices.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TSAR is the correct spelling (Transmit Scheduling ARbiter).
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts, no adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter.
Slim pickings this time, probably a combination of summer, DevConf.cz,
and the end of first half of the year at corporations.
Current release - regressions:
- Revert "igc: fix a log entry using uninitialized netdev", it traded
lack of netdev name in a printk() for a crash
Previous releases - regressions:
- Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
- geneve: fix incorrectly setting lengths of inner headers in the
skb, confusing the drivers and causing mangled packets
- sched: initialize noop_qdisc owner to avoid false-positive
recursion detection (recursing on CPU 0), which bubbles up to user
space as a sendmsg() error, while noop_qdisc should silently drop
- netdevsim: fix backwards compatibility in nsim_get_iflink()
Previous releases - always broken:
- netfilter: ipset: fix race between namespace cleanup and gc in the
list:set type"
* tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
gve: Clear napi->skb before dev_kfree_skb_any()
ionic: fix use after netif_napi_del()
Revert "igc: fix a log entry using uninitialized netdev"
net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
net/ipv6: Fix the RT cache flush via sysctl using a previous delay
net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
gve: ignore nonrelevant GSO type bits when processing TSO headers
net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
netfilter: Use flowlabel flow key when re-routing mangled packets
netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
netfilter: nft_inner: validate mandatory meta and payload
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
mailmap: map Geliang's new email address
mptcp: pm: update add_addr counters after connect
mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
mptcp: ensure snd_una is properly initialized on connect
...
|
|
Similar to previous patch: apply same logic for
__skb_get_hash_symmetric and let callers pass the netns to the dissector
core.
Existing function is turned into a wrapper to avoid adjusting all
callers, nft_hash.c uses new function.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240608221057.16070-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Years ago flow dissector gained ability to delegate flow dissection
to a bpf program, scoped per netns.
Unfortunately, skb_get_hash() only gets an sk_buff argument instead
of both net+skb. This means the flow dissector needs to obtain the
netns pointer from somewhere else.
The netns is derived from skb->dev, and if that is not available, from
skb->sk. If neither is set, we hit a (benign) WARN_ON_ONCE().
Trying both dev and sk covers most cases, but not all, as recently
reported by Christoph Paasch.
In case of nf-generated tcp reset, both sk and dev are NULL:
WARNING: .. net/core/flow_dissector.c:1104
skb_flow_dissect_flow_keys include/linux/skbuff.h:1536 [inline]
skb_get_hash include/linux/skbuff.h:1578 [inline]
nft_trace_init+0x7d/0x120 net/netfilter/nf_tables_trace.c:320
nft_do_chain+0xb26/0xb90 net/netfilter/nf_tables_core.c:268
nft_do_chain_ipv4+0x7a/0xa0 net/netfilter/nft_chain_filter.c:23
nf_hook_slow+0x57/0x160 net/netfilter/core.c:626
__ip_local_out+0x21d/0x260 net/ipv4/ip_output.c:118
ip_local_out+0x26/0x1e0 net/ipv4/ip_output.c:127
nf_send_reset+0x58c/0x700 net/ipv4/netfilter/nf_reject_ipv4.c:308
nft_reject_ipv4_eval+0x53/0x90 net/ipv4/netfilter/nft_reject_ipv4.c:30
[..]
syzkaller did something like this:
table inet filter {
chain input {
type filter hook input priority filter; policy accept;
meta nftrace set 1
tcp dport 42 reject with tcp reset
}
chain output {
type filter hook output priority filter; policy accept;
# empty chain is enough
}
}
... then sends a tcp packet to port 42.
Initial attempt to simply set skb->dev from nf_reject_ipv4 doesn't cover
all cases: skbs generated via ipv4 igmp_send_report trigger similar splat.
Moreover, Pablo Neira found that nft_hash.c uses __skb_get_hash_symmetric()
which would trigger same warn splat for such skbs.
Lets allow callers to pass the current netns explicitly.
The nf_trace infrastructure is adjusted to use the new helper.
__skb_get_hash_symmetric is handled in the next patch.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240608221057.16070-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP as reported by
checkpatch script.
Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240610083426.740660-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pcpu_sw_netstats and pcpu_lstats structs both contain a set of
u64_stats_t fields for individual stats, but pcpu_dstats uses u64s
instead.
Make this consistent by using u64_stats_t across all stats types.
The per-cpu dstats are only used by the vrf driver at present, so update
that driver as part of this change.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607-dstats-v3-1-cc781fe116f7@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Misc:
- Restore debugfs behavior of ignoring unknown mount options
- Fix kernel doc for netfs_wait_for_oustanding_io()
- Fix struct statx comment after new addition for this cycle
- Fix a check in find_next_fd()
iomap:
- Fix data zeroing behavior when an extent spans the block that
contains i_size
- Restore i_size increasing in iomap_write_end() for now to avoid
stale data exposure on xfs with a realtime device
Cachefiles:
- Remove unneeded fdtable.h include
- Improve trace output for cachefiles_obj_{get,put}_ondemand_fd()
- Remove requests from the request list to prevent accessing already
freed requests
- Fix UAF when issuing restore command while the daemon is still
alive by adding an additional reference count to requests
- Fix UAF by grabbing a reference during xarray lookup with xa_lock()
held
- Simplify error handling in cachefiles_ondemand_daemon_read()
- Add consistency checks read and open requests to avoid crashes
- Add a spinlock to protect ondemand_id variable which is used to
determine whether an anonymous cachefiles fd has already been
closed
- Make on-demand reads killable allowing to handle broken cachefiles
daemon better
- Flush all requests after the kernel has been marked dead via
CACHEFILES_DEAD to avoid hung-tasks
- Ensure that closed requests are marked as such to avoid reusing
them with a reopen request
- Defer fd_install() until after copy_to_user() succeeded and thereby
get rid of having to use close_fd()
- Ensure that anonymous cachefiles on-demand fds are reused while
they are valid to avoid pinning already freed cookies"
* tag 'vfs-6.10-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: Fix iomap_adjust_read_range for plen calculation
iomap: keep on increasing i_size in iomap_write_end()
cachefiles: remove unneeded include of <linux/fdtable.h>
fs/file: fix the check in find_next_fd()
cachefiles: make on-demand read killable
cachefiles: flush all requests after setting CACHEFILES_DEAD
cachefiles: Set object to close if ondemand_id < 0 in copen
cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
cachefiles: never get a new anonymous fd if ondemand_id is valid
cachefiles: add spin_lock for cachefiles_ondemand_info
cachefiles: add consistency check for copen/cread
cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
cachefiles: remove requests from xarray during flushing requests
cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
statx: Update offset commentary for struct statx
netfs: fix kernel doc for nets_wait_for_outstanding_io()
debugfs: continue to ignore unknown mount options
|
|
In ice_ptp_cfg_clkout(), the ice driver needs to calculate the nearest next
second of a current time value specified in nanoseconds. It implements this
using div64_u64, because the time value is a u64. It could use div_u64
since NSEC_PER_SEC is smaller than 32-bits.
Ideally this would be implemented directly with roundup(), but that can't
work on all platforms due to a division which requires using the specific
macros and functions due to platform restrictions, and to ensure that the
most appropriate and fast instructions are used.
The kernel doesn't currently provide any 64-bit equivalents for doing
roundup. Attempting to use roundup() on a 32-bit platform will result in a
link failure due to not having a direct 64-bit division.
The closest equivalent for this is DIV64_U64_ROUND_UP, which does a
division always rounding up. However, this only computes the division, and
forces use of the div64_u64 in cases where the divisor is a 32bit value and
could make use of div_u64.
Introduce DIV_U64_ROUND_UP based on div_u64, and then use it to implement
roundup_u64 which takes a u64 input value and a u32 rounding value.
The name roundup_u64 matches the naming scheme of div_u64, and future
patches could implement roundup64_u64 if they need to round by a multiple
that is greater than 32-bits.
Replace the logic in ice_ptp.c which does this equivalent with the newly
added roundup_u64.
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-2-d1470cee3347@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-06-06
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 50 files changed, 1887 insertions(+), 527 deletions(-).
The main changes are:
1) Add a user space notification mechanism via epoll when a struct_ops
object is getting detached/unregistered, from Kui-Feng Lee.
2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
tests, from Geliang Tang.
3) Add BTF field (type and string fields, right now) iterator support
to libbpf instead of using existing callback-based approaches,
from Andrii Nakryiko.
4) Extend BPF selftests for the latter with a new btf_field_iter
selftest, from Alan Maguire.
5) Add new kfuncs for a generic, open-coded bits iterator,
from Yafang Shao.
6) Fix BPF selftests' kallsyms_find() helper under kernels configured
with CONFIG_LTO_CLANG_THIN, from Yonghong Song.
7) Remove a bunch of unused structs in BPF selftests,
from David Alan Gilbert.
8) Convert test_sockmap section names into names understood by libbpf
so it can deduce program type and attach type, from Jakub Sitnicki.
9) Extend libbpf with the ability to configure log verbosity
via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.
10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
in nested VMs, from Song Liu.
11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
optimization, from Xiao Wang.
12) Enable BPF programs to declare arrays and struct fields with kptr,
bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
selftests/bpf: Add start_test helper in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
selftests/bpf: Add btf_field_iter selftests
selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
libbpf: Remove callback-based type/string BTF field visitor helpers
bpftool: Use BTF field iterator in btfgen
libbpf: Make use of BTF field iterator in BTF handling code
libbpf: Make use of BTF field iterator in BPF linker code
libbpf: Add BTF field iterator
selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
selftests/bpf: Fix bpf_cookie and find_vma in nested VM
selftests/bpf: Test global bpf_list_head arrays.
selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
bpf: limit the number of levels of a nested struct type.
bpf: look into the types of the fields of a struct type recursively.
...
====================
Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.11
The first "new features" pull request for v6.11 with changes both in
stack and in drivers. Nothing out of ordinary, except that we have
two conflicts this time:
net/mac80211/cfg.c
https://lore.kernel.org/all/20240531124415.05b25e7a@canb.auug.org.au
drivers/net/wireless/microchip/wilc1000/netdev.c
https://lore.kernel.org/all/20240603110023.23572803@canb.auug.org.au
Major changes:
cfg80211/mac80211
* parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers
wilc1000
* read MAC address during probe to make it visible to user space
iwlwifi
* bump FW API to 91 for BZ/SC devices
* report 64-bit radiotap timestamp
* enable P2P low latency by default
* handle Transmit Power Envelope (TPE) advertised by AP
* start using guard()
rtlwifi
* RTL8192DU support
ath12k
* remove unsupported tx monitor handling
* channel 2 in 6 GHz band support
* Spatial Multiplexing Power Save (SMPS) in 6 GHz band support
* multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA)
support
* dynamic VLAN support
* add panic handler for resetting the firmware state
ath10k
* add qcom,no-msa-ready-indicator Device Tree property
* LED support for various chipsets
* tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (194 commits)
wifi: ath12k: add hw_link_id in ath12k_pdev
wifi: ath12k: add panic handler
wifi: rtw89: chan: Use swap() in rtw89_swap_sub_entity()
wifi: brcm80211: remove unused structs
wifi: brcm80211: use sizeof(*pointer) instead of sizeof(type)
wifi: ath12k: do not process consecutive RDDM event
dt-bindings: net: wireless: ath11k: Drop "qcom,ipq8074-wcss-pil" from example
wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup()
wifi: rtlwifi: handle return value of usb init TX/RX
wifi: rtlwifi: Enable the new rtl8192du driver
wifi: rtlwifi: Add rtl8192du/sw.c
wifi: rtlwifi: Constify rtl_hal_cfg.{ops,usb_interface_cfg} and rtl_priv.cfg
wifi: rtlwifi: Add rtl8192du/dm.{c,h}
wifi: rtlwifi: Add rtl8192du/fw.{c,h} and rtl8192du/led.{c,h}
wifi: rtlwifi: Add rtl8192du/rf.{c,h}
wifi: rtlwifi: Add rtl8192du/trx.{c,h}
wifi: rtlwifi: Add rtl8192du/phy.{c,h}
wifi: rtlwifi: Add rtl8192du/hw.{c,h}
wifi: rtlwifi: Add new members to struct rtl_priv for RTL8192DU
wifi: rtlwifi: Add rtl8192du/table.{c,h}
...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://lore.kernel.org/r/20240607093517.41394C2BBFC@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Back in 2007, in commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it") netlink core was extended to allow
subsystems to replace the dump mutex lock with its own lock.
The mechanism was used by rtnetlink to take rtnl_lock but it isn't
sufficiently flexible for other users. Over the 17 years since
it was added no other user appeared. Since rtnetlink needs conditional
locking now, and doesn't use it either, axe this feature complete.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
(A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
contain more ACLs (i.e., tc filters), but the number of masks in each
region (i.e., tc chain) is limited.
In order to mitigate the effects of the above limitation, the device
allows filters to share a single mask if their masks only differ in up
to 8 consecutive bits. For example, dst_ip/25 can be represented using
dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
number of masks being used (and therefore does not support mask
aggregation), but can contain a limited number of filters.
The driver uses the "objagg" library to perform the mask aggregation by
passing it objects that consist of the filter's mask and whether the
filter is to be inserted into the A-TCAM or the C-TCAM since filters in
different TCAMs cannot share a mask.
The set of created objects is dependent on the insertion order of the
filters and is not necessarily optimal. Therefore, the driver will
periodically ask the library to compute a more optimal set ("hints") by
looking at all the existing objects.
When the library asks the driver whether two objects can be aggregated
the driver only compares the provided masks and ignores the A-TCAM /
C-TCAM indication. This is the right thing to do since the goal is to
move as many filters as possible to the A-TCAM. The driver also forbids
two identical masks from being aggregated since this can only happen if
one was intentionally put in the C-TCAM to avoid a conflict in the
A-TCAM.
The above can result in the following set of hints:
H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
After getting the hints from the library the driver will start migrating
filters from one region to another while consulting the computed hints
and instructing the device to perform a lookup in both regions during
the transition.
Assuming a filter with mask X is being migrated into the A-TCAM in the
new region, the hints lookup will return H1. Since H2 is the parent of
H1, the library will try to find the object associated with it and
create it if necessary in which case another hints lookup (recursive)
will be performed. This hints lookup for {mask Y, A-TCAM} will either
return H2 or H3 since the driver passes the library an object comparison
function that ignores the A-TCAM / C-TCAM indication.
This can eventually lead to nested objects which are not supported by
the library [1].
Fix by removing the object comparison function from both the driver and
the library as the driver was the only user. That way the lookup will
only return exact matches.
I do not have a reliable reproducer that can reproduce the issue in a
timely manner, but before the fix the issue would reproduce in several
minutes and with the fix it does not reproduce in over an hour.
Note that the current usefulness of the hints is limited because they
include the C-TCAM indication and represent aggregation that cannot
actually happen. This will be addressed in net-next.
[1]
WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
Modules linked in:
CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
[...]
Call Trace:
<TASK>
__objagg_obj_get+0x2bb/0x580
objagg_obj_get+0xe/0x80
mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
process_one_work+0x151/0x370
Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking doc fix from Ingo Molnar:
"Fix typos in the kerneldoc of some of the atomic APIs"
* tag 'locking-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomic: scripts: fix ${atomic}_sub_and_test() kerneldoc
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"14 hotfixes, 6 of which are cc:stable.
All except the nilfs2 fix affect MM and all are singletons - see the
chagelogs for details"
* tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
mm: fix xyz_noprof functions calling profiled functions
codetag: avoid race at alloc_slab_obj_exts
mm/hugetlb: do not call vma_add_reservation upon ENOMEM
mm/ksm: fix ksm_zero_pages accounting
mm/ksm: fix ksm_pages_scanned accounting
kmsan: do not wipe out origin when doing partial unpoisoning
vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
mm: page_alloc: fix highatomic typing in multi-block buddies
nilfs2: fix potential kernel bug due to lack of writeback flag waiting
memcg: remove the lockdep assert from __mod_objcg_mlstate()
mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
mm: drop the 'anon_' prefix for swap-out mTHP counters
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Core:
- Make iommu-dma code recognize 'force_aperture' again
- Fix for potential NULL-ptr dereference from iommu_sva_bind_device()
return value
AMD IOMMU fixes:
- Fix lockdep splat for invalid wait context
- Add feature bit check before enabling PPR
- Make workqueue name fit into buffer
- Fix memory leak in sysfs code"
* tag 'iommu-fixes-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix Invalid wait context issue
iommu/amd: Check EFR[EPHSup] bit before enabling PPR
iommu/amd: Fix workqueue name
iommu: Return right value in iommu_sva_bind_device()
iommu/dma: Fix domain init
iommu/amd: Fix sysfs leak in iommu init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The core change is to detect unusually large number of VPD pages
(caused by device manufacturers having an endiannes issue) and reject
them rather than trying to parse a huge non-existent array.
The remaining fixes are in drivers the most user visible of which is
the ALUA state transition recognition (leads to intermittent I/O
errors in some situations otherwise)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort()
scsi: core: Handle devices which return an unusually large VPD page count
scsi: mpt3sas: Add missing kerneldoc parameter descriptions
scsi: qedf: Set qed_slowpath_params to zero before use
scsi: qedf: Wait for stag work during unload
scsi: qedf: Don't process stag work during unload and recovery
scsi: sr: Fix unintentional arithmetic wraparound
scsi: core: alua: I/O errors for ALUA state transitions
scsi: mpi3mr: Use proper format specifier in mpi3mr_sas_port_add()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Revert lockdep checking on locking that protects device resets from
user-space config accesses; it exposed issues for which fixes are in
the works but are too risky for this cycle (Dan Williams)
* tag 'pci-v6.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Revert the cfg_access_lock lockdep mechanism
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/pensando/ionic/ionic_txrx.c
d9c04209990b ("ionic: Mark error paths in the data path as unlikely")
491aee894a08 ("ionic: fix kernel panic in XDP_TX action")
net/ipv6/ip6_fib.c
b4cb4a1391dc ("net: use unrcu_pointer() helper")
b01e1c030770 ("ipv6: fix possible race in __fib6_drop_pcpu_from()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add back HW-GRO to the reported features.
As the current implementation of HW-GRO uses KSMs with a
specific fixed buffer size (256B) to map its headers buffer,
we reported the feature only if the NIC is supporting KSM and
the minimum value for buffer size is below the requested one.
iperf3 bandwidth comparison:
+---------+--------+--------+-----------+
| streams | SW GRO | HW GRO | Unit |
|---------+--------+--------+-----------|
| 1 | 36 | 42 | Gbits/sec |
| 4 | 34 | 39 | Gbits/sec |
| 8 | 31 | 35 | Gbits/sec |
+---------+--------+--------+-----------+
A downstream patch will add skb fragment coalescing which will improve
performance considerably.
Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KSM Mkey is KLM Mkey with a fixed buffer size. Due to this fact,
it is a faster mechanism than KLM.
SHAMPO feature used KLMs Mkeys for memory mappings of its headers buffer.
As it used KLMs with the same buffer size for each entry,
we can use KSMs instead.
This commit changes the Mkeys that map the SHAMPO headers buffer
from KLMs to KSMs.
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We normally ksm_zero_pages++ in ksmd when page is merged with zero page,
but ksm_zero_pages-- is done from page tables side, where there is no any
accessing protection of ksm_zero_pages.
So we can read very exceptional value of ksm_zero_pages in rare cases,
such as -1, which is very confusing to users.
Fix it by changing to use atomic_long_t, and the same case with the
mm->ksm_zero_pages.
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.dev
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Fixes: 6080d19f0704 ("ksm: add ksm zero pages for each process")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
if CONFIG_SYSFS is not enabled in config, we get the below error,
All errors (new ones prefixed by >>):
s390-linux-ld: mm/memory.o: in function `count_mthp_stat':
>> include/linux/huge_mm.h:285:(.text+0x191c): undefined reference to `mthp_stats'
s390-linux-ld: mm/huge_memory.o:(.rodata+0x10): undefined reference to `mthp_stats'
vim +285 include/linux/huge_mm.h
279
280 static inline void count_mthp_stat(int order, enum mthp_stat_item item)
281 {
282 if (order <= 0 || order > PMD_ORDER)
283 return;
284
> 285 this_cpu_inc(mthp_stats.stats[order][item]);
286 }
287
Link: https://lkml.kernel.org/r/20240523210045.40444-1-21cnbao@gmail.com
Fixes: ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405231728.tCAogiSI-lkp@intel.com/
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback'
are confusing with an 'anon_' prefix, since the shmem can swap out
non-anonymous pages. So drop the 'anon_' prefix to keep consistent with
the old swap counter names.
This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the
field.
Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com
Fixes: d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
Fixes: 42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix the ACPI EC and AC drivers, the ACPI APEI error injection
driver and build issues related to the dev_is_pnp() macro referring to
pnp_bus_type that is not exported to modules.
Specifics:
- Fix error handling during EC operation region accesses in the ACPI
EC driver (Armin Wolf)
- Fix a memory leak in the APEI error injection driver introduced
during its converion to a platform driver (Dan Williams)
- Fix build failures related to the dev_is_pnp() macro by redefining
it as a proper function and exporting it to modules as appropriate
and unexport pnp_bus_type which need not be exported any more (Andy
Shevchenko)
- Update the ACPI AC driver to use power_supply_changed() to let the
power supply core handle configuration changes properly (Thomas
Weißschuh)"
* tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: AC: Properly notify powermanagement core about changes
PNP: Hide pnp_bus_type from the non-PNP code
PNP: Make dev_is_pnp() to be a function and export it for modules
ACPI: EC: Avoid returning AE_OK on errors in address space handler
ACPI: EC: Abort address space access upon error
ACPI: APEI: EINJ: Fix einj_dev release leak
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the intel_pstate and amd-pstate cpufreq drivers and the
cpupower utility.
Specifics:
- Fix a recently introduced unchecked HWP MSR access in the
intel_pstate driver (Srinivas Pandruvada)
- Add missing conversion from MHz to KHz to amd_pstate_set_boost() to
address sysfs inteface inconsistency and fix P-state frequency
reporting on AMD Family 1Ah CPUs in the cpupower utility (Dhananjay
Ugwekar)
- Get rid of an excess global header file used by the amd-pstate
cpufreq driver (Arnd Bergmann)"
* tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix unchecked HWP MSR access
cpufreq: amd-pstate: Fix the inconsistency in max frequency units
cpufreq: amd-pstate: remove global header file
tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
|
|
For ${atomic}_sub_and_test() the @i parameter is the value to subtract,
not add. Fix the typo in the kerneldoc template and generate the headers
with this update.
Fixes: ad8110706f38 ("locking/atomic: scripts: generate kerneldoc comments")
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240515133844.3502360-1-cmllamas@google.com
|
|
While the experiment did reveal that there are additional places that are
missing the lock during secondary bus reset, one of the places that needs
to take cfg_access_lock (pci_bus_lock()) is not prepared for lockdep
annotation.
Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is
currently dependent on the fact that the device_lock() is marked
lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that
annotation, pci_bus_lock() would need to use something like a new
pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in the
topology, and a hope that the depth of a PCI tree never exceeds the max
value for a lockdep subclass.
The alternative to ripping out the lockdep coverage would be to deploy a
dynamic lock key for every PCI device. Unfortunately, there is evidence
that increasing the number of keys that lockdep needs to track to be
per-PCI-device is prohibitively expensive for something like the
cfg_access_lock.
The main motivation for adding the annotation in the first place was to
catch unlocked secondary bus resets, not necessarily catch lock ordering
problems between cfg_access_lock and other locks. Solve that narrower
problem with follow-on patches, and just due to targeted revert for now.
Link: https://lore.kernel.org/r/171711746402.1628941.14575335981264103013.stgit@dwillia2-xfh.jf.intel.com
Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()")
Reported-by: Imre Deak <imre.deak@intel.com>
Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: Jani Saarinen <jani.saarinen@intel.com>
|
|
iommu_sva_bind_device() should return either a sva bond handle or an
ERR_PTR value in error cases. Existing drivers (idxd and uacce) only
check the return value with IS_ERR(). This could potentially lead to
a kernel NULL pointer dereference issue if the function returns NULL
instead of an error pointer.
In reality, this doesn't cause any problems because iommu_sva_bind_device()
only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA.
In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will
return an error, and the device drivers won't call iommu_sva_bind_device()
at all.
Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240528042528.71396-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Removed the SPD class of i2c devices from the device core.
Additionally, a cleanup in the Synquacer code removes the pclk
from the global structure, as it is used only in the probe.
Therefore, it is now declared locally.
|
|
Pull block fixes from Jens Axboe:
- NVMe fixes via Keith:
- Removing unused fields (Kanchan)
- Large folio offsets support (Kundan)
- Multipath NUMA node initialiazation fix (Nilay)
- Multipath IO stats accounting fixes (Keith)
- Circular lockdep fix (Keith)
- Target race condition fix (Sagi)
- Target memory leak fix (Sagi)
- bcache fixes
- null_blk fixes (Damien)
- Fix regression in io.max due to throttle low removal (Waiman)
- DM limit table fixes (Christoph)
- SCSI and block limit fixes (Christoph)
- zone fixes (Damien)
- Misc fixes (Christoph, Hannes, hexue)
* tag 'block-6.10-20240530' of git://git.kernel.dk/linux: (25 commits)
blk-throttle: Fix incorrect display of io.max
block: Fix zone write plugging handling of devices with a runt zone
block: Fix validation of zoned device with a runt zone
null_blk: Do not allow runt zone with zone capacity smaller then zone size
nvmet: fix a possible leak when destroy a ctrl during qp establishment
nvme: use srcu for iterating namespace list
bcache: code cleanup in __bch_bucket_alloc_set()
bcache: call force_wake_up_gc() if necessary in check_should_bypass()
bcache: allow allocator to invalidate bucket in gc
block: check for max_hw_sectors underflow
block: stack max_user_sectors
sd: also set max_user_sectors when setting max_sectors
null_blk: Print correct max open zones limit in null_init_zoned_dev()
block: delete redundant function declaration
null_blk: Fix return value of nullb_device_power_store()
dm: make dm_set_zones_restrictions work on the queue limits
dm: remove dm_check_zoned
dm: move setting zoned_enabled to dm_table_set_restrictions
block: remove blk_queue_max_integrity_segments
nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset
...
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_classifier.c
abd5576b9c57 ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
56a5cf538c3f ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/20240531123822.3bb7eadf@canb.auug.org.au/
No other adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename xpcs_an_inband to default_an_inband to reflect the change in
phylink and its changed functionality.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJN6-00EcrD-43@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since ovr_an_inband no longer overrides every MLO_AN_xxx mode, rename
it to reflect what it now does - it changes the default mode from
MLO_AN_PHY to MLO_AN_INBAND. Fix up the two users of this.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/E1sCJMv-00Ecr1-Sk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bpf_link_inc_not_zero() will be used by kernel modules. We will use it in
bpf_testmod.c later.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240530065946.979330-5-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add epoll support to bpf struct_ops links to trigger EPOLLHUP event upon
detachment.
This patch implements the "poll" of the "struct file_operations" for BPF
links and introduces a new "poll" operator in the "struct bpf_link_ops". By
implementing "poll" of "struct bpf_link_ops" for the links of struct_ops,
the file descriptor of a struct_ops link can be added to an epoll file
descriptor to receive EPOLLHUP events.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240530065946.979330-4-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Pass an additional pointer of bpf_struct_ops_link to callback function reg,
unreg, and update provided by subsystems defined in bpf_struct_ops. A
bpf_struct_ops_map can be registered for multiple links. Passing a pointer
of bpf_struct_ops_link helps subsystems to distinguish them.
This pointer will be used in the later patches to let the subsystem
initiate a detachment on a link that was registered to it previously.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240530065946.979330-2-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
A zoned device may have a last sequential write required zone that is
smaller than other zones. However, all tests to check if a zone write
plug write offset exceeds the zone capacity use the same capacity
value stored in the gendisk zone_capacity field. This is incorrect for a
zoned device with a last runt (smaller) zone.
Add the new field last_zone_capacity to struct gendisk to store the
capacity of the last zone of the device. blk_revalidate_seq_zone() and
blk_revalidate_conv_zone() are both modified to get this value when
disk_zone_is_last() returns true. Similarly to zone_capacity, the value
is first stored using the last_zone_capacity field of struct
blk_revalidate_zone_args. Once zone revalidation of all zones is done,
this is used to set the gendisk last_zone_capacity field.
The checks to determine if a zone is full or if a sector offset in a
zone exceeds the zone capacity in disk_should_remove_zone_wplug(),
disk_zone_wplug_abort_unaligned(), blk_zone_write_plug_init_request(),
and blk_zone_wplug_prepare_bio() are modified to use the new helper
functions disk_zone_is_full() and disk_zone_wplug_is_full().
disk_zone_is_full() uses the zone index to determine if the zone being
tested is the last one of the disk and uses the either the disk
zone_capacity or last_zone_capacity accordingly.
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240530054035.491497-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- gro: initialize network_offset in network layer
- tcp: reduce accepted window in NEW_SYN_RECV state
Current release - new code bugs:
- eth: mlx5e: do not use ptp structure for tx ts stats when not
initialized
- eth: ice: check for unregistering correct number of devlink params
Previous releases - regressions:
- bpf: Allow delete from sockmap/sockhash only if update is allowed
- sched: taprio: extend minimum interval restriction to entire cycle
too
- netfilter: ipset: add list flush to cancel_gc
- ipv4: fix address dump when IPv4 is disabled on an interface
- sock_map: avoid race between sock_map_close and sk_psock_put
- eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
status rules
Previous releases - always broken:
- core: fix __dst_negative_advice() race
- bpf:
- fix multi-uprobe PID filtering logic
- fix pkt_type override upon netkit pass verdict
- netfilter: tproxy: bail out if IP has been disabled on the device
- af_unix: annotate data-race around unix_sk(sk)->addr
- eth: mlx5e: fix UDP GSO for encapsulated packets
- eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
buffers
- eth: i40e: fully suspend and resume IO operations in EEH case
- eth: octeontx2-pf: free send queue buffers incase of leaf to inner
- eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"
* tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
netdev: add qstat for csum complete
ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
net: ena: Fix redundant device NUMA node override
ice: check for unregistering correct number of devlink params
ice: fix 200G PHY types to link speed mapping
i40e: Fully suspend and resume IO operations in EEH case
i40e: factoring out i40e_suspend/i40e_resume
e1000e: move force SMBUS near the end of enable_ulp function
net: dsa: microchip: fix RGMII error in KSZ DSA driver
ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
net: fix __dst_negative_advice() race
nfc/nci: Add the inconsistency check between the input data length and count
MAINTAINERS: dwmac: starfive: update Maintainer
net/sched: taprio: extend minimum interval restriction to entire cycle too
net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
netfilter: nft_fib: allow from forward/input without iif selector
netfilter: tproxy: bail out if IP has been disabled on the device
netfilter: nft_payload: skbuff vlan metadata mangle support
net: ti: icssg-prueth: Fix start counter for ft1 filter
sock_map: avoid race between sock_map_close and sk_psock_put
...
|
|
Pull in remaining commits from 6.10/scsi-queue.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When extra warnings are enabled, gcc points out a global variable
definition in a header:
In file included from drivers/cpufreq/amd-pstate-ut.c:29:
include/linux/amd-pstate.h:123:27: error: 'amd_pstate_mode_string' defined but not used [-Werror=unused-const-variable=]
123 | static const char * const amd_pstate_mode_string[] = {
| ^~~~~~~~~~~~~~~~~~~~~~
This header is only included from two files in the same directory,
and one of them uses only a single definition from it, so clean it
up by moving most of the contents into the driver that uses them,
and making shared bits a local header file.
Fixes: 36c5014e5460 ("cpufreq: amd-pstate: optimize driver working mode selection in amd_pstate_param()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The pnp_bus_type is defined only when CONFIG_PNP=y, while being
not guarded by ifdeffery in the header. Moreover, it's not used
outside of the PNP code. Move it to the internal header to make
sure no-one will try to (ab)use it.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Since we have a dev_is_pnp() macro that utilises the address of the
pnp_bus_type variable, the users, which can be compiled as modules,
will fail to build. Convert the macro to be a function and export it
to the modules to prevent build breakage.
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Closes: https://lore.kernel.org/r/cc8a93b2-2504-9754-e26c-5d5c3bd1265c@gmail.com
Fixes: 2a49b45cd0e7 ("PNP: Add dev_is_pnp() macro")
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-28
We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).
The main changes are:
1) Rename skb's mono_delivery_time to tstamp_type for extensibility
and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
from Abhishek Chauhan.
2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
CT zones can be used from XDP/tc BPF netfilter CT helper functions,
from Brad Cowie.
3) Several tweaks to the instruction-set.rst IETF doc to address
the Last Call review comments, from Dave Thaler.
4) Small batch of riscv64 BPF JIT optimizations in order to emit more
compressed instructions to the JITed image for better icache efficiency,
from Xiao Wang.
5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
diffing and forcing more natural type definitions ordering,
from Mykyta Yatsenko.
6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
a syzbot/KCSAN race report for the tx_errors counter,
from Jiang Yunshui.
7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
which started treating const structs as constants and thus breaking
full BTF program name resolution, from Ivan Babrou.
8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
some of the test-internal array sizes, from Geliang Tang.
9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
pahole, from Alan Maguire.
10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
rebuilds in some corner cases, from Artem Savkov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
bpf, net: Use DEV_STAT_INC()
bpf, docs: Fix instruction.rst indentation
bpf, docs: Clarify call local offset
bpf, docs: Add table captions
bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
bpf, docs: Use RFC 2119 language for ISA requirements
bpf, docs: Move sentence about returning R0 to abi.rst
bpf: constify member bpf_sysctl_kern:: Table
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
riscv, bpf: Use STACK_ALIGN macro for size rounding up
riscv, bpf: Optimize zextw insn with Zba extension
selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
net: Add additional bit to support clockid_t timestamp type
net: Rename mono_delivery_time to tstamp_type for scalabilty
selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
net: netfilter: Make ct zone opts configurable for bpf ct helpers
selftests/bpf: Fix prog numbers in test_sockmap
bpf: Remove unused variable "prev_state"
bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
bpf: Fix order of args in call to bpf_map_kvcalloc
...
====================
Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The @inode parameter wasn't documented leading to new doc build
warnings.
Fixes: f89ea63f1c65 ("netfs, 9p: Fix race between umount and async request completion")
Link: https://lore.kernel.org/r/20240528133050.7e09d78e@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
|