Age | Commit message (Collapse) | Author |
|
gss_read_common_verf() is now just a wrapper for dup_netobj(), thus
it can be replaced with direct calls to dup_netobj().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pre-requisite to replacing gss_read_common_verf().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since upcalls are infrequent, ensure the compiler places the upcall
mechanism out-of-line from the I/O path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Done as part of hardening the server-side RPC header decoding path.
Since the server-side of the Linux kernel SunRPC implementation
ignores the contents of the Call's machinename field, there's no
need for its RPC_AUTH_UNIX authenticator to reject names that are
larger than UNX_MAXNODENAME.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
RFC 5531 defines the body of an RPC Call message like this:
struct call_body {
unsigned int rpcvers;
unsigned int prog;
unsigned int vers;
unsigned int proc;
opaque_auth cred;
opaque_auth verf;
/* procedure-specific parameters start here */
};
In the current server code, decoding a struct opaque_auth type is
open-coded in several places, and is thus difficult to harden
everywhere.
Introduce a helper for decoding an opaque_auth within the context
of a xdr_stream. This helper can be shared with all authentication
flavor implemenations, even on the client-side.
Done as part of hardening the server-side RPC header decoding paths.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Refactor: So that the overhaul of each ->accept method can be done
in separate smaller patches, temporarily move the
svcxdr_init_decode() call into those methods.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that all vs_dispatch functions invoke svcxdr_init_decode(), it
is common code and can be pushed down into the generic RPC server.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now with NFSD being able to cross into auto mounts,
the check can be removed.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This function is only used by NFSD to cross mount points.
If a mount point is of type auto mount, follow_down() will
not uncover it. Add LOOKUP_AUTOMOUNT to the lookup flags
to have ->d_automount() called when NFSD walks down the
mount tree.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Currently nfsd_mountpoint() tests for mount points using d_mountpoint(),
this works only when a mount point is already uncovered.
In our case the mount point is of type auto mount and can be coverted.
i.e. ->d_automount() was not called.
Using d_managed() nfsd_mountpoint() can test whether a mount point is
either already uncovered or can be uncovered later.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When we suspend into s2idle we also need to enable the interrupt line
that generates the MPD and HFB interrupts towards the host CPU interrupt
controller (typically the ARM GIC or MIPS L1) to make it exit s2idle.
When we suspend into other modes such as "standby" or "mem" we engage a
power management state machine which will gate off the CPU L1 controller
(priv->irq0) and ungate the side band wake-up interrupt (priv->wol_irq).
It is safe to have both enabled as wake-up sources because they are
mutually exclusive given any suspend mode.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a followup of commit 2558b8039d05 ("net: use a bounce
buffer for copying skb->mark")
x86 and powerpc define user_access_begin, meaning
that they are not able to perform user copy checks
when using user_write_access_begin() / unsafe_copy_to_user()
and friends [1]
Instead of waiting bugs to trigger on other arches,
add a check_object_size() in put_cmsg() to make sure
that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y
will perform more security checks.
[1] We can not generically call check_object_size() from
unsafe_copy_to_user() because UACCESS is enabled at this point.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The recent merge from net left-over some unused code in
leftover.c - nomen omen.
Just drop the unused bits.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2023-02-17 - fixed
this is a pull request of 4 patches for net-next/master.
The first patch is by Yang Li and converts the ctucanfd driver to
devm_platform_ioremap_resource().
The last 3 patches are by Frank Jungclaus, target the esd_usb driver
and contains preparations for the upcoming support of the esd
CAN-USB/3 hardware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic
selection of VCAP rule actionset") the VCAP API has the capability to
select automatically the actionset based on the actions that are attached
to the rule. So it is not needed anymore to hardcode the actionset in the
driver, therefore it is OK to remove this.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Paolo Abeni says:
====================
net: default_rps_mask follow-up
The first patch namespacify the setting. In the common case, once
proper isolation is in place in the main namespace, forwarding
to/from each child netns will allways happen on the desidered CPUs.
Any additional RPS stage inside the child namespace will not provide
additional isolation and could hurt performance badly if picking a
CPU on a remote node.
The 2nd patch adds more self-tests coverage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Explicitly check for child netns and main ns independency
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
That really was meant to be a per netns attribute from the beginning.
The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.
To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.3
Third set of patches for v6.3. This time only a set of small fixes
submitted during the last day or two.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Add safeguard to check for NULL tupe in objects updates via
NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.
2) Incorrect pointer check in the new destroy rule command,
from Yang Yingliang.
3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
from Florian Westphal.
4) Simplify seq_print_acct(), from Ilia Gavrilov.
5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
from Julian Anastasov.
6) TCP connection enters CLOSE state in conntrack for locally
originated TCP reset packet from the reject target,
from Florian Westphal.
The fixes #2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The vcap_admin structures in vcap_api_next_lookup_advanced_test()
take several hundred bytes of stack frame, but when CONFIG_KASAN_STACK
is enabled, each one of them also has extra padding before and after
it, which ends up blowing the warning limit:
In file included from drivers/net/ethernet/microchip/vcap/vcap_api.c:3521:
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c: In function 'vcap_api_next_lookup_advanced_test':
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1954:1: error: the frame size of 1448 bytes is larger than 1400 bytes [-Werror=frame-larger-than=]
1954 | }
Reduce the total stack usage by replacing the five structures with
an array that only needs one pair of padding areas.
Fixes: 1f741f001160 ("net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
One local variable has become unused after a recent change:
drivers/net/ethernet/sfc/ef100_nic.c: In function 'ef100_probe_netdev_pf':
drivers/net/ethernet/sfc/ef100_nic.c:1155:21: error: unused variable 'net_dev' [-Werror=unused-variable]
struct net_device *net_dev = efx->net_dev;
^~~~~~~
The variable is still used in an #ifdef. Replace the #ifdef with
an if(IS_ENABLED()) check that lets the compiler see where it is
used, rather than adding another #ifdef.
This also fixes an uninitialized return value in ef100_probe_netdev_pf()
that gcc did not spot.
Fixes: 7e056e2360d9 ("sfc: obtain device mac address based on firmware handle for ef100")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Devlink reload patchset introduced regression. ICE_VSI_LB wasn't
taken into account when doing default allocation. Fix it by adding a
case for ICE_VSI_LB in ice_vsi_alloc_def().
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a spelling mistake in a pci_warn message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds workaround for below 2 HW erratas
1. Due to improper clock gating, NIXRX may free the same
NPA buffer multiple times.. to avoid this, always enable
NIX RX conditional clock.
2. NIX FIFO does not get initialized on reset, if the SMQ
flush is triggered before the first packet is processed, it
will lead to undefined state. The workaround to perform SMQ
flush only if packet count is non-zero in MDQ.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A PHY driver can use a static integer value to indicate what link mode
features it supports, i.e, its abilities.. This is the old way, but
useful when dynamically determining the devices features does not
work, e.g. support of fibre.
EEE support has been moved into phydev->supported_eee. This needs to
be set otherwise the code assumes EEE is not supported. It is normally
set as part of reading the devices abilities. However if a static
integer value was used, the dynamic reading of the abilities is not
performed. Add a call to genphy_c45_read_eee_abilities() to read the
EEE abilities.
Fixes: 8b68710a3121 ("net: phy: start using genphy_c45_ethtool_get/set_eee()")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew Lunn says:
====================
Add additional phydev locks
The phydev lock should be held when accessing members of phydev, or
calling into the driver. Some of the phy_ethtool_ functions are
missing locks. Add them. To avoid deadlock the marvell driver is
modified since it calls one of the functions which gain locks, which
would result in a deadlock.
The missing locks have not caused noticeable issues, so these patches
are for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The phydev lock should be held while accessing members of phydev,
or calling into the driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phy_ethtool_get_eee() is about to gain locking of the phydev lock.
This means it cannot be used within a PHY driver without causing a
deadlock. Swap to using genphy_c45_ethtool_get_eee() which assumes the
lock has already been taken.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the bcmgenet_mii_config() code was refactored it was missed
that the LED control for the MoCA interface got overwritten by
the port_ctrl value. Its previous programming is restored here.
Fixes: 4f8d81b77e66 ("net: bcmgenet: Refactor register access in bcmgenet_mii_config")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a file descriptor of pppol2tp socket is passed as file descriptor
of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register().
This situation is reproduced by the following program:
int main(void)
{
int sock;
struct sockaddr_pppol2tp addr;
sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
if (sock < 0) {
perror("socket");
return 1;
}
addr.sa_family = AF_PPPOX;
addr.sa_protocol = PX_PROTO_OL2TP;
addr.pppol2tp.pid = 0;
addr.pppol2tp.fd = sock;
addr.pppol2tp.addr.sin_family = PF_INET;
addr.pppol2tp.addr.sin_port = htons(0);
addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1");
addr.pppol2tp.s_tunnel = 1;
addr.pppol2tp.s_session = 0;
addr.pppol2tp.d_tunnel = 0;
addr.pppol2tp.d_session = 0;
if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("connect");
return 1;
}
return 0;
}
This program causes the following lockdep warning:
============================================
WARNING: possible recursive locking detected
6.2.0-rc5-00205-gc96618275234 #56 Not tainted
--------------------------------------------
repro/8607 is trying to acquire lock:
ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0
but task is already holding lock:
ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sk_lock-AF_PPPOX);
lock(sk_lock-AF_PPPOX);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by repro/8607:
#0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30
stack backtrace:
CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x100/0x178
__lock_acquire.cold+0x119/0x3b9
? lockdep_hardirqs_on_prepare+0x410/0x410
lock_acquire+0x1e0/0x610
? l2tp_tunnel_register+0x2b7/0x11c0
? lock_downgrade+0x710/0x710
? __fget_files+0x283/0x3e0
lock_sock_nested+0x3a/0xf0
? l2tp_tunnel_register+0x2b7/0x11c0
l2tp_tunnel_register+0x2b7/0x11c0
? sprintf+0xc4/0x100
? l2tp_tunnel_del_work+0x6b0/0x6b0
? debug_object_deactivate+0x320/0x320
? lockdep_init_map_type+0x16d/0x7a0
? lockdep_init_map_type+0x16d/0x7a0
? l2tp_tunnel_create+0x2bf/0x4b0
? l2tp_tunnel_create+0x3c6/0x4b0
pppol2tp_connect+0x14e1/0x1a30
? pppol2tp_put_sk+0xd0/0xd0
? aa_sk_perm+0x2b7/0xa80
? aa_af_perm+0x260/0x260
? bpf_lsm_socket_connect+0x9/0x10
? pppol2tp_put_sk+0xd0/0xd0
__sys_connect_file+0x14f/0x190
__sys_connect+0x133/0x160
? __sys_connect_file+0x190/0x190
? lockdep_hardirqs_on+0x7d/0x100
? ktime_get_coarse_real_ts64+0x1b7/0x200
? ktime_get_coarse_real_ts64+0x147/0x200
? __audit_syscall_entry+0x396/0x500
__x64_sys_connect+0x72/0xb0
do_syscall_64+0x38/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
This patch fixes the issue by getting/creating the tunnel before
locking the pppol2tp socket.
Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the device is plugged/unplugged without giving time for mcp_init_work()
to complete, we might kick in the devm free code path and thus have
unavailable struct mcp_2221 while in delayed work.
Canceling the delayed_work item is enough to solve the issue, because
cancel_delayed_work_sync will prevent the work item to requeue itself.
Fixes: 960f9df7c620 ("HID: mcp2221: add ADC/DAC support via iio subsystem")
CC: stable@vger.kernel.org
Acked-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/20230215-wip-mcp2221-v2-1-109f71fd036e@redhat.com
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
|
|
Eric Dumazet says:
====================
ipv6: icmp6: better drop reason support
This series aims to have more precise drop reason reports for icmp6.
This should reduce false positives on most usual cases.
This can be extended as needed later.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change icmpv6_echo_reply() to return a drop reason.
For the moment, return NOT_SPECIFIED or SKB_CONSUMED.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hosts can often receive neighbour discovery messages
that are not for them.
Use a dedicated drop reason to make clear the packet is dropped
for this normal case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a generic drop reason for any error detected
in ndisc_parse_options().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change ndisc_redirect_rcv() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and values from icmpv6_notify().
More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change ndisc_router_discovery() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and SKB_CONSUMED.
More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change ndisc_recv_rs() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change ndisc_recv_na() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change ndisc_recv_ns() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kfree_skb() includes the location, it makes sense
to add it to consume_skb() as well.
After patch:
taskd_EventMana 8602 [004] 420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic
swapper 0 [011] 422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc
discipline 9141 [043] 423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp
swapper 0 [010] 423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv
borglet 8672 [014] 425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump
swapper 0 [028] 426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action
wget 14339 [009] 426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Data passed to user-space with a (SOL_UDP, UDP_GRO) cmsg carries an
int (see udp_cmsg_recv), not a u16 value, as strace confirms:
recvmsg(8, {msg_name=...,
msg_iov=[{iov_base="\0\0..."..., iov_len=96000}],
msg_iovlen=1,
msg_control=[{cmsg_len=20, <-- sizeof(cmsghdr) + 4
cmsg_level=SOL_UDP,
cmsg_type=0x68}], <-- UDP_GRO
msg_controllen=24,
msg_flags=0}, 0) = 11200
Interpreting the data as an u16 value won't work on big-endian platforms.
Since it is too late to back out of this API decision [1], fix the test.
[1]: https://lore.kernel.org/netdev/20230131174601.203127-1-jakub@cloudflare.com/
Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On default driver load device gets configured with unexpected
higher interrupt coalescing values instead of default expected
values as memory allocated from krealloc() is not supposed to
be zeroed out and may contain garbage values.
Fix this by allocating the memory of required size first with
kcalloc() and then use krealloc() to resize and preserve the
contents across down/up of the interface.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Fixes: b0ec5489c480 ("qede: preserve per queue stats across up/down of interface")
Cc: stable@vger.kernel.org
Cc: Bhaskar Upadhaya <bupadhaya@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2160054
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we try to start AF_XDP on some machines with long running time, due
to the machine's memory fragmentation problem, there is no sufficient
contiguous physical memory that will cause the start failure.
If the size of the queue is 8 * 1024, then the size of the desc[] is
8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is
16page+. This is necessary to apply for a 4-order memory. If there are a
lot of queues, it is difficult to these machine with long running time.
Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but
we only use 17 pages.
This patch replaces __get_free_pages() by vmalloc() to allocate memory
to solve these problems.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a certain probability that following
exceptions will occur in the wrk benchmark test:
Running 10s test @ http://11.213.45.6:80
8 threads and 64 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.72ms 13.94ms 245.33ms 94.17%
Req/Sec 1.96k 713.67 5.41k 75.16%
155262 requests in 10.10s, 23.10MB read
Non-2xx or 3xx responses: 3
We will find that the error is HTTP 400 error, which is a serious
exception in our test, which means the application data was
corrupted.
Consider the following scenarios:
CPU0 CPU1
buf_desc->used = 0;
cmpxchg(buf_desc->used, 0, 1)
deal_with(buf_desc)
memset(buf_desc->cpu_addr,0);
This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.
This patch exchange the order between clear used and memset, add
barrier to ensure memory consistency.
Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a certain chance to trigger the following panic:
PID: 5900 TASK: ffff88c1c8af4100 CPU: 1 COMMAND: "kworker/1:48"
#0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7
#1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a
#2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60
#3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7
#4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715
#5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654
#6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62
[exception RIP: ib_alloc_mr+19]
RIP: ffffffffc0c9cce3 RSP: ffff9456c1cc7c38 RFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000004
RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88c1ea281d00 R8: 000000020a34ffff R9: ffff88c1350bbb20
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000010 R14: ffff88c1ab040a50 R15: ffff88c1ea281d00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc]
#8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc]
#9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc]
The reason here is that when the server tries to create a second link,
smc_llc_srv_add_link() has no protection and may add a new link to
link group. This breaks the security environment protected by
llc_conf_mutex.
Fixes: 2d2209f20189 ("net/smc: first part of add link processing as SMC server")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
taprio queueMaxSDU fixes
This fixes 3 issues noticed while attempting to reoffload the
dynamically calculated queueMaxSDU values. These are:
- Dynamic queueMaxSDU is not calculated correctly due to a lost patch
- Dynamically calculated queueMaxSDU needs to be clamped on the low end
- Dynamically calculated queueMaxSDU needs to be clamped on the high end
====================
Link: https://lore.kernel.org/r/20230215224632.2532685-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|