summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-05-22net: ethernet: mtk_eth_soc: rely on rxd_size field in mtk_rx_alloc/mtk_rx_cleanLorenzo Bianconi
Remove mtk_rx_dma structure layout dependency in mtk_rx_alloc/mtk_rx_clean. Initialize to 0 rxd3 and rxd4 in mtk_rx_alloc. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size field in mtk_poll_tx/mtk_poll_rxLorenzo Bianconi
This is a preliminary to ad mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: add rxd_size to mtk_soc_dataLorenzo Bianconi
Similar to tx counterpart, introduce rxd_size in mtk_soc_data data structure. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size in txd_to_idxLorenzo Bianconi
This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size in mtk_desc_to_tx_bufLorenzo Bianconi
This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size in mtk_tx_alloc/mtk_tx_cleanLorenzo Bianconi
This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: add txd_size to mtk_soc_dataLorenzo Bianconi
In order to remove mtk_tx_dma size dependency, introduce txd_size in mtk_soc_data data structure. Rely on txd_size in mtk_init_fq_dma() and mtk_dma_free() routines. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: move tx dma desc configuration in ↵Lorenzo Bianconi
mtk_tx_set_dma_desc Move tx dma descriptor configuration in mtk_tx_set_dma_desc routine. This is a preliminary patch to introduce mt7986 ethernet support since it relies on a different tx dma descriptor layout. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on GFP_KERNEL for dma_alloc_coherent ↵Lorenzo Bianconi
whenever possible Rely on GFP_KERNEL for dma descriptors mappings in mtk_tx_alloc(), mtk_rx_alloc() and mtk_init_fq_dma() since they are run in non-irq context. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22dt-bindings: net: mediatek,net: add mt7986-eth bindingLorenzo Bianconi
Introduce dts bindings for mt7986 soc in mediatek,net.yaml. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22arm64: dts: mediatek: mt7986: introduce ethernet nodesLorenzo Bianconi
Introduce ethernet nodes in mt7986 bindings in order to enable mt7986a/mt7986b ethernet support. Co-developed-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'net-gcc12-warnings'David S. Miller
Jakub Kicinski says: ==================== eth: silence the GCC 12 array-bounds warnings Silence the array-bounds warnings in Ethernet drivers. v2 uses -Wno-array-bounds directly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22eth: tg3: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 currently generates a rather inconsistent warning: drivers/net/ethernet/broadcom/tg3.c:17795:51: warning: array subscript 5 is above array bounds of ‘struct tg3_napi[5]’ [-Warray-bounds] 17795 | struct tg3_napi *tnapi = &tp->napi[i]; | ~~~~~~~~^~~ i is guaranteed < tp->irq_max which in turn is either 1 or 5. There are more loops like this one in the driver, but strangely GCC 12 dislikes only this single one. Silence this silliness for now. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22eth: ice: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 gets upset because driver allocates partial struct ice_aqc_sw_rules_elem buffers. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22eth: mtk_eth_soc: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 gets upset because in mtk_foe_entry_commit_subflow() this driver allocates a partial structure. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'dpaa2-swtso-fixes'David S. Miller
Ioana Ciornei says: ==================== dpaa2-eth: software TSO fixes This patch fixes the software TSO feature in dpaa2-eth. There are multiple errors that I made in the initial submission of the code, which I didn't caught since I was always running with passthough IOMMU. The bug report came in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215886 The bugs are in the Tx confirmation path, where I was trying to retrieve a virtual address after DMA unmapping the area. Besides that, another dma_unmap call was made with the wrong size. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22dpaa2-eth: unmap the SGT buffer before accessing its contentsIoana Ciornei
DMA unmap the Scatter/Gather table before going through the array to unmap and free each of the header and data chunks. This is so we do not touch the data between the dma_map and dma_unmap calls. Fixes: 3dc709e0cd47 ("dpaa2-eth: add support for software TSO") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22dpaa2-eth: use the correct software annotation fieldIoana Ciornei
The incorrect software annotation field was being used, swa->sg.sgt_size instead of swa->tso.sgt_size, which meant that the SGT buffer was unmapped with a wrong size. This is also confirmed by the DMA API debug prints which showed the following: [ 38.962434] DMA-API: fsl_dpaa2_eth dpni.2: device driver frees DMA memory with different size [device address=0x0000fffffafba740] [map size=224 bytes] [unmap size=0 bytes] [ 38.980496] WARNING: CPU: 11 PID: 1131 at kernel/dma/debug.c:973 check_unmap+0x58c/0x9b0 [ 38.988586] Modules linked in: [ 38.991631] CPU: 11 PID: 1131 Comm: iperf3 Not tainted 5.18.0-rc7-00117-g59130eeb2b8f #1972 [ 38.999970] Hardware name: NXP Layerscape LX2160ARDB (DT) Fixes: 3dc709e0cd47 ("dpaa2-eth: add support for software TSO") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22dpaa2-eth: retrieve the virtual address before dma_unmapIoana Ciornei
The TSO header was DMA unmapped before the virtual address was retrieved and then used to free the buffer. This meant that we were actually removing the DMA map and then trying to search for it to help in retrieving the virtual address. This lead to a invalid virtual address being used in the kfree call. Fix this by calling dpaa2_iova_to_virt() prior to the dma_unmap call. [ 487.231819] Unable to handle kernel paging request at virtual address fffffd9807000008 (...) [ 487.354061] Hardware name: SolidRun LX2160A Honeycomb (DT) [ 487.359535] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 487.366485] pc : kfree+0xac/0x304 [ 487.369799] lr : kfree+0x204/0x304 [ 487.373191] sp : ffff80000c4eb120 [ 487.376493] x29: ffff80000c4eb120 x28: ffff662240c46400 x27: 0000000000000001 [ 487.383621] x26: 0000000000000001 x25: ffff662246da0cc0 x24: ffff66224af78000 [ 487.390748] x23: ffffad184f4ce008 x22: ffffad1850185000 x21: ffffad1838d13cec [ 487.397874] x20: ffff6601c0000000 x19: fffffd9807000000 x18: 0000000000000000 [ 487.405000] x17: ffffb910cdc49000 x16: ffffad184d7d9080 x15: 0000000000004000 [ 487.412126] x14: 0000000000000008 x13: 000000000000ffff x12: 0000000000000000 [ 487.419252] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffad184d7d927c [ 487.426379] x8 : 0000000000000000 x7 : 0000000ffffffd1d x6 : ffff662240a94900 [ 487.433505] x5 : 0000000000000003 x4 : 0000000000000009 x3 : ffffad184f4ce008 [ 487.440632] x2 : ffff662243eec000 x1 : 0000000100000100 x0 : fffffc0000000000 [ 487.447758] Call trace: [ 487.450194] kfree+0xac/0x304 [ 487.453151] dpaa2_eth_free_tx_fd.isra.0+0x33c/0x3e0 [fsl_dpaa2_eth] [ 487.459507] dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth] [ 487.464989] dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth] Fixes: 3dc709e0cd47 ("dpaa2-eth: add support for software TSO") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215886 Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: mscc: ocelot: offload tc action "ok" using an empty action vectorVladimir Oltean
The "ok" tc action is useful when placed in front of a more generic filter to exclude some more specific rules from matching it. The ocelot switches can offload this tc action by creating an empty action vector (no _ENA fields set to 1). This makes sense for all of VCAP IS1, IS2 and ES0 (but not for PSFP). Add support for this action. Note that this makes the gact_drop_and_ok_test() selftest pass, where "action ok" is used in front of an "action drop" rule, both offloaded to VCAP IS2. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'ocelot-selftests'David S. Miller
Vladimir Oltean says: ==================== Streamline Ocelot tc-chains selftest This series changes the output and the argument format of the Ocelot switch selftest so that it is more similar to what can be found in tools/testing/selftests/net/forwarding/. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22selftests: ocelot: tc_flower_chains: reorder interfacesVladimir Oltean
Use the standard interface order h1, swp1, swp2, h2 that is used by the forwarding selftest framework. The previous order was confusing even with the ASCII drawing. That isn't needed anymore. This also drops the fixed MAC addresses and uses STABLE_MAC_ADDRS, which ensures the MAC addresses are unique. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22selftests: ocelot: tc_flower_chains: use conventional interface namesVladimir Oltean
This is a robotic rename as follows: eth0 -> swp1 eth1 -> swp2 eth2 -> h2 eth3 -> h1 This brings the selftest more in line with the other forwarding selftests, where h1 is connected to swp1, and h2 to swp2. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22selftests: ocelot: tc_flower_chains: streamline test outputVladimir Oltean
Bring this driver-specific selftest output in line with the other selftests. Before: Testing VLAN pop.. OK Testing VLAN push.. OK Testing ingress VLAN modification.. OK Testing egress VLAN modification.. OK Testing frame prioritization.. OK After: TEST: VLAN pop [ OK ] TEST: VLAN push [ OK ] TEST: Ingress VLAN modification [ OK ] TEST: Egress VLAN modification [ OK ] TEST: Frame prioritization [ OK ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: wrap the wireless pointers in struct net_device in an ifdefJakub Kicinski
Most protocol-specific pointers in struct net_device are under a respective ifdef. Wireless is the notable exception. Since there's a sizable number of custom-built kernels for datacenter workloads which don't build wireless it seems reasonable to ifdefy those pointers as well. While at it move IPv4 and IPv6 pointers up, those are special for obvious reasons. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154 Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: fec: Do proper error checking for enet_out clkUwe Kleine-König
An error code returned by devm_clk_get() might have other meanings than "This clock doesn't exist". So use devm_clk_get_optional() and handle all remaining errors as fatal. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22hinic: Avoid some over memory allocationChristophe JAILLET
'prod_idx' (atomic_t) is larger than 'shadow_idx' (u16), so some memory is over-allocated. Fixes: b15a9f37be2b ("net-next/hinic: Add wq") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: fec: Do proper error checking for optional clksUwe Kleine-König
An error code returned by devm_clk_get() might have other meanings than "This clock doesn't exist". So use devm_clk_get_optional() and handle all remaining errors as fatal. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: phy: DP83822: enable rgmii mode if phy_interface_is_rgmiiTommaso Merciai
RGMII mode can be enable from dp83822 straps, and also writing bit 9 of register 0x17 - RMII and Status Register (RCSR). When phy_interface_is_rgmii rgmii mode must be enabled, same for contrary, this prevents malconfigurations of hw straps References: - https://www.ti.com/lit/gpn/dp83822i p66 Signed-off-by: Tommaso Merciai <tommaso.merciai@amarulasolutions.com> Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com> Suggested-by: Alberto Bianchi <alberto.bianchi@amarulasolutions.com> Tested-by: Tommaso Merciai <tommaso.merciai@amarulasolutions.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: selftests: Add stress_reuseport_listen to .gitignoreMuhammad Usama Anjum
Add newly added stress_reuseport_listen object to .gitignore file. Fixes: ec8cb4f617a2 ("net: selftests: Stress reuseport listen") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22random: check for signals after page of pool writesJason A. Donenfeld
get_random_bytes_user() checks for signals after producing a PAGE_SIZE worth of output, just like /dev/zero does. write_pool() is doing basically the same work (actually, slightly more expensive), and so should stop to check for signals in the same way. Let's also name it write_pool_user() to match get_random_bytes_user(), so this won't be misused in the future. Before this patch, massive writes to /dev/urandom would tie up the process for an extremely long time and make it unterminatable. After, it can be successfully interrupted. The following test program can be used to see this works as intended: #include <unistd.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> static unsigned char x[~0U]; static void handle(int) { } int main(int argc, char *argv[]) { pid_t pid = getpid(), child; int fd; signal(SIGUSR1, handle); if (!(child = fork())) { for (;;) kill(pid, SIGUSR1); } fd = open("/dev/urandom", O_WRONLY); pause(); printf("interrupted after writing %zd bytes\n", write(fd, x, sizeof(x))); close(fd); kill(child, SIGTERM); return 0; } Result before: "interrupted after writing 2147479552 bytes" Result after: "interrupted after writing 4096 bytes" Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-22Merge branch 'rxrpc-fixes'David S. Miller
David Howells says: ==================== rxrpc: Miscellaneous fixes Here are some fixes for AF_RXRPC: (1) Fix listen() allowing preallocation to overrun the prealloc buffer. (2) Prevent resending the request if we've seen the reply starting to arrive. (3) Fix accidental sharing of ACK state between transmission and reception. (4) Ignore ACKs in which ack.previousPacket regresses. This indicates the highest DATA number so far seen, so should not be seen to go backwards. (5) Fix the determination of when to generate an IDLE-type ACK, simplifying it so that we generate one if we have more than two DATA packets that aren't hard-acked (consumed) or soft-acked (in the rx buffer, but could be discarded and re-requested). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix decision on when to generate an IDLE ACKDavid Howells
Fix the decision on when to generate an IDLE ACK by keeping a count of the number of packets we've received, but not yet soft-ACK'd, and the number of packets we've processed, but not yet hard-ACK'd, rather than trying to keep track of which DATA sequence numbers correspond to those points. We then generate an ACK when either counter exceeds 2. The counters are both cleared when we transcribe the information into any sort of ACK packet for transmission. IDLE and DELAY ACKs are skipped if both counters are 0 (ie. no change). Fixes: 805b21b929e2 ("rxrpc: Send an ACK after every few DATA packets we receive") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Don't let ack.previousPacket regressDavid Howells
The previousPacket field in the rx ACK packet should never go backwards - it's now the highest DATA sequence number received, not the last on received (it used to be used for out of sequence detection). Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix overlapping ACK accountingDavid Howells
Fix accidental overlapping of Rx-phase ACK accounting with Tx-phase ACK accounting through variables shared between the two. call->acks_* members refer to ACKs received in the Tx phase and call->ackr_* members to ACKs sent/to be sent during the Rx phase. Fixes: 1a2391c30c0b ("rxrpc: Fix detection of out of order acks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Don't try to resend the request if we're receiving the replyDavid Howells
rxrpc has a timer to trigger resending of unacked data packets in a call. This is not cancelled when a client call switches to the receive phase on the basis that most calls don't last long enough for it to ever expire. However, if it *does* expire after we've started to receive the reply, we shouldn't then go into trying to retransmit or pinging the server to find out if an ack got lost. Fix this by skipping the resend code if we're into receiving the reply to a client call. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix listen() setting the bar too high for the prealloc ringsDavid Howells
AF_RXRPC's listen() handler lets you set the backlog up to 32 (if you bump up the sysctl), but whilst the preallocation circular buffers have 32 slots in them, one of them has to be a dead slot because we're using CIRC_CNT(). This means that listen(rxrpc_sock, 32) will cause an oops when the socket is closed because rxrpc_service_prealloc_one() allocated one too many calls and rxrpc_discard_prealloc() won't then be able to get rid of them because it'll think the ring is empty. rxrpc_release_calls_on_socket() then tries to abort them, but oopses because call->peer isn't yet set. Fix this by setting the maximum backlog to RXRPC_BACKLOG_MAX - 1 to match the ring capacity. BUG: kernel NULL pointer dereference, address: 0000000000000086 ... RIP: 0010:rxrpc_send_abort_packet+0x73/0x240 [rxrpc] Call Trace: <TASK> ? __wake_up_common_lock+0x7a/0x90 ? rxrpc_notify_socket+0x8e/0x140 [rxrpc] ? rxrpc_abort_call+0x4c/0x60 [rxrpc] rxrpc_release_calls_on_socket+0x107/0x1a0 [rxrpc] rxrpc_release+0xc9/0x1c0 [rxrpc] __sock_release+0x37/0xa0 sock_close+0x11/0x20 __fput+0x89/0x240 task_work_run+0x59/0x90 do_exit+0x319/0xaa0 Fixes: 00e907127e6f ("rxrpc: Preallocate peers, conns and calls for incoming service requests") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lists.infradead.org/pipermail/linux-afs/2022-March/005079.html Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'rxrpc-misc'David S. Miller
David Howells says: ==================== rxrpc: Miscellaneous changes Here are some miscellaneous changes for AF_RXRPC: (1) Allow the list of local endpoints to be viewed through /proc. (2) Switch to using refcount_t for refcounting. (3) Fix a locking issue found by lockdep. (4) Autogenerate tracing symbol enums from symbol->string maps to make it easier to keep them in sync. (5) Return an error to sendmsg() if a call it tried to set up failed. Because it failed at this point, no notification will be generated for recvmsg to pick up - but userspace still needs to know about the failure. (6) Fix the selection of abort codes generated by internal events. In particular, rxrpc and kafs shouldn't be generating RX_USER_ABORT unless it's because userspace did something to cancel a call. (7) Adjust the interpretation and handling of certain ACK types to try and detect NAT changes causing a call to seem to start mid-flow from a different peer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22afs: Adjust ACK interpretation to try and cope with NATDavid Howells
If a client's address changes, say if it is NAT'd, this can disrupt an in progress operation. For most operations, this is not much of a problem, but StoreData can be different as some servers modify the target file as the data comes in, so if a store request is disrupted, the file can get corrupted on the server. The problem is that the server doesn't recognise packets that come after the change of address as belonging to the original client and will bounce them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if the packet number falls within the initial sequence number window of a call or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting it. In both cases, firstPacket will be 1 and previousPacket will be 0 in the ACK information. Fix this by the following means: (1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1 and previousPacket as 0, assume this indicates that the server saw the incoming packets from a different peer and thus as a different call. Fail the call with error -ENETRESET. (2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the first packet has been hard-ACK'd. If it hasn't been hard-ACK'd, the ACK packet will cause it to get retransmitted, so the call will just be repeated. (3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of the operation. (4) Prioritise the error code over things like -ECONNRESET as the server did actually respond. (5) Make writeback treat -ENETRESET as a retryable error and make it redirty all the pages involved in a write so that the VM will retry. Note that there is still a circumstance that I can't easily deal with: if the operation is fully received and processed by the server, but the reply is lost due to address change. There's no way to know if the op happened. We can examine the server, but a conflicting change could have been made by a third party - and we can't tell the difference. In such a case, a message like: kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a) will be logged to dmesg on the next op to touch the file and the client will reset the inode state, including invalidating clean parts of the pagecache. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc, afs: Fix selection of abort codesDavid Howells
The RX_USER_ABORT code should really only be used to indicate that the user of the rxrpc service (ie. userspace) implicitly caused a call to be aborted - for instance if the AF_RXRPC socket is closed whilst the call was in progress. (The user may also explicitly abort a call and specify the abort code to use). Change some of the points of generation to use other abort codes instead: (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see ENOMEM and EFAULT during received data delivery and abort with RX_CALL_DEAD in the default case. (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a reply. (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't. (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not completed successfully or been aborted. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Return an error to sendmsg if call failedDavid Howells
If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that was being given data was aborted remotely or otherwise failed, return an error rather than returning the amount of data buffered for transmission. The call (presumably) did not complete, so there's not much point continuing with it. AF_RXRPC considers it "complete" and so will be unwilling to do anything else with it - and won't send a notification for it, deeming the return from sendmsg sufficient. Not returning an error causes afs to incorrectly handle a StoreData operation that gets interrupted by a change of address due to NAT reconfiguration. This doesn't normally affect most operations since their request parameters tend to fit into a single UDP packet and afs_make_call() returns before the server responds; StoreData is different as it involves transmission of a lot of data. This can be triggered on a client by doing something like: dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512 at one prompt, and then changing the network address at another prompt, e.g.: ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0 Tracing packets on an Auristor fileserver looks something like: 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 <ARP exchange for 192.168.6.2> 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001 The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the condition and generates an abort first and the unmarshal error is a consequence of that at the application layer. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Automatically generate trace tag enumsDavid Howells
Automatically generate trace tag enums from the symbol -> string mapping tables rather than having the enums as well, thereby reducing duplicated data. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix locking issueDavid Howells
There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Use refcount_t rather than atomic_tDavid Howells
Move to using refcount_t rather than atomic_t for refcounts in rxrpc. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Allow list of in-use local UDP endpoints to be viewed in /procDavid Howells
Allow the list of in-use local UDP endpoints in the current network namespace to be viewed in /proc. To aid with this, the endpoint list is converted to an hlist and RCU-safe manipulation is used so that the list can be read with only the RCU read lock held. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Linux 5.18v5.18Linus Torvalds
2022-05-22Merge branch 'ipa-next'David S. Miller
Alex Elder says: ==================== net: ipa: a few more small items This series consists of three small sets of changes. Version 2 adds a patch that avoids a warning that occurs when handling a modem crash (I unfortunately didn't notice it earlier). All other patches are the same--just rebased. The first three patches allow a few endpoint features to be specified. At this time, currently-defined endpoints retain the same configuration, but when the monitor functionality is added in the next cycle these options will be required. The fourth patch simply removes an unused function, explaining also why it would likely never be used. The fifth patch is new. It counts the number of modem TX endpoints and uses it to determine how many TREs a transaction needs when when handling a modem crash. It is needed to avoid exceeding the limited number of commands imposed by the last four patches. And the last four patches refactor code related to IPA immediate commands, eliminating an unused field and then simplifying and removing some unneeded code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ipa: use data space for command opcodesAlex Elder
The 64-bit data field in a transaction is not used for commands. And the opcode array is *only* used for commands. They're (currently) the same size; save a little space in the transaction structure by enclosing the two fields in a union. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ipa: remove command info poolAlex Elder
The ipa_cmd_info structure now contains only one field, and it's an enumerated type whose values all fit in 8 bits. Currently we'll never use more than 8 TREs in a command transaction, and we can represent that number of command opcodes in the same space as a 64 bit pointer to an ipa_cmd_info structure. Define IPA_COMMAND_TRANS_TRE_MAX as the maximum number of TREs that can be in a command transaction. Replace the info pointer in a transaction with a fixed-size array named cmd_opcode[] of that many bytes. Store the opcode in this array when adding a command TRE to a transaction, as was done previously for the info array. This makes the ipa_cmd_info unused, so get rid of it. When committing an immediate command transaction, use the channel's Boolean command flag to determine whether to fill in the opcode, which will be taken (as before) from the array in the transaction. This makes the command info pool unnecessary, so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ipa: remove command direction argumentAlex Elder
We no longer use the direction argument for gsi_trans_cmd_add(), so get rid of it in its definition, and in its seven callers. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>