Age | Commit message (Collapse) | Author |
|
|
|
UDP sendmsg() is lockless, so ip_select_ident_segs()
can very well be run from multiple cpus [1]
Convert inet->inet_id to an atomic_t, but implement
a dedicated path for TCP, avoiding cost of a locked
instruction (atomic_add_return())
Note that this patch will cause a trivial merge conflict
because we added inet->flags in net-next tree.
v2: added missing change in
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
(David Ahern)
[1]
BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
ip_select_ident_segs include/net/ip.h:542 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
ip_select_ident_segs include/net/ip.h:541 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x184d -> 0x184e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
==================================================================
Fixes: 23f57406b82d ("ipv4: avoid using shared IP generator for connected sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
veth and vxcan need to make sure the ifindexes of the peer
are not negative, core does not validate this.
Using iproute2 with user-space-level checking removed:
Before:
# ./ip link add index 10 type veth peer index -1
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
-1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
Now:
$ ./ip link add index 10 type veth peer index -1
Error: ifindex can't be negative.
This problem surfaced in net-next because an explicit WARN()
was added, the root cause is older.
Fixes: e6f8f1a739b6 ("veth: Allow to create peer link with given ifindex")
Fixes: a8f820a380a2 ("can: add Virtual CAN Tunnel driver (vxcan)")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The user ports use RSGMII, but we don't have that, and DT doesn't
specify a phy interface mode, so phylib defaults to GMII. These support
1G, 100M and 10M with flow control. It is unknown whether asymetric
pause is supported at all speeds.
The CPU port uses MII/GMII/RGMII/REVMII by hardware pin strapping,
and support speeds specific to each, with full duplex only supported
in some modes. Flow control may be supported again by hardware pin
strapping, and theoretically is readable through a register but no
information is given in the datasheet for that.
So, we do a best efforts - and be lenient.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ruan Jinjie says:
====================
net: Fix return value check for fixed_phy_register()
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Changes in v3:
- Drop the error fix patch for fixed_phy_get_gpiod().
- Split the error code update code into another patch set as suggested.
- Update the commit title and message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Fixes: b0ba512e25d7 ("net: bcmgenet: enable driver to work without a device tree")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Fixes: c25b23b8a387 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add RoCE MACsec rules when a gid is added for the MACsec netdevice and
handle their cleanup when the gid is removed or the MACsec SA is deleted.
Also support alias IP for the MACsec device, as long as we don't have
more ips than what the gid table can hold.
In addition handle the case where a gid is added but there are still no
SAs added for the MACsec device, so the rules are added later on when
the SAs are added.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Adds all the core steering helper functions that are needed in order
to setup RoCE steering rules which includes both the RX and TX rules
addition and deletion.
As well as exporting the function to be ready to use from the IB driver
where we expose functions to allow deletion of all rules, which is
needed when a GID is deleted, or a deletion of a specific rule when an SA
is deleted, and a similar manner for the rules addition.
These functions are used in a later patch by IB driver to trigger the
rules addition/deletion when needed.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Add steering tables/rules to check if the decrypted traffic is RoCEv2,
if so copy reg_b_metadata to a temp reg and forward it to RDMA_RX domain.
The rules are added once the MACsec device is assigned an IP address
where we verify that the packet ip is for MACsec device and that the temp
reg has MACsec operation and a valid SCI inside.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Add steering table in RDMA_TX domain, to forward MACsec traffic
to MACsec crypto table in NIC domain.
The tables are created in a lazy manner when the first TX SA is
being created, and destroyed upon the destruction of the last SA.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Reorder GID delete code so that the driver del_gid operation is executed
before nullifying the gid attribute ndev parameter, this allows drivers
to access the ndev during their gid delete operation, which makes more
sense since they had access to it during the gid addition operation.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Add MACsec flow steering priorities in RDMA namespaces. This allows
adding tables/rules to forward RoCEv2 traffic to the MACsec crypto
tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Handle MACsec IP ambiguity issue, since mlx5 hw can't support
programming both the MACsec and the physical gid when they have the same
IP address, because it wouldn't know to whom to steer the traffic.
Hence in such case we delete the physical gid from the hw gid table,
which would then cause all traffic sent over it to fail, and we'll only
be able to send traffic over the MACsec gid.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Remove fs_id from the MACsec SA, since it has no real usage there and
instead maintain with the MACsec steering data inside the core.
Downstream patches requires this change to facilitate IB driver accesses
to the fs_ids to avoid RoCE MACsec dependency on EN driver.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Since MACsec steering was moved from ethernet private code to core,
remove the netdevice from the MACsec steering, and use core device
methods for error reporting instead.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
to core
Since now MACsec flow steering (macsec_fs) and MACsec statistics (stats)
are maintained by the core driver, move their data as well to be saved
inside core structures instead of staying part of ethernet MACsec database.
In addition cleanup all MACsec stats functions from the ethernet MACsec
code and move what's needed to be part of macsec_fs instead.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
naming style
Rename MACsec flow steering(macsec_fs) functions and parameters from
ethernet(core/en_accel) naming convention to core naming convention.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Since macsec flow steering was moved to core, it should be independent
of all ethernet code and structures hence we remove all ethernet header
includes and redefine ethernet structs internally for macsec_fs usage
where needed.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Move MACsec flow steering operations(macsec_fs) from core/en_accel to
core/lib, this mandates moving MACsec statistics structure from the
general MACsec code header(en_accel/macsec.h) to macsec_fs header to
remove macsec_fs.h dependency over en_accel/macsec.h.
This to lay the ground for RoCE MACsec by moving all the data
that will need to be accessed by both ethernet MACsec and
RoCE MACsec to be shared at core.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Given a macsec net_device add two functions to return the real net_device
for that device, and check if that macsec device is offloaded or not.
This is needed for auxiliary drivers that implement MACsec offload, but
have flows which are triggered over the macsec net_device, this allows
the drivers in such cases to verify if the device is offloaded or not,
and to access the real device of that macsec device, which would
belong to the driver, and would be needed for the offload procedure.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial core fixes for 6.5-rc7 that resolve
a lot of reported issues.
Primarily in here are the fixes for the serial bus code from Tony that
came in -rc1, as it hit wider testing with the huge number of
different types of systems and serial ports. All of the reported
issues with duplicate names and other issues with this code are now
resolved.
Other than that included in here is:
- n_gsm fix for a previous fix
- 8250 lockdep annotation fix
- fsl_lpuart serial driver fix
- TIOCSTI documentation update for previous CAP_SYS_ADMIN change
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: core: Fix serial core port id, including multiport devices
serial: 8250: drop lockdep annotation from serial8250_clear_IER()
tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux
serial: core: Revert port_id use
TIOCSTI: Document CAP_SYS_ADMIN behaviour in Kconfig
serial: 8250: Fix oops for port->pm on uart_change_pm()
serial: 8250: Reinit port_id when adding back serial8250_isa_devs
serial: core: Fix kmemleak issue for serial core device remove
MAINTAINERS: Merge TTY layer and serial drivers
serial: core: Fix serial_base_match() after fixing controller port name
serial: core: Fix serial core controller port name to show controller id
serial: core: Fix serial core port id to not use port->line
serial: core: Controller id cannot be negative
tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms
|
|
Pull rust fix from Miguel Ojeda:
- Macros: fix 'HAS_*' redefinition by the '#[vtable]' macro
under conditional compilation
* tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linux:
rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`)
|
|
The DISPLAY_CLEAR command on the NewHaven NHD-0220DZW-AG5 display
does NOT change the DDRAM address to 00h (home position) like the
standard Hitachi HD44780 controller. As a consequence, the starting
position of the initial string LCD_INIT_TEXT is not guaranteed to be
at 0,0 depending on where the cursor was before the DISPLAY_CLEAR
command.
Extract of DISPLAY_CLEAR command from datasheets of:
Hitachi HD44780:
... It then sets DDRAM address 0 into the address counter...
NewHaven NHD-0220DZW-AG5 datasheet:
... This instruction does not change the DDRAM Address
Move the cursor home after sending DISPLAY_CLEAR command to support
non-standard LCDs.
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: David Reaver <me@davidreaver.com>
Link: https://lore.kernel.org/r/20230722180925.1408885-1-hugo@hugovil.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Ruan Jinjie says:
====================
net: Update and fix return value check for vcap_get_rule()
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), which would be more cleaner.
So se IS_ERR() to update the return value and fix the issue
in lan966x_ptp_add_trap().
Changes in v2:
- Update vcap_get_rule() to always return an ERR_PTR().
- Update the return value fix in lan966x_ptp_add_trap().
- Update the return value check in sparx5_tc_free_rule_resources().
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to check the return value.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to fix the return value issue.
Fixes: 72df3489fb10 ("net: lan966x: Add ptp trap rules")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), which would be more cleaner in this case.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 91a7cda1f4b8 ("net: phy: Fix race condition on link status
change") all the phy_error() method invocations have been causing the
nested-mutex-lock deadlock because it's normally done in the PHY-driver
threaded IRQ handlers which since that change have been called with the
phydev->lock mutex held. Here is the calls thread:
IRQ: phy_interrupt()
+-> mutex_lock(&phydev->lock); <--------------------+
drv->handle_interrupt() | Deadlock due
+-> ERROR: phy_error() + to the nested
+-> phy_process_error() | mutex lock
+-> mutex_lock(&phydev->lock); <-+
phydev->state = PHY_ERROR;
mutex_unlock(&phydev->lock);
mutex_unlock(&phydev->lock);
The problem can be easily reproduced just by calling phy_error() from any
PHY-device threaded interrupt handler. Fix it by dropping the phydev->lock
mutex lock from the phy_process_error() method and printing a nasty error
message to the system log if the mutex isn't held in the caller execution
context.
Note for the fix to work correctly in the PHY-subsystem itself the
phydev->lock mutex locking must be added to the phy_error_precise()
function.
Link: https://lore.kernel.org/netdev/20230816180944.19262-1-fancer.lancer@gmail.com
Fixes: 91a7cda1f4b8 ("net: phy: Fix race condition on link status change")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
xgene_mdio_status is declared static, and is only written once by the
driver. It appears to have been this way since the driver was first
added to the kernel tree. No other users can be found, so let's remove
it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently any error during render leads to output an empty file.
That is quite annoying when using tools/net/ynl/ynl-regen.sh
which git greps files with content of "YNL-GEN.." and therefore ignores
empty files. So once you fail to regen, you have to checkout the file.
Avoid that by rendering to a temporary file first, only at the end
copy the content to the actual destination.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All captured timestamps should be corrected by PHY, MAC and CDC introduced
latency/errors. The CDC correction is already used. Enable MAC propagation delay
correction as well which is available since commit 26cfb838aa00 ("net: stmmac:
correct MAC propagation delay").
Before:
|ptp4l[390.458]: rms 7 max 21 freq +177 +/- 14 delay 357 +/- 1
After:
|ptp4l[620.012]: rms 7 max 20 freq +195 +/- 14 delay 345 +/- 1
Tested on Intel Elkhart Lake.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Handle extended compliance code 0x1 (SFF8024_ECC_100G_25GAUI_C2M_AOC)
for active optical cables supporting 25G and 100G speeds.
Since the specification makes no statement about transmitter range, and
as the specific sfp module that had been tested features only 2m fiber -
short-range (SR) modes are selected.
The 100G speed is irrelevant because it would require multiple fibers /
multiple SFP28 modules combined under one netdev.
sfp-bus.c only handles a single module per netdev, so only 25Gbps modes
are selected.
sfp_parse_support already handles SFF8024_ECC_100GBASE_SR4_25GBASE_SR
with compatible properties, however that entry is a contradiction in
itself since with SFP(28) 100GBASE_SR4 is impossible - that would likely
be a mode for qsfp modules only.
Add a case for SFF8024_ECC_100G_25GAUI_C2M_AOC selecting 25gbase-r
interface mode and 25000baseSR link mode.
Also enforce SFP28 bitrate limits on the values read from sfp eeprom as
requested by Russell King.
Tested with fs.com S28-AO02 AOC SFP28 module.
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Usual set of driver fixes. A bit more than usual because I was
unavailable for a while"
* tag 'i2c-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue
i2c: Update documentation to use .probe() again
i2c: sun6i-p2wi: Fix an error message in probe()
i2c: hisi: Only handle the interrupt of the driver's transfer
i2c: tegra: Fix i2c-tegra DMA config option processing
i2c: tegra: Fix failure during probe deferral cleanup
i2c: designware: Handle invalid SMBus block data response length value
i2c: designware: Correct length byte validation logic
i2c: imx-lpi2c: return -EINVAL when i2c peripheral clk doesn't work
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix infinite loop in readdir(), could happen in a big directory when
files get renamed during enumeration
- fix extent map handling of skipped pinned ranges
- fix a corner case when handling ordered extent length
- fix a potential crash when balance cancel races with pause
- verify correct uuid when starting scrub or device replace
* tag 'for-6.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix incorrect splitting in btrfs_drop_extent_map_range
btrfs: fix BUG_ON condition in btrfs_cancel_balance
btrfs: only subtract from len_to_oe_boundary when it is tracking an extent
btrfs: fix replace/scrub failure with metadata_uuid
btrfs: fix infinite directory reads
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes and cleanups from Helge Deller:
- various code cleanups in amifb, atmel_lcdfb, ssd1307fb, kyro and
goldfishfb
* tag 'fbdev-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: goldfishfb: Do not check 0 for platform_get_irq()
fbdev: atmel_lcdfb: Remove redundant of_match_ptr()
fbdev: kyro: Remove unused declarations
fbdev: ssd1307fb: Print the PWM's label instead of its number
fbdev: mmp: fix value check in mmphw_probe()
fbdev: amifb: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
|
|
Pull block fixes from Jens Axboe:
"Main thing here is the fix for the regression in flush handling which
caused IO hangs/stalls for a few reporters. Hopefully that should all
be sorted out now. Outside of that, just a few minor fixes for issues
that were introduced in this cycle"
* tag 'block-6.5-2023-08-19' of git://git.kernel.dk/linux:
blk-mq: release scheduler resource when request completes
blk-crypto: dynamically allocate fallback profile
blk-cgroup: hold queue_lock when removing blkg->q_node
drivers/rnbd: restore sysfs interface to rnbd-client
|
|
skb_queue_purge() and __skb_queue_purge() become wrappers
around the new generic functions.
New SKB_DROP_REASON_QUEUE_PURGE drop reason is added,
but users can start adding more specific reasons.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result was immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.
Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Another highly rare error case when a page allocating loop (inside
__nfs4_get_acl_uncached, this time) is not properly unwound on error.
Since pages array is allocated being uninitialized, need to free only
lower array indices. NULL checks were useful before commit 62a1573fcf84
("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had
been initialized to zero on stack.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
There is a slight issue with error handling code inside
nfs42_proc_getxattr(). If page allocating loop fails then we free the
failing page array element which is NULL but __free_page() can't deal with
NULL args.
Found by Linux Verification Center (linuxtesting.org).
Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Free the formatted server index string after it has been duplicated by
kobject_rename().
Fixes: 1c7251187dc0 ("NFS: add superblock sysfs entries")
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Chuck reported [1] an IO hang problem on NFS exports that reside on SATA
devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
submission path for post-flush requests").
We analysed the IO hang problem, found there are two postflush requests
waiting for each other.
The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
failed to blk_kick_flush() because of the second postflush request which
is inflight waiting in scheduler queue.
The second postflush waiting in scheduler queue can't be dispatched because
the first postflush hasn't released scheduler resource even though it has
completed by itself.
Fix it by releasing scheduler resource when the first postflush request
completed, so the second postflush can be dispatched and completed, then
make blk_kick_flush() succeed.
While at it, remove the check for e->ops.finish_request, as all
schedulers set that. Reaffirm this requirement by adding a WARN_ON_ONCE()
at scheduler registration time, just like we do for insert_requests and
dispatch_request.
[1] https://lore.kernel.org/all/7A57C7AE-A51A-4254-888B-FE15CA21F9E9@oracle.com/
Link: https://lore.kernel.org/linux-block/20230819031206.2744005-1-chengming.zhou@linux.dev/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@intel.com
Fixes: 615939a2ae73 ("blk-mq: defer to the normal submission path for post-flush requests")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20230813152325.3017343-1-chengming.zhou@linux.dev
[axboe: folded in incremental fix and added tags]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Guangguan Wang says:
====================
net/smc: several features's implementation for smc v2.1
This patch set implement several new features in SMC v2.1(https://
www.ibm.com/support/pages/node/7009315), including vendor unique
experimental options, max connections per lgr negotiation, max links
per lgr negotiation.
v1 - v2:
- rename field fce_v20 to fce_v2_base in struct
smc_clc_first_contact_ext_v2x
- use smc_get_clc_first_contact_ext in smc_connect
_rdma_v2_prepare
- adding comment about field vendor_oui in struct
smc_clc_msg_smcd
- remove comment about SMC_CONN_PER_LGR_MAX in smc_
clc_srv_v2x_features_validate
- rename smc_clc_clnt_v2x_features_validate
RFC v2 - v1:
- more description in commit message
- modify SMC_CONN_PER_LGR_xxx and SMC_LINKS_ADD_LNK_xxx
macro defination and usage
- rename field release_ver to release_nr
- remove redundant release version check in client
- explicitly set the rc value in smc_llc_cli/srv_add_link
RFC v1 - RFC v2:
- Remove ini pointer NULL check and fix code style in
smc_clc_send_confirm_accept.
- Optimize the max_conns check in smc_clc_xxx_v2x_features_validate.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2 for
linkgroup's detail info showing.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support max links per lgr negotiation in clc handshake for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for the final
value of max links based on the client preferred max links and
self-preferred max links. Here use the minimum value of the client
preferred max links and server preferred max links.
Client Server
Proposal(max links(client preferred))
-------------------------------------->
Accept(max links(accepted value))
accepted value=min(client preferred, server preferred)
<-------------------------------------
Confirm(max links(accepted value))
------------------------------------->
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support max connections per lgr negotiation for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for
the final value of max conns based on the client preferred
max conns and self-preferred max conns. Here use the minimum
value of client preferred max conns and server preferred max
conns.
Client Server
Proposal(max conns(client preferred))
------------------------------------>
Accept(max conns(accepted value))
accepted value=min(client preferred, server preferred)
<-----------------------------------
Confirm(max conns(accepted value))
----------------------------------->
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support SMC v2.x features validate for SMC v2.1. This is the frame
code for SMC v2.x features validate, and will take effects only when
the negotiated release version is v2.1 or later.
For Server, v2.x features' validation should be done in smc_clc_srv_
v2x_features_validate when receiving v2.1 or later CLC Proposal Message,
such as max conns, max links negotiation, the decision of the final
value of max conns and max links should be made in this function.
And final check for server when receiving v2.1 or later CLC Confirm
Message should be done in smc_clc_v2x_features_confirm_check.
For client, v2.x features' validation should be done in smc_clc_clnt_
v2x_features_validate when receiving v2.1 or later CLC Accept Message,
for example, the decision to accpt the accepted value or to decline
should be made in this function.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add vendor unique experimental options area in clc handshake. In clc
accept and confirm msg, vendor unique experimental options use the
16-Bytes reserved field, which defined in struct smc_clc_fce_gid_ext
in previous version. Because of the struct smc_clc_first_contact_ext
is widely used and limit the scope of modification, this patch moves
the 16-Bytes reserved field out of struct smc_clc_fce_gid_ext, and
followed with the struct smc_clc_first_contact_ext in a new struct
names struct smc_clc_first_contact_ext_v2x.
For SMC-R first connection, in previous version, the struct smc_clc_
first_contact_ext and the 16-Bytes reserved field has already been
included in clc accept and confirm msg. Thus, this patch use struct
smc_clc_first_contact_ext_v2x instead of the struct smc_clc_first_
contact_ext and the 16-Bytes reserved field in SMC-R clc accept and
confirm msg is compatible with previous version.
For SMC-D first connection, in previous version, only the struct smc_
clc_first_contact_ext is included in clc accept and confirm msg, and
the 16-Bytes reserved field is not included. Thus, when the negotiated
smc release version is the version before v2.1, we still use struct
smc_clc_first_contact_ext for compatible consideration. If the negotiated
smc release version is v2.1 or later, use struct smc_clc_first_contact_
ext_v2x instead.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support smc release version negotiation in clc handshake based on
SMC v2, where no negotiation process for different releases, but
for different versions. The latest smc release version was updated
to v2.1. And currently there are two release versions of SMCv2, v2.0
and v2.1. In the release version negotiation, client sends the preferred
release version by CLC Proposal Message, server makes decision for which
release version to use based on the client preferred release version and
self-supported release version (here choose the minimum release version
of the client preferred and server latest supported), then the decision
returns to client by CLC Accept Message. Client confirms the decision by
CLC Confirm Message.
Client Server
Proposal(preferred release version)
------------------------------------>
Accept(accpeted release version)
min(client preferred, server latest supported)
<------------------------------------
Confirm(accpeted release version)
------------------------------------>
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|