Age | Commit message (Collapse) | Author |
|
Add a drop reason for packets that are dropped because an action
returns a non-zero error code.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create a new drop reason subsystem for openvswitch and add the first
drop reason to represent last-action drops.
Last-action drops happen when a flow has an empty action list or there
is no action that consumes the packet (output, userspace, recirc, etc).
It is the most common way in which OVS drops packets.
Implementation-wise, most of these skb-consuming actions already call
"consume_skb" internally and return directly from within the
do_execute_actions() loop so with minimal changes we can assume that
any skb that exits the loop normally is a packet drop.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Matthieu Baerts says:
====================
mptcp: get rid of msk->subflow
The MPTCP protocol maintains an additional struct socket per connection,
mainly to be able to easily use tcp-level struct socket operations.
This leads to several side effects, beyond the quite unfortunate /
confusing 'subflow' field name:
- active and passive sockets behaviour is inconsistent: only active ones
have a not NULL msk->subflow, leading to different error handling and
different error code returned to the user-space in several places.
- active sockets uses an unneeded, larger amount of memory
- passive sockets can't successfully go through accept(), disconnect(),
accept() sequence, see [1] for more details.
The 13 first patches of this series are from Paolo and address all the
above, finally getting rid of the blamed field:
- The first patch is a minor clean-up.
- In the next 11 patches, msk->subflow usage is systematically removed
from the MPTCP protocol, replacing it with direct msk->first usage,
eventually introducing new core helpers when needed.
- The 13th patch finally disposes the field, and it's the only patch in
the series intended to produce functional changes.
The last and 14th patch is from Kuniyuki and it is not linked to the
previous ones: it is a small clean-up to get rid of an unnecessary check
in mptcp_init_sock().
[1] https://github.com/multipath-tcp/mptcp_net-next/issues/290
====================
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
|
|
__mptcp_init_sock() always returns 0 because mptcp_init_sock() used
to return the value directly.
But after commit 18b683bff89d ("mptcp: queue data for mptcp level
retransmission"), __mptcp_init_sock() need not return value anymore.
Let's remove the unnecessary test for __mptcp_init_sock() and make
it return void.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Such field is now unused just as a flag to control the first subflow
deletion at close() time. Introduce a new bit flag for that and finally
drop the mentioned field.
As an intended side effect, now the first subflow sock is not freed
before close() even for passive sockets. The msk has no open/active
subflows if the first one is closed and the subflow list is singular,
update accordingly the state check in mptcp_stream_accept().
Among other benefits, the subflow removal, reduces the amount of memory
used on the client side for each mptcp connection, allows passive sockets
to go through successful accept()/disconnect()/connect() and makes return
error code consistent for failing both passive and active sockets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/290
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After the previous patch the __mptcp_nmpc_socket helper is used
only to ensure that the MPTCP socket is a suitable status - that
is, the mptcp capable handshake is not started yet.
Change the return value to the relevant subflow sock, to finally
remove the last references to first subflow socket in the MPTCP stack.
As a bonus, we can get rid of a few local variables in different
functions.
No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is one of the few remaining spots actually manipulating the
first subflow socket. We can leverage the recently introduced
inet helpers to get rid of ssock there.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mptcp sockopt infrastructure unneedly uses the first subflow
socket struct in a few spots. We are going to remove such field
soon, so use directly the first subflow sock instead.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to remove the first subflow socket soon, so avoid
the additional indirection at accept() time. Instead access
directly the first subflow sock, and update mptcp_accept() to
operate on it. This allows dropping a duplicated check in
mptcp_accept().
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to remove the first subflow socket soon, so avoid
the additional indirection at poll() time. Instead access
directly the first subflow sock.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to remove the first subflow socket soon, so avoid
the additional indirection via at listen() time. Instead call
directly the recently introduced helper on the first subflow sock.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mptcp protocol maintains an additional socket just to easily
invoke a few stream operations on the first subflow. One of them
is inet_listen().
Factor out an helper operating directly on the (locked) struct sock,
to allow get rid of the above dependency in the next patch without
duplicating the existing code.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to remove the first subflow socket soon, so avoid
the additional indirection via at bind() time. Instead call directly
the recently introduced helpers on the first subflow sock.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mptcp protocol maintains an additional socket just to easily
invoke a few stream operations on the first subflow. One of
them is bind().
Factor out the helpers operating directly on the struct sock, to
allow get rid of the above dependency in the next patch without
duplicating the existing code.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to remove the first subflow socket soon, so avoid
accessing it in mptcp_get_port(). Instead, access directly the
first subflow sock.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mptcp protocol maintains an additional socket just to easily
invoke a few stream operations on the first subflow. One of them is
__inet_stream_connect().
We are going to remove the first subflow socket soon, so avoid
the additional indirection via at connect time, calling directly
into the sock-level connect() ops.
The sk-level connect never return -EINPROGRESS, cleanup the error
path accordingly. Additionally, the ssk status on error is always
TCP_CLOSE. Avoid unneeded access to the subflow sk state.
No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MPTCP protocol currently clears the msk token both at connect() and
listen() time. That is needed to deal with failing connect() calls that
can create a new token while leaving the sk in TCP_CLOSE,SS_UNCONNECTED
status and thus allowing later connect() and/or listen() calls.
Let's deal with such failures explicitly, cleaning the token in a timely
manner and avoid the confusing early mptcp_token_destroy().
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Clear errno before calling getline()
- Fix a modpost warning for ARCH=alpha
* tag 'kbuild-fixes-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
alpha: remove __init annotation from exported page_is_ram()
scripts/kallsyms: Fix build failure by setting errno before calling getline()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Hans de Goede:
- lenovo-ymc driver causes keyboard + touchpad to not work with >= 6.4
on some Thinkbook models, fix this
- A set of small fixes for mlx-platform
- Other small fixes and hw-id additions
* tag 'platform-drivers-x86-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: lenovo-ymc: Only bind on machines with a convertible DMI chassis-type
platform: mellanox: Change register offset addresses
platform: mellanox: mlx-platform: Modify graceful shutdown callback and power down mask
platform: mellanox: mlx-platform: Fix signals polarity and latch mask
platform: mellanox: Fix order in exit flow
platform/x86: ISST: Reduce noise for missing numa information in logs
platform/x86: msi-ec: Fix the build
ACPI: scan: Create platform device for CS35L56
platform/x86/amd/pmf: Fix unsigned comparison with less than zero
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eleven small fixes, ten in drivers.
Of the two fixes marked core, one is in the raid helper class (used by
some raid device drivers) and the other one is the /proc/scsi/scsi
parsing fix for potential reads beyond the end of the buffer"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedf: Fix firmware halt over suspend and resume
scsi: qedi: Fix firmware halt over suspend and resume
scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock
scsi: lpfc: Remove reftag check in DIF paths
scsi: ufs: renesas: Fix private allocation
scsi: snic: Fix possible memory leak if device_add() fails
scsi: core: Fix possible memory leak if device_add() fails
scsi: core: Fix legacy /proc parsing buffer overflow
scsi: 53c700: Check that command slot is not NULL
scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
|
|
Commit db3685b4046f ("net: remove obsolete members from struct net")
removed the uses of struct list_head from this header, without removing
the corresponding included header.
Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
bluetooth-next pull request for net-next:
- Add new VID/PID for Mediatek MT7922
- Add support multiple BIS/BIG
- Add support for Intel Gale Peak
- Add support for Qualcomm WCN3988
- Add support for BT_PKT_STATUS for ISO sockets
- Various fixes for experimental ISO support
- Load FW v2 for RTL8852C
- Add support for NXP AW693 chipset
- Add support for Mediatek MT2925
|
|
chassis-type
The lenovo-ymc driver is causing the keyboard + touchpad to stop working
on some regular laptop models such as the Lenovo ThinkBook 13s G2 ITL 20V9.
The problem is that there are YMC WMI GUID methods in the ACPI tables
of these laptops, despite them not being Yogas and lenovo-ymc loading
causes libinput to see a SW_TABLET_MODE switch with state 1.
This in turn causes libinput to ignore events from the builtin keyboard
and touchpad, since it filters those out for a Yoga in tablet mode.
Similar issues with false-positive SW_TABLET_MODE=1 reporting have
been seen with the intel-hid driver.
Copy the intel-hid driver approach to fix this and only bind to the WMI
device on machines where the DMI chassis-type indicates the machine
is a convertible.
Add a 'force' module parameter to allow overriding the chassis-type check
so that users can easily test if the YMC interface works on models which
report an unexpected chassis-type.
Fixes: e82882cdd241 ("platform/x86: Add driver for Yoga Tablet Mode switch")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229373
Cc: André Apitzsch <git@apitzsch.eu>
Cc: stable@vger.kernel.org
Tested-by: Andrew Kallmeyer <kallmeyeras@gmail.com>
Tested-by: Gergő Köteles <soyer@irl.hu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230812144818.383230-1-hdegoede@redhat.com
|
|
Move debug register offsets to different location due to hardware changes.
Fixes: dd635e33b5c9 ("platform: mellanox: Introduce support of new Nvidia L1 switch")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/20230813083735.39090-5-vadimp@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
power down mask
Use kernel_power_off() instead of kernel_halt() to pass through
machine_power_off() -> pm_power_off(), otherwise axillary power does
not go off.
Change "power down" bitmask.
Fixes: dd635e33b5c9 ("platform: mellanox: Introduce support of new Nvidia L1 switch")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/20230813083735.39090-4-vadimp@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Change polarity of chassis health and power signals and fix latch reset
mask for L1 switch.
Fixes: dd635e33b5c9 ("platform: mellanox: Introduce support of new Nvidia L1 switch")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/20230813083735.39090-3-vadimp@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Fix exit flow order: call mlxplat_post_exit() after
mlxplat_i2c_main_exit() in order to unregister main i2c driver before
to "mlxplat" driver.
Fixes: 0170f616f496 ("platform: mellanox: Split initialization procedure")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/20230813083735.39090-2-vadimp@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
fixed-link mode
lynx_pcs_link_up_sgmii() is supposed to update the PCS speed and duplex
for the non-inband operating modes, and prior to the blamed commit, it
did just that, but a mistake sneaked into the conversion and reversed
the condition.
It is easy for this to go undetected on platforms that also initialize
the PCS in the bootloader, because Linux doesn't reset it (although
maybe it should). The nature of the bug is that phylink will not touch
the IF_MODE_HALF_DUPLEX | IF_MODE_SPEED_MSK fields when it should, and
it will apparently keep working if the previous values set by the
bootloader were correct.
Fixes: c689a6528c22 ("net: pcs: lynx: update PCS driver to use neg_mode")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Zheng Zengkai says:
====================
net: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. Use the API to simplify the code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 675ad47375c7 ("e1000: Use netdev_<level>, pr_<level> and dev_<level>")
declared but never implemented e1000_get_hw_dev_name().
Commit 1532ecea1deb ("e1000: drop dead pcie code from e1000")
removed e1000_check_mng_mode()/e1000_blink_led_start() but not the declarations.
Commit c46b59b241ec ("e1000: Remove unused function e1000_mta_set.")
removed e1000_mta_set() but not its declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Uwe reports:
"Most PHYs signal WoL using an interrupt. So disabling interrupts [at
shutdown] breaks WoL at least on PHYs covered by the marvell driver."
Discussing with Ioana, the problem which was trying to be solved was:
"The board in question is a LS1021ATSN which has two AR8031 PHYs that
share an interrupt line. In case only one of the PHYs is probed and
there are pending interrupts on the PHY#2 an IRQ storm will happen
since there is no entity to clear the interrupt from PHY#2's registers.
PHY#1's driver will get stuck in .handle_interrupt() indefinitely."
Further confirmation that "the two AR8031 PHYs are on the same MDIO
bus."
With WoL using interrupts to wake the system, in such a case, the
system will begin booting with an asserted interrupt. Thus, we need to
cope with an interrupt asserted during boot.
Solve this instead by disabling interrupts during PHY probe. This will
ensure in Ioana's situation that both PHYs of the same type sharing an
interrupt line on a common MDIO bus will have their interrupt outputs
disabled when the driver probes the device, but before we hook in any
interrupt handlers - thus avoiding the interrupt storm.
A better fix would be for platform firmware to disable the interrupting
devices at source during boot, before control is handed to the kernel.
Fixes: e2f016cf7751 ("net: phy: add a shutdown procedure")
Link: 20230804071757.383971-1-u.kleine-koenig@pengutronix.de
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 39de8281791c ("RDS: Main header file") declared but never implemented
rds_trans_init() and rds_trans_exit(), remove it.
Commit d37c9359056f ("RDS: Move loop-only function to loop.c") removed the
implementation rds_message_inc_free() but not the declaration.
Since commit 55b7ed0b582f ("RDS: Common RDMA transport code")
rds_rdma_conn_connect() is never implemented and used.
rds_tcp_map_seq() is never implemented and used since
commit 70041088e3b9 ("RDS: Add TCP transport to RDS").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt()
and others use nlk->flags without correct locking.
Use set_bit(), clear_bit(), test_bit(), assign_bit() to remove
data-races.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Menglong Dong says:
====================
net: tcp: support probing OOM
In this series, we make some small changes to make the tcp
retransmission become zero-window probes if the receiver drops the skb
because of memory pressure.
In the 1st patch, we reply a zero-window ACK if the skb is dropped
because out of memory, instead of dropping the skb silently.
In the 2nd patch, we allow a zero-window ACK to update the window.
In the 3rd patch, fix unexcepted socket die when snd_wnd is 0 in
tcp_retransmit_timer().
In the 4th patch, we refactor the debug message in
tcp_retransmit_timer() to make it more correct.
After these changes, the tcp can probe the OOM of the receiver forever.
Changes since v3:
- make the timeout "2 * TCP_RTO_MAX" in the 3rd patch
- tp->retrans_stamp is not based on jiffies and can't be compared with
icsk->icsk_timeout in the 3rd patch. Fix it.
- introduce the 4th patch
Changes since v2:
- refactor the code to avoid code duplication in the 1st patch
- use after() instead of max() in tcp_rtx_probe0_timed_out()
Changes since v1:
- send 0 rwin ACK for the receive queue empty case when necessary in the
1st patch
- send the ACK immediately by using the ICSK_ACK_NOW flag in the 1st
patch
- consider the case of the connection restart from idle, as Neal comment,
in the 3rd patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The debug message in tcp_retransmit_timer() is slightly wrong, because
they could be printed even if we did not receive a new ACK packet from
the remote peer.
Change it to probing zero-window, as it is a expected case now. The
description may be not correct.
Adding the duration since the last ACK we received, and the duration of
the retransmission, which are useful for debugging.
And the message now like this:
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 209ms ago, lasting 209ms
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 404ms ago, lasting 408ms
Probing zero-window on 127.0.0.1:9999/46946, seq=3737778959:3737791503, recv 812ms ago, lasting 1224ms
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In tcp_retransmit_timer(), a window shrunk connection will be regarded
as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
right all the time.
The retransmits will become zero-window probes in tcp_retransmit_timer()
if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
TCP_RTO_MAX sooner or later.
However, the timer can be delayed and be triggered after 122877ms, not
TCP_RTO_MAX, as I tested.
Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
once the RTO come up to TCP_RTO_MAX, and the socket will die.
Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
which is exact the timestamp of the timeout.
However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp
could already be a long time (minutes or hours) in the past even on the
first RTO. So we double check the timeout with the duration of the
retransmission.
Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket
dying too soon.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fow now, an ACK can update the window in following case, according to
the tcp_may_update_window():
1. the ACK acknowledged new data
2. the ACK has new data
3. the ACK expand the window and the seq of it is valid
Now, we allow the ACK update the window if the window is 0, and the
seq/ack of it is valid. This is for the case that the receiver replies
an zero-window ACK when it is under memory stress and can't queue the new
data.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For now, skb will be dropped when no memory, which makes client keep
retrans util timeout and it's not friendly to the users.
In this patch, we reply an ACK with zero-window in this case to update
the snd_wnd of the sender to 0. Therefore, the sender won't timeout the
connection and will probe the zero-window with the retransmits.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"More fixes, some of them going back to older releases and there are
fixes for hangs in stress tests regarding space caching:
- fixes and progress tracking for hangs in free space caching, found
by test generic/475
- writeback fixes, write pages in integrity mode and skip writing
pages that have been written meanwhile
- properly clear end of extent range after an error
- relocation fixes:
- fix race betwen qgroup tree creation and relocation
- detect and report invalid reloc roots"
* tag 'for-6.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: set cache_block_group_error if we find an error
btrfs: reject invalid reloc tree root keys with stack dump
btrfs: exit gracefully if reloc roots don't match
btrfs: avoid race between qgroup tree creation and relocation
btrfs: properly clear end of the unreserved range in cow_file_range
btrfs: don't wait for writeback on clean pages in extent_write_cache_pages
btrfs: don't stop integrity writeback too early
btrfs: wait for actual caching progress during allocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- mark virtual chips exposed by gpio-sim as ones that can sleep
(callbacks must not be called from interrupt context)
- fix an off-by-one error in gpio-ws16c48
* tag 'gpio-fixes-for-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: ws16c48: Fix off-by-one error in WS16C48 resource region extent
gpio: sim: mark the GPIO chip as a one that can sleep
|
|
The only remaining consumer is new_inode, where it showed up in 2001 as
commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1]
with a changelog which does not mention it.
Since then the line got only touched up to keep compiling.
While it may have been of benefit back in the day, it is guaranteed to
at best not get in the way in the multicore setting -- as the code
performs *a lot* of work between the prefetch and actual lock acquire,
any contention means the cacheline is already invalid by the time the
routine calls spin_lock(). It adds spurious traffic, for short.
On top of it prefetch is notoriously tricky to use for single-threaded
purposes, making it questionable from the get go.
As such, remove it.
I admit upfront I did not see value in benchmarking this change, but I
can do it if that is deemed appropriate.
Removal from new_inode and of the entire thing are in the same patch as
requested by Linus, so whatever weird looks can be directed at that guy.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 6.5-rc6 that resolve
some reported issues. Included in here are:
- bunch of iio driver fixes for reported problems
- interconnect driver fixes
- counter driver build fix
- cardreader driver fixes
- binder driver fixes
- other tiny driver fixes
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
misc: tps6594-esm: Disable ESM for rev 1 PMIC
misc: rtsx: judge ASPM Mode to set PETXCFG Reg
binder: fix memory leak in binder_init()
iio: cros_ec: Fix the allocation size for cros_ec_command
tools/counter: Makefile: Replace rmdir by rm to avoid make,clean failure
iio: imu: lsm6dsx: Fix mount matrix retrieval
iio: adc: meson: fix core clock enable/disable moment
iio: core: Prevent invalid memory access when there is no parent
iio: frequency: admv1013: propagate errors from regulator_get_voltage()
counter: Fix menuconfig "Counter support" submenu entries disappearance
dt-bindings: iio: adi,ad74115: remove ref from -nanoamp
iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
iio: light: bu27008: Fix intensity data type
iio: light: bu27008: Fix scale format
iio: light: bu27034: Fix scale format
iio: adc: ad7192: Fix ac excitation feature
interconnect: qcom: sa8775p: add enable_mask for bcm nodes
interconnect: qcom: sm8550: add enable_mask for bcm nodes
interconnect: qcom: sm8450: add enable_mask for bcm nodes
interconnect: qcom: Add support for mask-based BCMs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for reported
problems. Included in here are:
- thunderbolt driver memory leak fix
- thunderbolt display flicker fix
- usb dwc3 driver fix
- usb gadget uvc disconnect crash fix
- usb typec Kconfig build dependency fix
- usb typec small fixes
- usb-con-gpio bugfix
- usb-storage old driver bugfix
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
thunderbolt: Fix memory leak in tb_handle_dp_bandwidth_request()
usb: dwc3: Properly handle processing of pending events
usb-storage: alauda: Fix uninit-value in alauda_check_media()
usb: common: usb-conn-gpio: Prevent bailing out if initial role is none
USB: Gadget: core: Help prevent panic during UVC unconfigure
usb: typec: mux: intel: Add dependency on USB_COMMON
usb: typec: nb7vpq904m: Add an error handling path in nb7vpq904m_probe()
usb: typec: altmodes/displayport: Signal hpd when configuring pin assignment
usb: typec: tcpm: Fix response to vsafe0V event
thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Do not parse the confidential computing blob on non-AMD hardware as
it leads to an EFI config table ending up unmapped
- Use the correct segment selector in the 32-bit version of getcpu() in
the vDSO
- Make sure vDSO and VVAR regions are placed in the 47-bit VA range
even on 5-level paging systems
- Add models 0x90-0x91 to the range of AMD Zenbleed-affected CPUs
* tag 'x86_urgent_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
x86/linkage: Fix typo of BUILD_VDSO in asm/linkage.h
x86/vdso: Choose the right GDT_ENTRY_CPUNODE for 32-bit getcpu() on 64-bit kernel
x86/sev: Do not try to parse for the CC blob on non-AMD hardware
|