Age | Commit message (Collapse) | Author |
|
Split clocks settings from init callback into clks_config callback,
which could support platform level clocks management.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch intends to add platform level clocks management. Some
platforms may have their own special clocks, they also need to be
managed dynamically. If you want to manage such clocks, please implement
clks_config callback.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch intends to add clocks management for stmmac driver:
If CONFIG_PM enabled:
1. Keep clocks disabled after driver probed.
2. Enable clocks when up the net device, and disable clocks when down
the net device.
If CONFIG_PM disabled:
Keep clocks always enabled after driver probed.
Note:
1. It is fine for ethtool, since the way of implementing ethtool_ops::begin
in stmmac is only can be accessed when interface is enabled, so the clocks
are ticked.
2. The MDIO bus has a different life cycle to the MAC, need ensure
clocks are enabled when _mdio_read/write() need clocks, because these
functions can be called while the interface it not opened.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently tcp_check_req can be called with obsolete req socket for which big
socket have been already created (because of CPU race or early demux
assigning req socket to multiple packets in gro batch).
Commit e0f9759f530bf789e984 ("tcp: try to keep packet if SYN_RCV race
is lost") added retry in case when tcp_check_req is called for PSH|ACK packet.
But if client sends RST+ACK immediatly after connection being
established (it is performing healthcheck, for example) retry does not
occur. In that case tcp_check_req tries to close req socket,
leaving big socket active.
Fixes: e0f9759f530 ("tcp: try to keep packet if SYN_RCV race is lost")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Reported-by: Oleg Senin <olegsenin@yandex-team.ru>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before calling tipc_aead_key_size(ptr), we need to ensure
we have enough data to dereference ptr->keylen.
We probably also want to make sure tipc_aead_key_size()
wont overflow with malicious ptr->keylen values.
Syzbot reported:
BUG: KMSAN: uninit-value in __tipc_nl_node_set_key net/tipc/node.c:2971 [inline]
BUG: KMSAN: uninit-value in tipc_nl_node_set_key+0x9bf/0x13b0 net/tipc/node.c:3023
CPU: 0 PID: 21060 Comm: syz-executor.5 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x21c/0x280 lib/dump_stack.c:120
kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
__tipc_nl_node_set_key net/tipc/node.c:2971 [inline]
tipc_nl_node_set_key+0x9bf/0x13b0 net/tipc/node.c:3023
genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
genl_rcv_msg+0x1319/0x1610 net/netlink/genetlink.c:800
netlink_rcv_skb+0x6fa/0x810 net/netlink/af_netlink.c:2494
genl_rcv+0x63/0x80 net/netlink/genetlink.c:811
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x11d6/0x14a0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x1740/0x1840 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
____sys_sendmsg+0xcfc/0x12f0 net/socket.c:2345
___sys_sendmsg net/socket.c:2399 [inline]
__sys_sendmsg+0x714/0x830 net/socket.c:2432
__compat_sys_sendmsg net/compat.c:347 [inline]
__do_compat_sys_sendmsg net/compat.c:354 [inline]
__se_compat_sys_sendmsg+0xa7/0xc0 net/compat.c:351
__ia32_compat_sys_sendmsg+0x4a/0x70 net/compat.c:351
do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline]
__do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141
do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166
do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
RIP: 0023:0xf7f60549
Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f555a5fc EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000200
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
slab_alloc_node mm/slub.c:2907 [inline]
__kmalloc_node_track_caller+0xa37/0x1430 mm/slub.c:4527
__kmalloc_reserve net/core/skbuff.c:142 [inline]
__alloc_skb+0x2f8/0xb30 net/core/skbuff.c:210
alloc_skb include/linux/skbuff.h:1099 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
netlink_sendmsg+0xdbc/0x1840 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
____sys_sendmsg+0xcfc/0x12f0 net/socket.c:2345
___sys_sendmsg net/socket.c:2399 [inline]
__sys_sendmsg+0x714/0x830 net/socket.c:2432
__compat_sys_sendmsg net/compat.c:347 [inline]
__do_compat_sys_sendmsg net/compat.c:354 [inline]
__se_compat_sys_sendmsg+0xa7/0xc0 net/compat.c:351
__ia32_compat_sys_sendmsg+0x4a/0x70 net/compat.c:351
do_syscall_32_irqs_on arch/x86/entry/common.c:79 [inline]
__do_fast_syscall_32+0x102/0x160 arch/x86/entry/common.c:141
do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:166
do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:209
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
Fixes: e1f32190cf7d ("tipc: add support for AEAD key setting via netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tuong Lien <tuong.t.lien@dektech.com.au>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
if pl->mac_ops->mac_finish() failed, phylink_err should use
"mac_finish" instead of "mac_prepare".
Fixes: b7ad14c2fe2d4 ("net: phylink: re-implement interface configuration with PCS")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ong Boon Leong says:
====================
net: pcs, stmmac: add C37 AN SGMII support
This patch series adds MAC-side SGMII support to stmmac driver and it is
changed as follow:-
1/6: Refactor the current C73 implementation in pcs-xpcs to prepare for
adding C37 AN later.
2/6: Add MAC-side SGMII C37 AN support to pcs-xpcs
3,4/6: make phylink_parse_mode() to work for non-DT platform so that
we can use stmmac platform_data to set it.
5/6: Make stmmac_open() to only skip PHY init if C73 is used, otherwise
C37 AN will need phydev to be connected to phylink.
6/6: Finally, add pcs-xpcs SGMII interface support to Intel mGbE
controller.
The patch series have been tested on EHL CRB PCH TSN (eth2) controller
that has Marvell 88E1512 PHY attached over SGMII interface and the
iterative tests of speed change (AN) + ping test have been successful.
[63446.009295] intel-eth-pci 0000:00:1e.4 eth2: Link is Down
[63449.986365] intel-eth-pci 0000:00:1e.4 eth2: Link is Up - 1Gbps/Full - flow control off
[63449.987625] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[63451.248064] intel-eth-pci 0000:00:1e.4 eth2: Link is Down
[63454.082366] intel-eth-pci 0000:00:1e.4 eth2: Link is Up - 100Mbps/Full - flow control off
[63454.083650] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[63456.465179] intel-eth-pci 0000:00:1e.4 eth2: Link is Down
[63459.202367] intel-eth-pci 0000:00:1e.4 eth2: Link is Up - 10Mbps/Full - flow control off
[63459.203639] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[63460.882832] intel-eth-pci 0000:00:1e.4 eth2: Link is Down
[63464.322366] intel-eth-pci 0000:00:1e.4 eth2: Link is Up - 1Gbps/Full - flow control off
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Intel mGbE controller such as those in EHL & TGL uses pcs-xpcs driver for
SGMII interface. To ensure mdio bus scanning does not assign phy_device
to MDIO-addressable entities like intel serdes and pcs-xpcs, we set up
to phy_mask to skip them.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the support for MAC-side SGMII C37 AN is added to pcs-xpcs, phydev
should be attached to phylink during driver's open(). So, we change the
condition to "Not C73 AN" instead.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Not all platform uses DT, so phylink_parse_mode() will skip in-band setup
of pl->supported and pl->link_config.advertising entirely. So, we add the
setting of ovr_an_inband flag to make it works for non-DT platform.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Certain platform does not support DT, so we make phylink_parse_mode() to
allow non-DT platform to use it to setup in-band AN advertising.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
XPCS IP supports C37 SGMII AN process and it is used in intel multi-GbE
controller as MAC-side SGMII.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current implementation for XPCS is validated for C73, so we rename them
to have _c73 suffix and introduce a set of functions to use an_mode flag
to switch between C73 and C37 AN later.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
s/structue/structure/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
s/structue/structure/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mundane typo fix.
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This file has been effectively empty since 2.3.99-pre3 !
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the incorrect interface name is stored in the slaves/active_slave
option of the bonding sysfs, the kernel does not record the log that
interface does not exist.
This patch adds a log for -ENODEV error, which will facilitate users to
figure out such issue.
Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For wireless devices (e.g. mt76 driver) multiple net_devices belongs to
the same wireless phy and the napi object is registered in a dummy
netdevice related to the wireless phy.
Export dev_set_threaded in order to be reused in device drivers enabling
threaded NAPI.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Built-in microphone and combojack on Xiaomi Notebook Pro (1d72:1701) needs
to be fixed, the existing quirk for Dell works well on that machine.
Signed-off-by: Xiaoliang Yu <yxl_22@outlook.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/OS0P286MB02749B9E13920E6899902CD8EE6C9@OS0P286MB0274.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The switch implements unicast and multicast filtering per port.
Add support for it. By default filtering is disabled.
Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"x25_close" is called by "hdlc_close" in "hdlc.c", which is called by
hardware drivers' "ndo_stop" function.
"x25_xmit" is called by "hdlc_start_xmit" in "hdlc.c", which is hardware
drivers' "ndo_start_xmit" function.
"x25_rx" is called by "hdlc_rcv" in "hdlc.c", which receives HDLC frames
from "net/core/dev.c".
"x25_close" races with "x25_xmit" and "x25_rx" because their callers race.
However, we need to ensure that the LAPB APIs called in "x25_xmit" and
"x25_rx" are called before "lapb_unregister" is called in "x25_close".
This patch adds locking to ensure when "x25_xmit" and "x25_rx" are doing
their work, "lapb_unregister" is not yet called in "x25_close".
Reasons for not solving the racing between "x25_close" and "x25_xmit" by
calling "netif_tx_disable" in "x25_close":
1. We still need to solve the racing between "x25_close" and "x25_rx";
2. The design of the HDLC subsystem assumes the HDLC hardware drivers
have full control over the TX queue, and the HDLC protocol drivers (like
this driver) have no control. Controlling the queue here in the protocol
driver may interfere with hardware drivers' control of the queue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Optimize ice_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little over 100k extra packets
in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Optimize ixgbe_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little under 100k extra
packets in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Optimize i40e_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little over 100k extra packets
in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In commit 05bc1be6db4b2 ("s390/pci: create zPCI bus") we removed the
pci_dev_put() call matching the earlier pci_get_slot() done as part of
__zpci_event_availability(). This was based on the wrong understanding
that the device_put() done as part of pci_destroy_device() would counter
the pci_get_slot() when it only counters the initial reference. This
same understanding and existing bad example also lead to not doing
a pci_dev_put() in zpci_remove_device().
Since releasing the PCI devices, unlike releasing the PCI slot, does not
print any debug message for testing I added one in pci_release_dev().
This revealed that we are indeed leaking the PCI device on PCI
hotunplug. Further testing also revealed another missing pci_dev_put() in
disable_slot().
Fix this by adding the missing pci_dev_put() in disable_slot() and fix
zpci_remove_device() with the correct pci_dev_put() calls. Also instead
of calling pci_get_slot() in __zpci_event_availability() to determine if
a PCI device is registered and then doing the same again in
zpci_remove_device() do this once in zpci_remove_device() which makes
sure that the pdev in __zpci_event_availability() is only used for the
result of pci_scan_single_device() which does not need a reference count
decremnt as its ownership goes to the PCI bus.
Also move the check if zdev->zbus->bus is set into zpci_remove_device()
since it may be that we're removing a device with devfn != 0 which never
had a PCI bus. So we can still set the pdev->error_state to indicate
that the device is not usable anymore, add a flag to set the error state.
Fixes: 05bc1be6db4b2 ("s390/pci: create zPCI bus")
Cc: <stable@vger.kernel.org> # 5.8+: e1bff843cde6 s390/pci: remove superfluous zdev->zbus check
Cc: <stable@vger.kernel.org> # 5.8+: ba764dd703fe s390/pci: refactor zpci_create_device()
Cc: <stable@vger.kernel.org> # 5.8+
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Commit 152e9b8676c6e ("s390/vtime: steal time exponential moving average")
inadvertently changed the input value for account_steal_time() from
"cputime_to_nsecs(steal)" to just "steal", resulting in broken increased
steal time accounting.
Fix this by changing it back to "cputime_to_nsecs(steal)".
Fixes: 152e9b8676c6e ("s390/vtime: steal time exponential moving average")
Cc: <stable@vger.kernel.org> # 5.1
Reported-by: Sabine Forkel <sabine.forkel@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The following BUG message was triggered repeatedly when complete counter
sets are extracted from the CPUMF:
BUG: using smp_processor_id() in preemptible [00000000]
code: psvc-readsets/7759
caller is cf_diag_needspace+0x2c/0x100
CPU: 7 PID: 7759 Comm: psvc-readsets Not tainted 5.12.0
Hardware name: IBM 3906 M03 703 (LPAR)
Call Trace:
[<00000000c7043f78>] show_stack+0x90/0xf8
[<00000000c705776a>] dump_stack+0xba/0x108
[<00000000c705d91c>] check_preemption_disabled+0xec/0xf0
[<00000000c63eb1c4>] cf_diag_needspace+0x2c/0x100
[<00000000c63ecbcc>] cf_diag_ioctl_start+0x10c/0x240
[<00000000c63ece9a>] cf_diag_ioctl+0x19a/0x238
[<00000000c675f3f4>] __s390x_sys_ioctl+0xc4/0x100
[<00000000c63ca762>] do_syscall+0x82/0xd0
[<00000000c705bdd8>] __do_syscall+0xc0/0xd8
[<00000000c706d532>] system_call+0x72/0x98
2 locks held by psvc-readsets/7759:
#0: 00000000c75a57c0 (cpu_hotplug_lock){++++}-{0:0},
at: cf_diag_ioctl+0x44/0x238
#1: 00000000c75a3078 (cf_diag_ctrset_mutex){+.+.}-{3:3},
at: cf_diag_ioctl+0x54/0x238
This issue is a missing get_cpu_ptr/put_cpu_ptr pair in function
cf_diag_needspace. Add it.
Fixes: cf6acb8bdb1d ("s390/cpumf: Add support for complete counter set extraction")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
[Why]
With certain fclock overclocks, state 1 may be chosen
as the closest clock level. This may result in this state
being empty if not populated beforehand, resulting in
black screens and screen corruption.
[How]
Copy over all soc states to clock_limits before bounding
box creation to avoid any cases with empty states.
Fixes: f2459c52c84449 ("drm/amd/display: Add Bounding Box State for Low DF PState but High Voltage State")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1514
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Before this change, the mask is never included in the netlink message, so
"conntrack -E expect" always prints 0.0.0.0.
In older kernels the l3num callback struct was passed as argument, based
on tuple->src.l3num. After the l3num indirection got removed, the call
chain is based on m.src.l3num, but this value is 0xffff.
Init l3num to the correct value.
Fixes: f957be9d349a3 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.
Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in cc00bcaa5899 (which is still present), while still
maintaining the same speed of replacing tables.
The smb_mb() barriers potentially slow the packet path, however testing
has shown no measurable change in performance on a 4-core MIPS64
platform.
Fixes: 7f5c6d4f665b ("netfilter: get rid of atomic ops in fast path")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This reverts commit cc00bcaa589914096edef7fb87ca5cee4a166b5c.
This (and the preceding) patch basically re-implemented the RCU
mechanisms of patch 784544739a25. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.
Prior to using RCU a script calling "iptables" approx. 200 times was
taking 1.16s. With RCU this increased to 11.59s.
Revert these patches and fix the issue in a different way.
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This reverts commit 443d6e86f821a165fae3fc3fc13086d27ac140b1.
This (and the following) patch basically re-implemented the RCU
mechanisms of patch 784544739a25. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.
Revert these patches and fix the issue in a different way.
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
afs_listxattr() lists all the available special afs xattrs (i.e. those in
the "afs.*" space), no matter what type of server we're dealing with. But
OpenAFS servers, for example, cannot deal with some of the extra-capable
attributes that AuriStor (YFS) servers provide. Unfortunately, the
presence of the afs.yfs.* attributes causes errors[1] for anything that
tries to read them if the server is of the wrong type.
Fix the problem by removing afs_listxattr() so that none of the special
xattrs are listed (AFS doesn't support xattrs). It does mean, however,
that getfattr won't list them, though they can still be accessed with
getxattr() and setxattr().
This can be tested with something like:
getfattr -d -m ".*" /afs/example.com/path/to/file
With this change, none of the afs.* attributes should be visible.
Changes:
ver #2:
- Hide all of the afs.* xattrs, not just the ACL ones.
Fixes: ae46578b963f ("afs: Get YFS ACLs and information through xattrs")
Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003502.html [1]
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003567.html # v1
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003573.html # v2
|
|
If someone attempts to access YFS-related xattrs (e.g. afs.yfs.acl) on a
file on a non-YFS AFS server (such as OpenAFS), then the kernel will jump
to a NULL function pointer because the afs_fetch_acl_operation descriptor
doesn't point to a function for issuing an operation on a non-YFS
server[1].
Fix this by making afs_wait_for_operation() check that the issue_afs_rpc
method is set before jumping to it and setting -ENOTSUPP if not. This fix
also covers other potential operations that also only exist on YFS servers.
afs_xattr_get/set_yfs() then need to translate -ENOTSUPP to -ENODATA as the
former error is internal to the kernel.
The bug shows up as an oops like the following:
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[...]
Call Trace:
afs_wait_for_operation+0x83/0x1b0 [kafs]
afs_xattr_get_yfs+0xe6/0x270 [kafs]
__vfs_getxattr+0x59/0x80
vfs_getxattr+0x11c/0x140
getxattr+0x181/0x250
? __check_object_size+0x13f/0x150
? __fput+0x16d/0x250
__x64_sys_fgetxattr+0x64/0xb0
do_syscall_64+0x49/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb120a9defe
This was triggered with "cp -a" which attempts to copy xattrs, including
afs ones, but is easier to reproduce with getfattr, e.g.:
getfattr -d -m ".*" /afs/openafs.org/
Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003498.html [1]
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003566.html # v1
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003572.html # v2
|
|
When fixing the bpf test_tunnel.sh geneve failure. I only fixed the IPv4
part but forgot the IPv6 issue. Similar with the IPv4 fixes 557c223b643a
("selftests/bpf: No need to drop the packet when there is no geneve opt"),
when there is no tunnel option and bpf_skb_get_tunnel_opt() returns error,
there is no need to drop the packets and break all geneve rx traffic.
Just set opt_class to 0 and keep returning TC_ACT_OK at the end.
Fixes: 557c223b643a ("selftests/bpf: No need to drop the packet when there is no geneve opt")
Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/bpf/20210309032214.2112438-1-liuhangbin@gmail.com
|
|
Augment the current set of options that are accessible via
bpf_{g,s}etsockopt to also support SO_REUSEPORT.
Signed-off-by: Manu Bretelle <chantra@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210310182305.1910312-1-chantra@fb.com
|
|
When using a zoned filesystem, while syncing the log, if we fail to
allocate the root node for the log root tree, we are not removing the
log context we allocated on stack from the list of log contexts of the
log root tree. This means after the return from btrfs_sync_log() we get
a corrupted linked list.
Fix this by allocating the node before adding our stack allocated context
to the list of log contexts of the log root tree.
Fixes: 3ddebf27fcd3a9 ("btrfs: zoned: reorder log node allocation on zoned filesystem")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
When running fsstress with only falloc workload, and a very low qgroup
limit set, we can get qgroup data rsv leak at unmount time.
BTRFS warning (device dm-0): qgroup 0/5 has unreleased space, type 0 rsv 20480
BTRFS error (device dm-0): qgroup reserved space leaked
The minimal reproducer looks like:
#!/bin/bash
dev=/dev/test/test
mnt="/mnt/btrfs"
fsstress=~/xfstests-dev/ltp/fsstress
runtime=8
workload()
{
umount $dev &> /dev/null
umount $mnt &> /dev/null
mkfs.btrfs -f $dev > /dev/null
mount $dev $mnt
btrfs quota en $mnt
btrfs quota rescan -w $mnt
btrfs qgroup limit 16m 0/5 $mnt
$fsstress -w -z -f creat=10 -f fallocate=10 -p 2 -n 100 \
-d $mnt -v > /tmp/fsstress
umount $mnt
if dmesg | grep leak ; then
echo "!!! FAILED !!!"
exit 1
fi
}
for (( i=0; i < $runtime; i++)); do
echo "=== $i/$runtime==="
workload
done
Normally it would fail before round 4.
[CAUSE]
In function insert_prealloc_file_extent(), we first call
btrfs_qgroup_release_data() to know how many bytes are reserved for
qgroup data rsv.
Then use that @qgroup_released number to continue our work.
But after we call btrfs_qgroup_release_data(), we should either queue
@qgroup_released to delayed ref or free them manually in error path.
Unfortunately, we lack the error handling to free the released bytes,
leaking qgroup data rsv.
All the error handling function outside won't help at all, as we have
released the range, meaning in inode io tree, the EXTENT_QGROUP_RESERVED
bit is already cleared, thus all btrfs_qgroup_free_data() call won't
free any data rsv.
[FIX]
Add free_qgroup tag to manually free the released qgroup data rsv.
Reported-by: Nikolay Borisov <nborisov@suse.com>
Reported-by: David Sterba <dsterba@suse.cz>
Fixes: 9729f10a608f ("btrfs: inode: move qgroup reserved space release to the callers of insert_reserved_file_extent()")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There is a piece of weird code in insert_prealloc_file_extent(), which
looks like:
ret = btrfs_qgroup_release_data(inode, file_offset, len);
if (ret < 0)
return ERR_PTR(ret);
if (trans) {
ret = insert_reserved_file_extent(trans, inode,
file_offset, &stack_fi,
true, ret);
...
}
extent_info.is_new_extent = true;
extent_info.qgroup_reserved = ret;
...
Note how the variable @ret is abused here, and if anyone is adding code
just after btrfs_qgroup_release_data() call, it's super easy to
overwrite the @ret and cause tons of qgroup related bugs.
Fix such abuse by introducing new variable @qgroup_released, so that we
won't reuse the existing variable @ret.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
The test generic/091 fails , with the following output:
fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
mapped writes DISABLED
Seed set to 1
main: filesystem does not support fallocate mode FALLOC_FL_COLLAPSE_RANGE, disabling!
main: filesystem does not support fallocate mode FALLOC_FL_INSERT_RANGE, disabling!
skipping zero size read
truncating to largest ever: 0xe400
copying to largest ever: 0x1f400
cloning to largest ever: 0x70000
cloning to largest ever: 0x77000
fallocating to largest ever: 0x7a120
Mapped Read: non-zero data past EOF (0x3a7ff) page offset 0x800 is 0xf2e1 <<<
...
[CAUSE]
In commit c28ea613fafa ("btrfs: subpage: fix the false data csum mismatch error")
end_bio_extent_readpage() changes to only zero the range inside the bvec
for incoming subpage support.
But that commit is using incorrect offset to calculate the start.
For subpage, we can have a case that the whole bvec is beyond isize,
thus we need to calculate the correct offset.
But the offending commit is using @end (bvec end), other than @start
(bvec start) to calculate the start offset.
This means, we only zero the last byte of the bvec, not from the isize.
This stupid bug makes the range beyond isize is not properly zeroed, and
failed above test.
[FIX]
Use correct @start to calculate the range start.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: c28ea613fafa ("btrfs: subpage: fix the false data csum mismatch error")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
BULKSTAT_SINGLE exposed the ondisk uids/gids just like bulkstat, and can
be called on any inode, including ones not visible in the current mount.
Fixes: f736d93d76d3 ("xfs: support idmapped mounts")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
s/oustanding/outstanding/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log. Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes. Do this by extracting the bits we need from the
unmount path and reusing them. As an added bonus, failed writes during
a failed mount will not retry forever now.
This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First set of IIO and counter fixes for the 5.12 cycle
adi,ad7949
* Fix a wrong bitmask that could lead to an undefined bit being included.
adi,adi-axi-adc
* Add missing Kconfig dependencies
adi,adis16400
* Wrong error code handling in adis16400 that could lead to failed probe.
hid-sensor-humidity, temperature
* Fix alignment and space for timestamp channel.
hid-sensor-prox
* Fix an issue with handling of exponent on the channel scaling.
invensense,mpu3050
* Fix a hole in error handling.
qcom,spi-vadc
* Correct scaling
st,ab8500-adc
* Fix wrong scaling (by factor of 1000)
st,stm32-adc
* Add missing HAS_IOMEM dependency
st,stm32-timer-cnt
* Report count when running off internal clock
* Fix issue with not checking ceiling before trying to write to hardware
* Ensure driver doesn't have stashed state which doesn't match hardware by
rereading from hardware in a slow path.
* tag 'iio-fixes-for-5.12a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler
iio: hid-sensor-temperature: Fix issues of timestamp channel
iio: hid-sensor-humidity: Fix alignment issue of timestamp channel
counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register
counter: stm32-timer-cnt: fix ceiling write max value
counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED
iio: adc: ab8500-gpadc: Fix off by 10 to 3
iio:adc:stm32-adc: Add HAS_IOMEM dependency
iio: adis16400: Fix an error code in adis16400_initial_setup()
iio: adc: adi-axi-adc: add proper Kconfig dependencies
iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask
iio: hid-sensor-prox: Fix scale not correct issue
iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel
|
|
Running sqpoll cancellations via task_work_run() is a bad idea because
it depends on other task works to be run, but those may be locked in
currently running task_work_run() because of how it's (splicing the list
in batches).
Enqueue and run them through a separate callback head, namely
struct io_sq_data::park_task_work. As a nice bonus we now precisely
control where it's run, that's much safer than guessing where it can
happen as it was before.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We already have helpers to run/add callback_head but taking ctx and
working with ctx->exit_task_work. Extract generic versions of them
implemented in terms of struct callback_head, it will be used later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If io_sq_thread_park() of one task got rescheduled right after
set_bit(), before it gets back to mutex_lock() there can happen
park()/unpark() by another task with SQPOLL locking again and
continuing running never seeing that first set_bit(SHOULD_PARK),
so won't even try to put the mutex down for parking.
It will get parked eventually when SQPOLL drops the lock for reschedule,
but may be problematic and will get in the way of further fixes.
Account number of tasks waiting for parking with a new atomic variable
park_pending and adjust SHOULD_PARK accordingly. It doesn't entirely
replaces SHOULD_PARK bit with this atomic var because it's convenient
to have it as a bit in the state and will help to do optimisations
later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_sq_thread_finish() is called in io_ring_ctx_free(), so SQPOLL task is
potentially running submitting new requests. It's not a disaster because
of using a "try" variant of percpu_ref_get, but is far from nice.
Remove ctx from the sqd ctx list earlier, before cancellation loop, so
SQPOLL can't find it and so won't submit new requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The only user of read-locking of sqd->rw_lock is sq_thread itself, which
is by definition alone, so we don't really need rw_semaphore, but mutex
will do. Replace it with a mutex, and kill read-to-write upgrading and
extra task_work handling in io_sq_thread().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|