Age | Commit message (Collapse) | Author |
|
rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements,
but if that isn't sufficient for the number of fragments in the socket
buffer, we try to allocate an sglist large enough to hold all the
fragments.
However, for large packets with a lot of fragments, this isn't sufficient
and we need at least one additional fragment.
The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then
getting returned by userspace. Most of the time, this isn't a problem as
rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued
together; occasionally, however, the server will ignore the reported limit
and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags
being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that
seems to occur before the fragments counted by ->nr_frags and we hit the
end of the sglist too early.
Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is
recursive up to 24 deep. I'm not sure if I need to take account of that
too - or if there's an easy way of counting those frags too.
Fix this by counting an extra frag and allocating a larger sglist based on
that.
Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
|
|
Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
it to siphon off UDP packets early in the handling of received UDP packets
thereby avoiding the packet going through the UDP receive queue, it doesn't
get ICMP packets through the UDP ->sk_error_report() callback. In fact, it
doesn't appear that there's any usable option for getting hold of ICMP
packets.
Fix this by adding a new UDP encap hook to distribute error messages for
UDP tunnels. If the hook is set, then the tunnel driver will be able to
see ICMP packets. The hook provides the offset into the packet of the UDP
header of the original packet that caused the notification.
An alternative would be to call the ->error_handler() hook - but that
requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error()
do, though isn't really necessary or desirable in rxrpc's case is we want
to parse them there and then, not queue them).
Changes
=======
ver #3)
- Fixed an uninitialised variable.
ver #2)
- Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.
Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
slab_mutex/cpu_hotplug_lock
A circular locking problem is reported by lockdep due to the following
circular locking dependency.
+--> cpu_hotplug_lock --> slab_mutex --> kn->active --+
| |
+-----------------------------------------------------+
The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency
happens in
kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex);
==> sysfs_slab_unlink()
==> kobject_del()
==> kernfs_remove()
==> __kernfs_remove()
==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...);
The backward kn->active ==> cpu_hotplug_lock dependency happens in
kernfs_fop_write_iter(): kernfs_get_active();
==> slab_attr_store()
==> cpu_partial_store()
==> flush_all(): cpus_read_lock()
One way to break this circular locking chain is to avoid holding
cpu_hotplug_lock and slab_mutex while deleting the kobject in
sysfs_slab_unlink() which should be equivalent to doing a write_lock
and write_unlock pair of the kn->active virtual lock.
Since the kobject structures are not protected by slab_mutex or the
cpu_hotplug_lock, we can certainly release those locks before doing
the delete operation.
Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
created kmem_cache_release() and call it outside the slab_mutex &
cpu_hotplug_lock critical sections. There will be a slight delay
in the deletion of sysfs files if kmem_cache_release() is called
indirectly from a work function.
Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Sebastian Reichel says:
====================
RK3588 Ethernet Support
This adds ethernet support for the RK3588(s) SoCs.
Changes since PATCHv2:
* Rebased to v6.0-rc1
* Wrap DT bindings additions at 80 characters
* Add Acked-by from Krzysztof
Changes since PATCHv1:
* Drop first patch (not required) and rebase second patch accordingly
* Rename DT property rockchip,php_grf -> rockchip,php-grf
====================
Link: https://lore.kernel.org/r/20220830154559.61506-1-sebastian.reichel@collabora.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add compatible string for RK3588 gmac, which is similar to the RK3568
one, but needs another syscon device for clock selection.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add constants and callback functions for the dwmac on RK3588 soc.
As can be seen, the base structure is the same, only registers
and the bits in them moved slightly.
Signed-off-by: David Wu <david.wu@rock-chips.com>
[rebase, squash fixes]
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
"USB-serial fixes for 6.0-rc4
Here are a couple of fixes for two long-standing issues with some older
ch341 devices and a number of new device ids.
All have been in linux-next with no reported issues."
* tag 'usb-serial-6.0-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ch341: fix disabled rx timer on older devices
USB: serial: ch341: fix lost character on LCR updates
USB: serial: cp210x: add Decagon UCA device id
USB: serial: option: add support for Cinterion MV32-WA/WB RmNet mode
USB: serial: ftdi_sio: add Omron CS1W-CIF31 device id
USB: serial: option: add Quectel EM060K modem
USB: serial: option: add support for OPPO R11 diag port
|
|
This patch updates device status array range from 11 to 12 as we will
be reading status from device number 0 to device number 11 inclusive.
Without this patch we can potentially access status array out of range
during auto-enumeration.
Fixes: aa1262ca6695 ("soundwire: qcom: Check device status before reading devid")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220708104747.8722-1-srinivas.kandagatla@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
For passive connections, the refcount increment has been done in
smc_clcsock_accept()-->smc_sock_alloc().
Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As of now all transmit queues transmit packets out of same scheduler
queue hierarchy. Due to this PFC frames sent by peer are not handled
properly, either all transmit queues are backpressured or none.
To fix this when user enables PFC for a given priority map relavant
transmit queue to a different scheduler queue hierarcy, so that
backpressure is applied only to the traffic egressing out of that TXQ.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20220830120304.158060-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- GVT fixes including fix for a CommetLake regression in mmio table
and misc doc and typo fixes
- Fix CCS handling (Matt)
- Fix for guc requests after reset (Daniele)
- Display DSI related fixes (Jani)
- Display backlight related fixes (Arun, Jouni)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YwjCTDFm7clXPgEu@intel.com
|
|
Currently, the change function can be called by two ways. The one way is
that qdisc_change() will call it. Before calling change function,
qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is
that .init() will call it. The opt parameter is also checked before
calling change function in .init(). Therefore, it's no need to check the
input parameter opt in change function.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20220829071219.208646-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This reverts commit 90fabae8a2c225c4e4936723c38857887edde5cc.
Patch was applied hastily, revert and let the v2 be reviewed.
Fixes: 90fabae8a2c2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb")
Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
tcp: tcp challenge ack fixes
syzbot found a typical data-race addressed in the first patch.
While we are at it, second patch makes the global rate limit
per net-ns and disabled by default.
====================
Link: https://lore.kernel.org/r/20220830185656.268523-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Because per host rate limiting has been proven problematic (side channel
attacks can be based on it), per host rate limiting of challenge acks ideally
should be per netns and turned off by default.
This is a long due followup of following commits:
083ae308280d ("tcp: enable per-socket rate limiting of all 'challenge acks'")
f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock")
75ff39ccc1bd ("tcp: make challenge acks less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
challenge_timestamp can be read an written by concurrent threads.
This was expected, but we need to annotate the race to avoid potential issues.
Following patch moves challenge_timestamp and challenge_count
to per-netns storage to provide better isolation.
Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In case the source port cannot be decoded, print the warning only once. This
still brings attention to the user and does not spam the logs at the same time.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The IP_UNICAST_IF socket option is used to set the outgoing interface
for outbound packets.
The IP_UNICAST_IF socket option was added as it was needed by the
Wine project, since no other existing option (SO_BINDTODEVICE socket
option, IP_PKTINFO socket option or the bind function) provided the
needed characteristics needed by the IP_UNICAST_IF socket option. [1]
The IP_UNICAST_IF socket option works well for unconnected sockets,
that is, the interface specified by the IP_UNICAST_IF socket option
is taken into consideration in the route lookup process when a packet
is being sent. However, for connected sockets, the outbound interface
is chosen when connecting the socket, and in the route lookup process
which is done when a packet is being sent, the interface specified by
the IP_UNICAST_IF socket option is being ignored.
This inconsistent behavior was reported and discussed in an issue
opened on systemd's GitHub project [2]. Also, a bug report was
submitted in the kernel's bugzilla [3].
To understand the problem in more detail, we can look at what happens
for UDP packets over IPv4 (The same analysis was done separately in
the referenced systemd issue).
When a UDP packet is sent the udp_sendmsg function gets called and
the following happens:
1. The oif member of the struct ipcm_cookie ipc (which stores the
output interface of the packet) is initialized by the ipcm_init_sk
function to inet->sk.sk_bound_dev_if (the device set by the
SO_BINDTODEVICE socket option).
2. If the IP_PKTINFO socket option was set, the oif member gets
overridden by the call to the ip_cmsg_send function.
3. If no output interface was selected yet, the interface specified
by the IP_UNICAST_IF socket option is used.
4. If the socket is connected and no destination address is
specified in the send function, the struct ipcm_cookie ipc is not
taken into consideration and the cached route, that was calculated in
the connect function is being used.
Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken
into consideration.
This patch corrects the behavior of the IP_UNICAST_IF socket option
for connect()ed sockets by taking into consideration the
IP_UNICAST_IF sockopt when connecting the socket.
In order to avoid reconnecting the socket, this option is still
ignored when applied on an already connected socket until connect()
is called again by the Richard Gobert.
Change the __ip4_datagram_connect function, which is called during
socket connection, to take into consideration the interface set by
the IP_UNICAST_IF socket option, in a similar way to what is done in
the udp_sendmsg function.
[1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/
[2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018
[3] https://bugzilla.kernel.org/show_bug.cgi?id=210255
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829111554.GA1771@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
My understanding is that fib_validate_source() is used to know if the src
address and the gateway address are on the same link. For that,
fib_validate_source() returns 1 (same link) or 0 (not the same network).
__mkroute_input() is the only user of these positive values, all other
callers only look if the returned value is negative.
Since the below patch, fib_validate_source() didn't return anymore 1 when
both addresses are on the same network, because the route lookup returns
RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
Let's adapat the test to return 1 again when both addresses are on the same
link.
CC: stable@vger.kernel.org
Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Zhengchao Shao says:
====================
net: sched: remove unused variables
The variable "other" is unused, remove it.
====================
Link: https://lore.kernel.org/r/20220830092255.281330-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The variable "other" in the struct red_stats is not used. Remove it.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The variable "other" in the struct choke_sched_data is not used. Remove it.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RX and TX byte/packet statistics in this driver could be overflowed
relatively quickly on a 32-bit platform. Switch these stats to use the
u64_stats infrastructure to avoid this.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20220829233901.3429419-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to pass a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).
If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):
net/rds/message.c:357:56: warning: passing argument 1
of 'virt_to_pfn' makes pointer from integer without a
cast [-Wint-conversion]
Fix this with an explicit cast.
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220829132001.114858-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ensure the match happens in the right direction, previously the
destination used was the server, not the NAT host, as the comment
shows the code intended.
Additionally nf_nat_irc uses port 0 as a signal and there's no valid way
it can appear in a DCC message, so consider port 0 also forged.
Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
anon_vma->degree tracks the combined number of child anon_vmas and VMAs
that use the anon_vma as their ->anon_vma.
anon_vma_clone() then assumes that for any anon_vma attached to
src->anon_vma_chain other than src->anon_vma, it is impossible for it to
be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
elevated by 1 because of a child anon_vma, meaning that if ->degree
equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.
This assumption is wrong because the ->degree optimization leads to leaf
nodes being abandoned on anon_vma_clone() - an existing anon_vma is
reused and no new parent-child relationship is created. So it is
possible to reuse an anon_vma for one VMA while it is still tied to
another VMA.
This is an issue because is_mergeable_anon_vma() and its callers assume
that if two VMAs have the same ->anon_vma, the list of anon_vmas
attached to the VMAs is guaranteed to be the same. When this assumption
is violated, vma_merge() can merge pages into a VMA that is not attached
to the corresponding anon_vma, leading to dangling page->mapping
pointers that will be dereferenced during rmap walks.
Fix it by separately tracking the number of child anon_vmas and the
number of VMAs using the anon_vma as their ->anon_vma.
Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The rtl8152 driver does not disable multicasting when userspace asks
it to. For example:
$ ifconfig eth0 -multicast -allmulti
$ tcpdump -p -i eth0 # will still capture multicast frames
Fix by clearing the device multicast filter table when multicast and
allmulti are both unset.
Tested as follows:
- Set multicast on eth0 network interface
- verify that multicast packets are coming in:
$ tcpdump -p -i eth0
- Clear multicast and allmulti on eth0 network interface
- verify that no more multicast packets are coming in:
$ tcpdump -p -i eth0
Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20220830045923.net-next.v1.1.I4fee0ac057083d4f848caf0fa3a9fd466fc374a0@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the GSO splitting feature of sch_cake is enabled, GSO superpackets
will be broken up and the resulting segments enqueued in place of the
original skb. In this case, CAKE calls consume_skb() on the original skb,
but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
assuming the original skb still exists, when it really has been freed. Fix
this by adding the __NET_XMIT_STOLEN flag to the return value in this case.
Fixes: 0c850344d388 ("sch_cake: Conditionally split GSO segments")
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hou Tao says:
====================
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset aims to fix the issues found during investigating the
syzkaller problem reported in [0]. It seems that the concurrent updates
to the same hash-table bucket may fail as shown in patch 1.
Patch 1 uses preempt_disable() to fix the problem for
htab_use_raw_lock() case. For !htab_use_raw_lock() case, the problem is
left to "BPF specific memory allocator" patchset [1] in which
!htab_use_raw_lock() will be removed.
Patch 2 fixes the out-of-bound memory read problem reported in [0]. The
problem has the root cause as patch 1 and it is fixed by handling -EBUSY
from htab_lock_bucket() correctly.
Patch 3 add two cases for hash-table update: one for the reentrancy of
bpf_map_update_elem(), and another one for concurrent updates of the
same hash-table bucket.
Comments are always welcome.
Regards,
Tao
[0]: https://lore.kernel.org/bpf/CACkBjsbuxaR6cv0kXJoVnBfL9ZJXjjoUcMpw_Ogc313jSrg14A@mail.gmail.com/
[1]: https://lore.kernel.org/bpf/20220819214232.18784-1-alexei.starovoitov@gmail.com/
Change Log:
v4:
* rebased on bpf-next
* add htab_update to DENYLIST.s390x
v3: https://lore.kernel.org/bpf/20220829023709.1958204-1-houtao@huaweicloud.com/
* patch 1: update commit message and add Fixes tag
* patch 2: add Fixes tag
* patch 3: elaborate the description of test cases
v2: https://lore.kernel.org/bpf/bd60ef93-1c6a-2db2-557d-b09b92ad22bd@huaweicloud.com/
* Note the fix is for CONFIG_PREEMPT case in commit message and add
Reviewed-by tag for patch 1
* Drop patch "bpf: Allow normally concurrent map updates for !htab_use_raw_lock() case"
v1: https://lore.kernel.org/bpf/20220821033223.2598791-1-houtao@huaweicloud.com/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
One test demonstrates the reentrancy of hash map update on the same
bucket should fail, and another one shows concureently updates of
the same hash map bucket should succeed and not fail due to
the reentrancy checking for bucket lock.
There is no trampoline support on s390x, so move htab_update to
denylist.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-4-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
-EBUSY, it will go to next bucket. Going to next bucket may not only
skip the elements in current bucket silently, but also incur
out-of-bound memory access or expose kernel memory to userspace if
current bucket_cnt is greater than bucket_size or zero.
Fixing it by stopping batch operation and returning -EBUSY when
htab_lock_bucket() fails, and the application can retry or skip the busy
batch as needed.
Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Per-cpu htab->map_locked is used to prohibit the concurrent accesses
from both NMI and non-NMI contexts. But since commit 74d862b682f5
("sched: Make migrate_disable/enable() independent of RT"),
migrate_disable() is also preemptible under CONFIG_PREEMPT case, so now
map_locked also disallows concurrent updates from normal contexts
(e.g. userspace processes) unexpectedly as shown below:
process A process B
htab_map_update_elem()
htab_lock_bucket()
migrate_disable()
/* return 1 */
__this_cpu_inc_return()
/* preempted by B */
htab_map_update_elem()
/* the same bucket as A */
htab_lock_bucket()
migrate_disable()
/* return 2, so lock fails */
__this_cpu_inc_return()
return -EBUSY
A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
only checking the value of map_locked for nmi context. But it will
re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
through non-tracing program (e.g. fentry program).
One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
being false causes the bucket lock to be a spin lock which can sleep and
does not work with preempt_disable().
Therefore, use migrate_disable() when using the spinlock instead of
preempt_disable() and defer fixing concurrent updates to when the kernel
has its own BPF memory allocator.
Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Restrict the ucode loading check to avoid frontdoor loading error.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch adds a test to ensure bpf_setsockopt(TCP_CONGESTION, "not_exist")
will not trigger the kernel module autoload.
Before the fix:
[ 40.535829] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
[...]
[ 40.552134] tcp_ca_find_autoload.constprop.0+0xcb/0x200
[ 40.552689] tcp_set_congestion_control+0x99/0x7b0
[ 40.553203] do_tcp_setsockopt+0x3ed/0x2240
[...]
[ 40.556041] __bpf_setsockopt+0x124/0x640
Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220830231953.792412-1-martin.lau@linux.dev
|
|
When bpf prog changes tcp-cc by calling bpf_setsockopt(TCP_CONGESTION),
it should not try to load module which may be a blocking operation.
This details was correct in the v1 [0] but missed by mistake in the
later revision in commit cb388e7ee3a8 ("bpf: net: Change do_tcp_setsockopt()
to use the sockopt's lock_sock() and capable()"). This patch fixes it by
checking the has_current_bpf_ctx().
[0] https://lore.kernel.org/bpf/20220727060921.2373314-1-kafai@fb.com/
Fixes: cb388e7ee3a8 ("bpf: net: Change do_tcp_setsockopt() to use the sockopt's lock_sock() and capable()")
Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220830231946.791504-1-martin.lau@linux.dev
|
|
This is the result of `sort tools/testing/selftests/net/.gitignore`, but
preserving the comment at the top.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/r/20220829184748.1535580-1-axelrasmussen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change occurrences of "it's" that are possessive to "its"
so that they don't read as "it is".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220829235414.17110-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simplify the code by using the device-managed clock API.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/b222be68-ba7e-999d-0a07-eca0ecedf74e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
strp_init() is called just a few lines above this csk->sk_user_data
check, it also initializes strp->work etc., therefore, it is
unnecessary to call strp_done() to cancel the freshly initialized
work.
And if sk_user_data is already used by KCM, psock->strp should not be
touched, particularly strp->work state, so we need to move strp_init()
after the csk->sk_user_data check.
This also makes a lockdep warning reported by syzbot go away.
Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com
Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com
Fixes: e5571240236c ("kcm: Check if sk_user_data already set in kcm_attach")
Fixes: dff8baa26117 ("kcm: Call strp_stop before strp_done in kcm_attach")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds logic to compute the MDIO period based on
the i1clk, and thereafter write the MDIO period into the YU
MDIO config register. The i1clk resource from the ACPI table
is used to provide addressing to YU bootrecord PLL registers.
The values in these registers are used to compute MDIO period.
If the i1clk resource is not present in the ACPI table, then
the current default hardcorded value of 430Mhz is used.
The i1clk clock value of 430MHz is only accurate for boards
with BF2 mid bin and main bin SoCs. The BF2 high bin SoCs
have i1clk = 500MHz, but can support a slower MDIO period.
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20220826155916.12491-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 35b0fac808b95eea1212f8860baf6ad25b88b087. Alexander
reports that it causes boot failures on i.MX8M Plus based boards
(specifically imx8mp-tqma8mpql-mba8mpxl.dts).
Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Cc: Chen-Yu Tsai <wenst@chromium.org>
Fixes: 35b0fac808b9 ("clk: core: Honor CLK_OPS_PARENT_ENABLE for clk gate ops")
Link: https://lore.kernel.org/r/12115951.O9o76ZdvQC@steina-w
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20220831175326.2523912-1-sboyd@kernel.org
|
|
The bpf_tail_call_static function is currently not defined unless
using clang >= 8.
To support bpf_tail_call_static on GCC we can check if __clang__ is
not defined to enable bpf_tail_call_static.
We need to use GCC assembly syntax when the compiler does not define
__clang__ as LLVM inline assembly is not fully compatible with GCC.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829210546.755377-1-james.hilliard1@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache/cachefiles fixes from David Howells:
- Fix kdoc on fscache_use/unuse_cookie().
- Fix the error returned by cachefiles_ondemand_copen() from an upcall
result.
- Fix the distribution of requests in on-demand mode in cachefiles to
be fairer by cycling through them rather than picking the one with
the lowest ID each time (IDs being reused).
* tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
cachefiles: make on-demand request distribution fairer
cachefiles: fix error return code in cachefiles_ondemand_copen()
fscache: fix misdocumented parameter
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- NULL pointer dereference fix for Steam driver (Lee Jones)
- memory leak fix for hidraw (Karthik Alapati)
- regression fix for functionality of some UCLogic tables (Benjamin
Tissoires)
- a few new device IDs and device-specific quirks
* tag 'for-linus-2022083101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: nintendo: fix rumble worker null pointer deref
HID: intel-ish-hid: ipc: Add Meteor Lake PCI device ID
HID: input: fix uclogic tablets
HID: Add Apple Touchbar on T2 Macs in hid_have_special_driver list
HID: add Lenovo Yoga C630 battery quirk
HID: AMD_SFH: Add a DMI quirk entry for Chromebooks
HID: thrustmaster: Add sparco wheel and fix array length
hid: intel-ish-hid: ishtp: Fix ishtp client sending disordered message
HID: ishtp-hid-clientHID: ishtp-hid-client: Fix comment typo
HID: asus: ROG NKey: Ignore portion of 0x5a report
HID: hidraw: fix memory leak in hidraw_release()
HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a boot performance regression due to an unnecessary dependency on
XOR_BLOCKS"
* tag 'v6.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: lib - remove unneeded selection of XOR_BLOCKS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM support for IORING_OP_URING_CMD from Paul Moore:
"Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD.
These are necessary as without them the IORING_OP_URING_CMD remains
outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch,
and my SELinux patch). They have been discussed at length with the
io_uring folks, and Jens has given his thumbs-up on the relevant
patches (see the commit descriptions).
There is one patch that is not strictly necessary, but it makes
testing much easier and is very trivial: the /dev/null
IORING_OP_URING_CMD patch."
* tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
Smack: Provide read control for io_uring_cmd
/dev/null: add IORING_OP_URING_CMD support
selinux: implement the security_uring_cmd() LSM hook
lsm,io_uring: add LSM hooks for the new uring_cmd file op
|
|
By using 16-bit I/O on the GPIO peripheral, which is apparently not safe
on MIPS, the IMR can end up containing garbage. This then results in
interrupt triggers for lines that don't have an interrupt handler
associated. The irq_desc lookup fails, and the ISR will not be cleared,
keeping the CPU busy until reboot, or until another IMR operation
restores the correct value. This situation appears to happen very
rarely, for < 0.5% of IMR writes.
Instead of using 8-bit or 16-bit I/O operations on the 32-bit memory
mapped peripheral registers, switch to using 32-bit I/O only, operating
on the entire bank for all single bit line settings. For 2-bit line
settings, with 16-bit port values, stick to manual (un)packing.
This issue has been seen on RTL8382M (HPE 1920-16G), RTL8391M (Netgear
GS728TP v2), and RTL8393M (D-Link DGS-1210-52 F3, Zyxel GS1900-48).
Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> # DGS-1210-52
Reported-by: Birger Koblitz <mail@birger-koblitz.de> # GS728TP
Reported-by: Jan Hoffmann <jan@3e8.eu> # 1920-16G
Fixes: 0d82fb1127fb ("gpio: Add Realtek Otto GPIO support")
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Cc: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
For now, enqueuing and dequeuing on-demand requests all start from
idx 0, this makes request distribution unfair. In the weighty
concurrent I/O scenario, the request stored in higher idx will starve.
Searching requests cyclically in cachefiles_ondemand_daemon_read,
makes distribution fairer.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Reported-by: Yongqing Li <liyongqing@bytedance.com>
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220817065200.11543-1-yinxin.x@bytedance.com/ # v1
Link: https://lore.kernel.org/r/20220825020945.2293-1-yinxin.x@bytedance.com/ # v2
|