Age | Commit message (Collapse) | Author |
|
IP tunnels can apparently update dev->needed_headroom
in their xmit path.
This patch takes care of three tunnels xmit, and also the
core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA()
helpers.
More changes might be needed for completeness.
BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit
read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1:
ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0:
ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:444 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653
process_one_work+0x3e6/0x750 kernel/workqueue.c:2390
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537
kthread+0x1ac/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x0dd4 -> 0x0e14
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Workqueue: mld mld_ifc_work
Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Do not allow histogram values to have modifies. They can cause a NULL
pointer dereference if they do.
- Warn if hist_field_name() is passed a NULL. Prevent the NULL pointer
dereference mentioned above.
- Fix invalid address look up race in lookup_rec()
- Define ftrace_stub_graph conditionally to prevent linker errors
- Always check if RCU is watching at all tracepoint locations
* tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make tracepoint lockdep check actually test something
ftrace,kcfi: Define ftrace_stub_graph conditionally
ftrace: Fix invalid address access in lookup_rec() when index is 0
tracing: Check field value in hist_field_name()
tracing: Do not let histogram values have some modifiers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A collection of clk driver fixes, and a couple OF clk patches to fix
regressions seen in the last few weeks. The fwnode patch broke the
build for one driver that isn't always compiled, so I waited over the
weekend to be certain no more build issues came up.
- Mark the firmware node (fwnode) that matches the compatible in
CLK_OF_DECLARE() as initialized to fix a regression on u8500 SoCs
after fw_devlink stopped checking parent nodes in
of_link_to_phandle()
- Remove a couple MODULE_LICENSE macros in non-modules
- Update the maintainers file for Microchip clk drivers
- Use 'select' instead of 'depend on' for the REGMAP config to fix
Kconfig issues
- Use div_u64() for portable 64-bit division in K210 clk driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Avoid invalid function names in CLK_OF_DECLARE()
clk: k210: remove an implicit 64-bit division
MAINTAINERS: add missing clock driver coverage for Microchip FPGAs
clk: HI655X: select REGMAP instead of depending on it
kbuild, clk: remove MODULE_LICENSE in non-modules
kbuild, clk: bcm2835: remove MODULE_LICENSE in non-modules
clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro
|
|
A while ago where the trace events had the following:
rcu_read_lock_sched_notrace();
rcu_dereference_sched(...);
rcu_read_unlock_sched_notrace();
If the tracepoint is enabled, it could trigger RCU issues if called in
the wrong place. And this warning was only triggered if lockdep was
enabled. If the tracepoint was never enabled with lockdep, the bug would
not be caught. To handle this, the above sequence was done when lockdep
was enabled regardless if the tracepoint was enabled or not (although the
always enabled code really didn't do anything, it would still trigger a
warning).
But a lot has changed since that lockdep code was added. One is, that
sequence no longer triggers any warning. Another is, the tracepoint when
enabled doesn't even do that sequence anymore.
The main check we care about today is whether RCU is "watching" or not.
So if lockdep is enabled, always check if rcu_is_watching() which will
trigger a warning if it is not (tracepoints require RCU to be watching).
Note, that old sequence did add a bit of overhead when lockdep was enabled,
and with the latest kernel updates, would cause the system to slow down
enough to trigger kernel "stalled" warnings.
Link: http://lore.kernel.org/lkml/20140806181801.GA4605@redhat.com
Link: http://lore.kernel.org/lkml/20140807175204.C257CAC5@viggo.jf.intel.com
Link: https://lore.kernel.org/lkml/20230307184645.521db5c9@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230310172856.77406446@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Commit 26fed4ac4eab ("block: flush plug based on hardware and software
queue order") changed flushing of plug list to submit requests one
device at a time. However while doing that it also started using
list_add_tail() instead of list_add() used previously thus effectively
submitting requests in reverse order. Also when forming a rq_list with
remaining requests (in case two or more devices are used), we
effectively reverse the ordering of the plug list for each device we
process. Submitting requests in reverse order has negative impact on
performance for rotational disks (when BFQ is not in use). We observe
10-25% regression in random 4k write throughput, as well as ~20%
regression in MariaDB OLTP benchmark on rotational storage on btrfs
filesystem.
Fix the problem by preserving ordering of the plug list when inserting
requests into the queuelist as well as by appending to requeue_list
instead of prepending to it.
Fixes: 26fed4ac4eab ("block: flush plug based on hardware and software queue order")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230313093002.11756-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A new platform-op was added to Xen to allow obtaining the same VGA
console information PV Dom0 is handed. Invoke the new function and have
the output data processed by xen_init_vga().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/8f315e92-7bda-c124-71cc-478ab9c5e610@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.3, part #1
A single patch to address a rather annoying bug w.r.t. guest timer
offsetting. Effectively the synchronization of timer offsets between
vCPUs was broken, leading to inconsistent timer reads within the VM.
|
|
Virtio spec introduced a feature VIRTIO_NET_F_GUEST_HDRLEN which when
set implicates that device benefits from knowing the exact size
of the header. For compatibility, to signal to the device that
the header is reliable driver also needs to set this feature.
Without this feature set by driver, device has to figure
out the header size itself.
Quoting the original virtio spec:
"hdr_len is a hint to the device as to how much of the header needs to
be kept to copy into each packet"
"a hint" might not be clear for the reader what does it mean, if it is
"maybe like that" of "exactly like that". This feature just makes it
crystal clear and let the device count on the hdr_len being filled up
by the exact length of header.
Also note the spec already has following note about hdr_len:
"Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size."
Without this feature the device needs to parse the header in core
data path handling. Accurate information helps the device to eliminate
such header parsing and directly use the hardware accelerators
for GSO operation.
virtio_net_hdr_from_skb() fills up hdr_len to skb_headlen(skb).
The driver already complies to fill the correct value. Introduce the
feature and advertise it.
Note that virtio spec also includes following note for device
implementation:
"Caution should be taken by the implementation so as to prevent
a malicious driver from attacking the device by setting
an incorrect hdr_len."
There is a plan to support this feature in our emulated device.
A device of SolidRun offers this feature bit. They claim this feature
will save the device a few cycles for every GSO packet.
Link: https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html#x1-230006x3
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230309094559.917857-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Backmerging to get latest upstream.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
On s390 PCI functions may be hotplugged individually even when they
belong to a multi-function device. In particular on an SR-IOV device VFs
may be removed and later re-added.
In commit a50297cf8235 ("s390/pci: separate zbus creation from
scanning") it was missed however that struct pci_bus and struct
zpci_bus's resource list retained a reference to the PCI functions MMIO
resources even though those resources are released and freed on
hot-unplug. These stale resources may subsequently be claimed when the
PCI function re-appears resulting in use-after-free.
One idea of fixing this use-after-free in s390 specific code that was
investigated was to simply keep resources around from the moment a PCI
function first appeared until the whole virtual PCI bus created for
a multi-function device disappears. The problem with this however is
that due to the requirement of artificial MMIO addreesses (address
cookies) extra logic is then needed to keep the address cookies
compatible on re-plug. At the same time the MMIO resources semantically
belong to the PCI function so tying their lifecycle to the function
seems more logical.
Instead a simpler approach is to remove the resources of an individually
hot-unplugged PCI function from the PCI bus's resource list while
keeping the resources of other PCI functions on the PCI bus untouched.
This is done by introducing pci_bus_remove_resource() to remove an
individual resource. Similarly the resource also needs to be removed
from the struct zpci_bus's resource list. It turns out however, that
there is really no need to add the MMIO resources to the struct
zpci_bus's resource list at all and instead we can simply use the
zpci_bar_struct's resource pointer directly.
Fixes: a50297cf8235 ("s390/pci: separate zbus creation from scanning")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Bug fixes and regressions for ext4, the most serious of which is a
potential deadlock during directory renames that was introduced during
the merge window discovered by a combination of syzbot and lockdep"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: zero i_disksize when initializing the bootloader inode
ext4: make sure fs error flag setted before clear journal error
ext4: commit super block if fs record error when journal record without error
ext4, jbd2: add an optimized bmap for the journal inode
ext4: fix WARNING in ext4_update_inline_data
ext4: move where set the MAY_INLINE_DATA flag is set
ext4: Fix deadlock during directory rename
ext4: Fix comment about the 64BIT feature
docs: ext4: modify the group desc size to 64
ext4: fix another off-by-one fsmap error on 1k block filesystems
ext4: fix RENAME_WHITEOUT handling for inline directories
ext4: make kobj_type structures constant
ext4: fix cgroup writeback accounting with fs-layer encryption
|
|
The cpumask_check() was unnecessarily tight, and causes problems for the
users of cpumask_next().
We have a number of users that take the previous return value of one of
the bit scanning functions and subtract one to keep it in "range". But
since the scanning functions end up returning up to 'small_cpumask_bits'
instead of the tighter 'nr_cpumask_bits', the range really needs to be
using that widened form.
[ This "previous-1" behavior is also the reason we have all those
comments about /* -1 is a legal arg here. */ and separate checks for
that being ok. So we could have just made "small_cpumask_bits-1"
be a similar special "don't check this" value.
Tetsuo Handa even suggested a patch that only does that for
cpumask_next(), since that seems to be the only actual case that
triggers, but that all makes it even _more_ magical and special. So
just relax the check ]
One example of this kind of pattern being the 'c_start()' function in
arch/x86/kernel/cpu/proc.c, but also duplicated in various forms on
other architectures.
Reported-by: syzbot+96cae094d90877641f32@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Link: https://lore.kernel.org/lkml/c1f4cc16-feea-b83c-82cf-1a1f007b7eb9@I-love.SAKURA.ne.jp/
Fixes: 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"This marks the end of a transition to let I2C have the same probe
semantics as other subsystems. Uwe took care that no drivers in the
current tree nor in -next use the deprecated .probe call. So, it is a
good time to switch to the new, standard semantics now.
There is also a regression fix:
- regression fix for the notifier handling of the I2C core
- final coversions of drivers away from deprecated .probe
- make .probe_new the standard probe and convert I2C core to use it
* tag 'i2c-for-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: dev: Fix bus callback return values
i2c: Convert drivers to new .probe() callback
i2c: mux: Convert all drivers to new .probe() callback
i2c: Switch .probe() to not take an id parameter
media: i2c: ov2685: convert to i2c's .probe_new()
media: i2c: ov5695: convert to i2c's .probe_new()
w1: ds2482: Convert to i2c's .probe_new()
serial: sc16is7xx: Convert to i2c's .probe_new()
mtd: maps: pismo: Convert to i2c's .probe_new()
misc: ad525x_dpot-i2c: Convert to i2c's .probe_new()
|
|
Having a per-vcpu virtual offset is a pain. It needs to be synchronized
on each update, and expands badly to a setup where different timers can
have different offsets, or have composite offsets (as with NV).
So let's start by replacing the use of the CNTVOFF_EL2 shadow register
(which we want to reclaim for NV anyway), and make the virtual timer
carry a pointer to a VM-wide offset.
This simplifies the code significantly. It also addresses two terrible bugs:
- The use of CNTVOFF_EL2 leads to some nice offset corruption
when the sysreg gets reset, as reported by Joey.
- The kvm mutex is taken from a vcpu ioctl, which goes against
the locking rules...
Reported-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230224173915.GA17407@e124191.cambridge.arm.com
Tested-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20230224191640.3396734-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The generic bmap() function exported by the VFS takes locks and does
checks that are not necessary for the journal inode. So allow the
file system to set a journal-optimized bmap function in
journal->j_bmap.
Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Introduce xdp_set_features_flag utility routine in order to update
dynamically xdp_features according to the dynamic hw configuration via
ethtool (e.g. changing number of hw rx/tx queues).
Add xdp_clear_features_flag() in order to clear all xdp_feature flag.
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull put_and_unmap_page() helper from Al Viro:
"kmap_local_page() conversions in local filesystems keep running into
kunmap_local_page()+put_page() combinations. We can keep inventing
names for identical inline helpers, but it's getting rather
inconvenient. I've added a trivial helper to linux/highmem.h instead.
I would've held that back until the merge window, if not for the mess
it causes in tree topology - I've several branches merging from that
one, and it's only going to get worse if e.g. ext2 stuff gets picked
by Jan"
* tag 'pull-highmem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
new helper: put_and_unmap_page()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
wireless-next patches for 6.4
Major changes:
cfg80211
* 6 GHz improvements
* HW timestamping support
* support for randomized auth/deauth TA for PASN privacy
(also for mac80211)
mac80211
* radiotap TLV and EHT support for the iwlwifi sniffer
* HW timestamping support
* per-link debugfs for multi-link
brcmfmac
* support for Apple (M1 Pro/Max) devices
iwlwifi
* support for a few new devices
* EHT sniffer support
rtw88
* better support for some SDIO devices
(e.g. MAC address from efuse)
rtw89
* HW scan support for 8852b
* better support for 6 GHz scanning
* tag 'wireless-next-2023-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (84 commits)
wifi: iwlwifi: mvm: fix EOF bit reporting
wifi: iwlwifi: Do not include radiotap EHT user info if not needed
wifi: iwlwifi: mvm: add EHT RU allocation to radiotap
wifi: iwlwifi: Update logs for yoyo reset sw changes
wifi: iwlwifi: mvm: clean up duplicated defines
wifi: iwlwifi: rs-fw: break out for unsupported bandwidth
wifi: iwlwifi: Add support for B step of BnJ-Fm4
wifi: iwlwifi: mvm: make flush code a bit clearer
wifi: iwlwifi: mvm: avoid UB shift of snif_queue
wifi: iwlwifi: mvm: add primary 80 known for EHT radiotap
wifi: iwlwifi: mvm: parse FW frame metadata for EHT sniffer mode
wifi: iwlwifi: mvm: decode USIG_B1_B7 RU to nl80211 RU width
wifi: iwlwifi: mvm: rename define to generic name
wifi: iwlwifi: mvm: allow Microsoft to use TAS
wifi: iwlwifi: mvm: add all EHT based on data0 info from HW
wifi: iwlwifi: mvm: add EHT radiotap info based on rate_n_flags
wifi: iwlwifi: mvm: add an helper function radiotap TLVs
wifi: radiotap: separate vendor TLV into header/content
wifi: iwlwifi: reduce verbosity of some logging events
wifi: iwlwifi: Adding the code to get RF name for MsP device
...
====================
Link: https://lore.kernel.org/r/20230310120159.36518-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"First batch of fixes. Among them there are two updates to sysfs and
ioctl which are not strictly fixes but are used for testing so there's
no reason to delay them.
- fix block group item corruption after inserting new block group
- fix extent map logging bit not cleared for split maps after
dropping range
- fix calculation of unusable block group space reporting bogus
values due to 32/64b division
- fix unnecessary increment of read error stat on write error
- improve error handling in inode update
- export per-device fsid in DEV_INFO ioctl to distinguish seeding
devices, needed for testing
- allocator size classes:
- fix potential dead lock in size class loading logic
- print sysfs stats for the allocation classes"
* tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix block group item corruption after inserting new block group
btrfs: fix extent map logging bit not cleared for split maps after dropping range
btrfs: fix percent calculation for bg reclaim message
btrfs: fix unnecessary increment of read error stat on write error
btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
btrfs: ioctl: return device fsid from DEV_INFO ioctl
btrfs: fix potential dead lock in size class loading logic
btrfs: sysfs: add size class stats
|
|
The recent fix for the deferred I/O by the commit
3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices")
caused a regression when the same fb device is opened/closed while
it's being used. It resulted in a frozen screen even if something
is redrawn there after the close. The breakage is because the patch
was made under a wrong assumption of a single open; in the current
code, fb_deferred_io_release() cleans up the page mapping of the
pageref list and it calls cancel_delayed_work_sync() unconditionally,
where both are no correct behavior for multiple opens.
This patch adds a refcount for the opens of the device, and applies
the cleanup only when all files get closed.
As both fb_deferred_io_open() and _close() are called always in the
fb_info lock (mutex), it's safe to use the normal int for the
refcounting.
Also, a useless BUG_ON() is dropped.
Fixes: 3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308105012.1845-1-tiwai@suse.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
Netfilter updates for net-next
1. nf_tables 'brouting' support, from Sriram Yagnaraman.
2. Update bridge netfilter and ovs conntrack helpers to handle
IPv6 Jumbo packets properly, i.e. fetch the packet length
from hop-by-hop extension header, from Xin Long.
This comes with a test BIG TCP test case, added to
tools/testing/selftests/net/.
3. Fix spelling and indentation in conntrack, from Jeremy Sowden.
* 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nat: fix indentation of function arguments
netfilter: conntrack: fix typo
selftests: add a selftest for big tcp
netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
netfilter: move br_nf_check_hbh_len to utils
netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
netfilter: bridge: check len before accessing more nh data
netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
netfilter: bridge: introduce broute meta statement
====================
Link: https://lore.kernel.org/r/20230308193033.13965-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
neigh_lookup_nodev isn't used in the kernel after removal
of DECnet. So let's remove it.
Fixes: 1202cdd66531 ("Remove DECnet support from kernel")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/eb5656200d7964b2d177a36b77efa3c597d6d72d.1678267343.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This field mirrors hrtimer softexpires, we can instead
use the existing helpers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230308182648.1150762-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No users in the tree. Tested with allmodconfig build.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230308142006.20879-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some storage, such as AIX VDASD (virtual storage) and IBM 2076 (front
end), fail as a result of commit c92a6b5d6335 ("scsi: core: Query VPD
size before getting full page").
That commit changed getting SCSI VPD pages so that we now read just
enough of the page to get the actual page size, then read the whole
page in a second read. The problem is that the above mentioned
hardware returns zero for the page size, because of a firmware
error. In such cases, until the firmware is fixed, this new blacklist
flag says to revert to the original method of reading the VPD pages,
i.e. try to read a whole buffer's worth on the first try.
[mkp: reworked somewhat]
Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page")
Reported-by: Martin Wilck <mwilck@suse.com>
Suggested-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Lee Duncan <lduncan@suse.com>
Link: https://lore.kernel.org/r/20220928181350.9948-1-leeman.duncan@gmail.com
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
After commit c28cd1f3433c ("clk: Mark a fwnode as initialized when using
CLK_OF_DECLARE() macro"), drivers/clk/mvebu/kirkwood.c fails to build:
drivers/clk/mvebu/kirkwood.c:358:1: error: expected identifier or '('
CLK_OF_DECLARE(98dx1135_clk, "marvell,mv98dx1135-core-clock",
^
include/linux/clk-provider.h:1367:21: note: expanded from macro 'CLK_OF_DECLARE'
static void __init name##_of_clk_init_declare(struct device_node *np) \
^
<scratch space>:124:1: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
C function names must start with either an alphabetic letter or an
underscore. To avoid generating invalid function names from clock names,
add two underscores to the beginning of the identifier.
Fixes: c28cd1f3433c ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro")
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230308-clk_of_declare-fix-v1-1-317b741e2532@kernel.org
Reviewed-by: Saravana Kannan <saravanak@google.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() call-back
type") introduced a new probe callback to convert i2c init routines to
not take an i2c_device_id parameter. Now that all in-tree drivers are
converted to the temporary .probe_new() callback, .probe() can be
modified to match the desired prototype.
Now that .probe() and .probe_new() have the same semantic, they can be
defined as members of an anonymous union to save some memory and
simplify the core code a bit.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- core: avoid skb end_offset change in __skb_unclone_keeptruesize()
- sched:
- act_connmark: handle errno on tcf_idr_check_alloc
- flower: fix fl_change() error recovery path
- ieee802154: prevent user from crashing the host
Current release - new code bugs:
- eth: bnxt_en: fix the double free during device removal
- tools: ynl:
- fix enum-as-flags in the generic CLI
- fully inherit attrs in subsets
- re-license uniformly under GPL-2.0 or BSD-3-clause
Previous releases - regressions:
- core: use indirect calls helpers for sk_exit_memory_pressure()
- tls:
- fix return value for async crypto
- avoid hanging tasks on the tx_lock
- eth: ice: copy last block omitted in ice_get_module_eeprom()
Previous releases - always broken:
- core: avoid double iput when sock_alloc_file fails
- af_unix: fix struct pid leaks in OOB support
- tls:
- fix possible race condition
- fix device-offloaded sendpage straddling records
- bpf:
- sockmap: fix an infinite loop error
- test_run: fix &xdp_frame misplacement for LIVE_FRAMES
- fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
- netfilter: tproxy: fix deadlock due to missing BH disable
- phylib: get rid of unnecessary locking
- eth: bgmac: fix *initial* chip reset to support BCM5358
- eth: nfp: fix csum for ipsec offload
- eth: mtk_eth_soc: fix RX data corruption issue
Misc:
- usb: qmi_wwan: add telit 0x1080 composition"
* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
tools: ynl: fix enum-as-flags in the generic CLI
tools: ynl: move the enum classes to shared code
net: avoid double iput when sock_alloc_file fails
af_unix: fix struct pid leaks in OOB support
eth: fealnx: bring back this old driver
net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
net: microchip: sparx5: fix deletion of existing DSCP mappings
octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
net/smc: fix fallback failed while sendmsg with fastopen
ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
mailmap: update entries for Stephen Hemminger
mailmap: add entry for Maxim Mikityanskiy
nfc: change order inside nfc_se_io error path
ethernet: ice: avoid gcc-9 integer overflow warning
ice: don't ignore return codes in VSI related code
ice: Fix DSCP PFC TLV creation
net: usb: qmi_wwan: add Telit 0x1080 composition
net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
netfilter: conntrack: adopt safer max chain length
net: tls: fix device-offloaded sendpage straddling records
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- fix potential out of bound write of zeroes in HID core with a
specially crafted uhid device (Lee Jones)
- fix potential use-after-free in work function in intel-ish-hid (Reka
Norman)
- selftests config fixes (Benjamin Tissoires)
- few device small fixes and support
* tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
selftest: hid: fix hid_bpf not set in config
HID: uhid: Over-ride the default maximum data buffer value with our own
HID: core: Provide new max_buffer_size attribute to over-ride the default
|
|
As it says in rfc8260#section-3.6 about the weighted fair queueing
scheduler:
A Weighted Fair Queueing scheduler between the streams is used. The
weight is configurable per outgoing SCTP stream. This scheduler
considers the lengths of the messages of each stream and schedules
them in a specific way to use the capacity according to the given
weights. If the weight of stream S1 is n times the weight of stream
S2, the scheduler should assign to stream S1 n times the capacity it
assigns to stream S2. The details are implementation dependent.
Interleaving user messages allows for a better realization of the
capacity usage according to the given weights.
This patch adds Weighted Fair Queueing Scheduler actually based on
the code of Fair Capacity Scheduler by adding fc_weight into struct
sctp_stream_out_ext and taking it into account when sorting stream->
fc_list in sctp_sched_fc_sched() and sctp_sched_fc_dequeue_done().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As it says in rfc8260#section-3.5 about the fair capacity scheduler:
A fair capacity distribution between the streams is used. This
scheduler considers the lengths of the messages of each stream and
schedules them in a specific way to maintain an equal capacity for
all streams. The details are implementation dependent. interleaving
user messages allows for a better realization of the fair capacity
usage.
This patch adds Fair Capacity Scheduler based on the foundations added
by commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"):
A fc_list and a fc_length are added into struct sctp_stream_out_ext and
a fc_list is added into struct sctp_stream. In .enqueue, when there are
chunks enqueued into a stream, this stream will be linked into stream->
fc_list by its fc_list ordered by its fc_length. In .dequeue, it always
picks up the 1st skb from stream->fc_list. In .dequeue_done, fc_length
is increased by chunk's len and update its location in stream->fc_list
according to the its new fc_length.
Note that when the new fc_length overflows in .dequeue_done, instead of
resetting all fc_lengths to 0, we only reduced them by U32_MAX / 4 to
avoid a moment of imbalance in the scheduling, as Marcelo suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Andrii Nakryiko says:
====================
pull-request: bpf-next 2023-03-08
We've added 23 non-merge commits during the last 2 day(s) which contain
a total of 28 files changed, 414 insertions(+), 104 deletions(-).
The main changes are:
1) Add more precise memory usage reporting for all BPF map types,
from Yafang Shao.
2) Add ARM32 USDT support to libbpf, from Puranjay Mohan.
3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF,
from Nathan Chancellor.
4) IMA selftests fix, from Roberto Sassu.
5) libbpf fix in APK support code, from Daniel Müller.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Fix IMA test
libbpf: USDT arm arg parsing support
libbpf: Refactor parse_usdt_arg() to re-use code
libbpf: Fix theoretical u32 underflow in find_cd() function
bpf: enforce all maps having memory usage callback
bpf: offload map memory usage
bpf, net: xskmap memory usage
bpf, net: sock_map memory usage
bpf, net: bpf_local_storage memory usage
bpf: local_storage memory usage
bpf: bpf_struct_ops memory usage
bpf: queue_stack_maps memory usage
bpf: devmap memory usage
bpf: cpumap memory usage
bpf: bloom_filter memory usage
bpf: ringbuf memory usage
bpf: reuseport_array memory usage
bpf: stackmap memory usage
bpf: arraymap memory usage
bpf: hashtab memory usage
...
====================
Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
to netfilter utils, so that it can be used by other modules, like
ovs and tc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Commit 0091bfc81741 ("io_uring/af_unix: defer registered
files gc to io_uring release") added one bit to struct sk_buff.
This structure is critical for networking, and we try very hard
to not add bloat on it, unless absolutely required.
For instance, we can use a specific destructor as a wrapper
around unix_destruct_scm(), to identify skbs that unix_gc()
has to special case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
nftables equivalent for ebtables -t broute.
Implement broute meta statement to set br_netfilter_broute flag
in skb to force a packet to be routed instead of being bridged.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
enum skb_drop_reason is more generic, we can adopt it instead.
Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason().
This means drivers can use more precise drop reasons if they want to.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
cond sometimes is (val & MASK) what may result in a false positive
if val is a negative errno. We shouldn't evaluate cond if val < 0.
This has no functional impact here, but it's not nice.
Therefore switch order of the checks.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/6d8274ac-4344-23b4-d9a3-cad4c39517d4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I was intending to make all the Netlink Spec code BSD-3-Clause
to ease the adoption but it appears that:
- I fumbled the uAPI and used "GPL WITH uAPI note" there
- it gives people pause as they expect GPL in the kernel
As suggested by Chuck re-license under dual. This gives us benefit
of full BSD freedom while fulfilling the broad "kernel is under GPL"
expectations.
Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/
Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations") changed cpumask_setall() to use "bitmap_set()" instead
of "bitmap_fill()", because bitmap_fill() would explicitly set all the
bits of a constant sized small bitmap, and that's exactly what we don't
want: we want to only set bits up to 'nr_cpu_ids', which is what
"bitmap_set()" does.
However, Yury correctly points out that while "bitmap_set()" does indeed
only set bits up to the required bitmap size, it doesn't _clear_ bits
above that size, so the upper bits would still not have well-defined
values.
Now, none of this should really matter, since any bits set past
'nr_cpu_ids' should always be ignored in the first place. Yes, the bit
scanning functions might return them as a result, but since users should
always consider the ">= nr_cpu_ids" condition to mean "no more bits",
that shouldn't have any actual effect (see previous commit 8ca09d5fa354
"cpumask: fix incorrect cpumask scanning result checks").
But let's just do it right, the way the code was _intended_ to work. We
have had enough lazy code that works but bites us in the *rse later
(again, see previous commit) that there's no reason to not just do this
properly.
It turns out that "bitmap_fill()" gets this all right for the complex
case, and really only fails for the inlined optimized case that just
fills the whole word. And while we could just fix bitmap_fill() to use
the proper last word mask, there's two issues with that:
- the cpumask case wants to do the _optimization_ based on "NR_CPUS is
a small constant", but then wants to do the actual bit _fill_ based
on "nr_cpu_ids" that isn't necessarily that same constant
- we have lots of non-cpumask users of bitmap_fill(), and while they
hopefully don't care, and probably would want the proper semantics
anyway ("only set bits up to the limit"), I do not want the cpumask
changes to impact other parts
So this ends up just doing the single-word optimization by hand in the
cpumask code. If our cpumask is fundamentally limited to a single word,
just do the proper "fill in that word" exactly. And if it's the more
complex multi-word case, then the generic bitmap_fill() will DTRT.
This is all an example of how our bitmap function optimizations really
are somewhat broken. They conflate the "this is size of the bitmap"
optimizations with the actual bit(s) we want to set.
In many cases we really want to have the two be separate things:
sometimes we base our optimizations on the size of the whole bitmap ("I
know this whole bitmap fits in a single word, so I'll just use
single-word accesses"), and sometimes we base them on the bit we are
looking at ("this is just acting on bits that are in the first word, so
I'll use single-word accesses").
Notice how the end result of the two optimizations are the same, but the
way we get to them are quite different.
And all our cpumask optimization games are really about that fundamental
distinction, and we'd often really want to pass in both the "this is the
bit I'm working on" (which _can_ be a small constant but might be
variable), and "I know it's in this range even if it's variable" (based
on CONFIG_NR_CPUS).
So this cpumask_setall() implementation just makes that explicit. It
checks the "I statically know the size is small" using the known static
size of the cpumask (which is what that 'small_cpumask_bits' is all
about), but then sets the actual bits using the exact number of cpus we
have (ie 'nr_cpumask_bits')
Of course, in a perfect world, the compiler would have done all the
range analysis (possibly with help from us just telling it that
"this value is always in this range"), and would do all of this for us.
But that is not the world we live in.
While we dream of that perfect world, this does that manual logic to
make it all work out. And this was a very long explanation for a small
code change that shouldn't even matter.
Reported-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
To be able to use a general function later for any kind of
TLV, separate the vendor TLV header/content in the structs.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230305124407.8ac5195bb3e6.I19ad99c1ad3108453aede64bddf6ef1a7c4a0b74@changeid
[separate from the original combined patch]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
A new helper is introduced to calculate offload map memory usage. But
currently the memory dynamically allocated in netdev dev_ops, like
nsim_map_update_elem, is not counted. Let's just put it aside now.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A new helper is introduced to calculate xskmap memory usage.
The xfsmap memory usage can be dynamically changed when we add or remove
a xsk_map_node. Hence we need to track the count of xsk_map_node to get
its memory usage.
The result as follows,
- before
10: xskmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
- after
10: xskmap name count_map flags 0x0 <<< no elements case
key 4B value 4B max_entries 65536 memlock 524608B
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-17-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A new helper is introduced into bpf_local_storage map to calculate the
memory usage. This helper is also used by other maps like
bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and etc.
Note that currently the dynamically allocated storage elements are not
counted in the usage, since it will take extra runtime overhead in the
elements update or delete path. So let's put it aside now, and implement
it in the future when someone really need it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a new map ops ->map_mem_usage to print the memory usage of a
bpf map.
This is a preparation for the followup change.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
After commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"),
clang builds without CONFIG_DEBUG_INFO_BTF warn:
kernel/bpf/verifier.c:10298:24: warning: array index 16 is past the end of the array (that has type 'u32[16]' (aka 'unsigned int[16]')) [-Warray-bounds]
meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
^ ~~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:9150:1: note: array 'special_kfunc_list' declared here
BTF_ID_LIST(special_kfunc_list)
^
include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST'
#define BTF_ID_LIST(name) static u32 __maybe_unused name[16];
^
1 warning generated.
A warning of this nature was previously addressed by
commit beb3d47d1d3d ("bpf: Fix a BTF_ID_LIST bug with CONFIG_DEBUG_INFO_BTF not set")
but there have been new kfuncs added since then.
Quadruple the size of the CONFIG_DEBUG_INFO_BTF=n definition so that
this problem is unlikely to show up for some time.
Link: https://github.com/ClangBuiltLinux/linux/issues/1810
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230307-bpf-kfuncs-warray-bounds-v1-1-00ad3191f3a6@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
x86 ACPI boards which ship with only Android as their factory image usually
have pretty broken ACPI tables, relying on everything being hardcoded in
the factory kernel image and often disabling parts of the ACPI enumeration
kernel code to avoid the broken tables causing issues.
Part of this broken ACPI code is that sometimes these boards have _AEI
ACPI GPIO event handlers which are broken.
So far this has been dealt with in the platform/x86/x86-android-tablets.c
module, which contains various workarounds for these devices, by it calling
acpi_gpiochip_free_interrupts() on gpiochip-s with troublesome handlers to
disable the handlers.
But in some cases this is too late, if the handlers are of the edge type
then gpiolib-acpi.c's code will already have run them at boot.
This can cause issues such as GPIOs ending up as owned by "ACPI:OpRegion",
making them unavailable for drivers which actually need them.
Boards with these broken ACPI tables are already listed in
drivers/acpi/x86/utils.c for e.g. acpi_quirk_skip_i2c_client_enumeration().
Extend the quirks mechanism for a new acpi_quirk_skip_gpio_event_handlers()
helper, this re-uses the DMI-ids rather then having to duplicate the same
DMI table in gpiolib-acpi.c .
Also add the new ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS quirk to existing
boards with troublesome ACPI gpio event handlers, so that the current
acpi_gpiochip_free_interrupts() hack can be removed from
x86-android-tablets.c .
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
|
|
Add support to use a random local address in authentication and
deauthentication frames sent to unassociated peer when the driver
supports.
The driver needs to configure receive behavior to accept frames with
random transmit address specified in TX path authentication frames
during the time of the frame exchange is pending and such frames need to
be acknowledged similarly to frames sent to the local permanent address
when this random address functionality is used.
This capability allows use of randomized transmit address for PASN
authentication frames to improve privacy of WLAN clients.
Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20230112012415.167556-2-quic_vjakkam@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is utilized to pass LDPC configurations from user space
(i.e. hostapd) to driver.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Link: https://lore.kernel.org/r/1de696aaa34efd77a926eb657b8c0fda05aaa177.1676628065.git.ryder.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Similar to VHT/HE. This is utilized to pass MU-MIMO configurations
from user space (i.e. hostapd) to driver.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Link: https://lore.kernel.org/r/8d9966c4c1e77cb1ade77d42bdc49905609192e9.1676628065.git.ryder.lee@mediatek.com
[move into combined if statement, reset on !eht]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|