Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.14, take #3
- Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2
and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout.
- Bring back the VMID allocation to the vcpu_load phase, ensuring
that we only setup VTTBR_EL2 once on VHE. This cures an ugly
race that would lead to running with an unallocated VMID.
|
|
The perf_iterate_ctx() function performs RCU list traversal but
currently lacks RCU read lock protection. This causes lockdep warnings
when running perf probe with unshare(1) under CONFIG_PROVE_RCU_LIST=y:
WARNING: suspicious RCU usage
kernel/events/core.c:8168 RCU-list traversed in non-reader section!!
Call Trace:
lockdep_rcu_suspicious
? perf_event_addr_filters_apply
perf_iterate_ctx
perf_event_exec
begin_new_exec
? load_elf_phdrs
load_elf_binary
? lock_acquire
? find_held_lock
? bprm_execve
bprm_execve
do_execveat_common.isra.0
__x64_sys_execve
do_syscall_64
entry_SYSCALL_64_after_hwframe
This protection was previously present but was removed in commit
bd2756811766 ("perf: Rewrite core context handling"). Add back the
necessary rcu_read_lock()/rcu_read_unlock() pair around
perf_iterate_ctx() call in perf_event_exec().
[ mingo: Use scoped_guard() as suggested by Peter ]
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250117-fix_perf_rcu-v1-1-13cb9210fc6a@debian.org
|
|
The ES8328 codec driver, which is also used for the ES8388 chip that
appears to have an identical register map, claims that the output can
either take the route from DAC->Mixer->Output or through DAC->Output
directly. To the best of what I could find, this is not true, and
creates problems.
Without DACCONTROL17 bit index 7 set for the left channel, as well as
DACCONTROL20 bit index 7 set for the right channel, I cannot get any
analog audio out on Left Out 2 and Right Out 2 respectively, despite the
DAPM routes claiming that this should be possible. Furthermore, the same
is the case for Left Out 1 and Right Out 1, showing that those two don't
have a direct route from DAC to output bypassing the mixer either.
Those control bits toggle whether the DACs are fed (stale bread?) into
their respective mixers. If one "unmutes" the mixer controls in
alsamixer, then sure, the audio output works, but if it doesn't work
without the mixer being fed the DAC input then evidently it's not a
direct output from the DAC.
ES8328/ES8388 are seemingly not alone in this. ES8323, which uses a
separate driver for what appears to be a very similar register map,
simply flips those two bits on in its probe function, and then pretends
there is no power management whatsoever for the individual controls.
Fair enough.
My theory as to why nobody has noticed this up to this point is that
everyone just assumes it's their fault when they had to unmute an
additional control in ALSA.
Fix this in the es8328 driver by removing the erroneous direct route,
then get rid of the playback switch controls and have those bits tied to
the mixer's widget instead, which until now had no register to play
with.
Fixes: 567e4f98922c ("ASoC: add es8328 codec driver")
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Link: https://patch.msgid.link/20250222-es8328-route-bludgeoning-v1-1-99bfb7fb22d9@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
|
|
Nsfs only deals with unhashed dentries and there's currently no way for
them to become hashed. So remove d_op->d_delete.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pidfs only deals with unhashed dentries and there's currently no way for
them to become hashed. So remove d_op->d_delete.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When the kernel is compiled without LED framework support the
rtl8366rb fails to build like this:
rtl8366rb.o: in function `rtl8366rb_setup_led':
rtl8366rb.c:953:(.text.unlikely.rtl8366rb_setup_led+0xe8):
undefined reference to `led_init_default_state_get'
rtl8366rb.c:980:(.text.unlikely.rtl8366rb_setup_led+0x240):
undefined reference to `devm_led_classdev_register_ext'
As this is constantly coming up in different randconfig builds,
bite the bullet and create a separate file for the offending
code, split out a header with all stuff needed both in the
core driver and the leds code.
Add a new bool Kconfig option for the LED compile target, such
that it depends on LEDS_CLASS=y || LEDS_CLASS=RTL8366RB
which make LED support always available when LEDS_CLASS is
compiled into the kernel and enforce that if the LEDS_CLASS
is a module, then the RTL8366RB driver needs to be a module
as well so that modprobe can resolve the dependencies.
Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502070525.xMUImayb-lkp@intel.com/
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are several issues existed in start_xmit():
- Transmitted packets need to be freed before sending a packet, this
introduces delay and increases the average packets transmit
time. This also increase the time that spent in holding the TX lock.
- Notification is enabled after free_old_xmit_skbs() which will
introduce unnecessary interrupts if TX notification happens on the
same CPU that is doing the transmission now (actually, virtio-net
driver are optimized for this case).
So this patch tries to avoid those issues by not cleaning transmitted
packets in start_xmit() when TX NAPI is enabled and disable
notifications even more aggressively. Notification will be since the
beginning of the start_xmit(). But we can't enable delayed
notification after TX is stopped as we will lose the
notifications. Instead, the delayed notification needs is enabled
after the virtqueue is kicked for best performance.
Performance numbers:
1) single queue 2 vcpus guest with pktgen_sample03_burst_single_flow.sh
(burst 256) + testpmd (rxonly) on the host:
- When pinning TX IRQ to pktgen VCPU: split virtqueue PPS were
increased 55% from 6.89 Mpps to 10.7 Mpps and 32% TX interrupts were
eliminated. Packed virtqueue PPS were increased 50% from 7.09 Mpps to
10.7 Mpps, 99% TX interrupts were eliminated.
- When pinning TX IRQ to VCPU other than pktgen: split virtqueue PPS
were increased 96% from 5.29 Mpps to 10.4 Mpps and 45% TX interrupts
were eliminated; Packed virtqueue PPS were increased 78% from 6.12
Mpps to 10.9 Mpps and 99% TX interrupts were eliminated.
2) single queue 1 vcpu guest + vhost-net/TAP on the host: single
session netperf from guest to host shows 82% improvement from
31Gb/s to 58Gb/s, %stddev were reduced from 34.5% to 1.9% and 88%
of TX interrupts were eliminated.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Revert one cleanup which turned out to eat too much stack space"
* tag 'i2c-for-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: Allocate temporary client dynamically
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Have qcom_edac use the correct interrupt enable register to configure
the RAS interrupt lines
* tag 'edac_urgent_for_v6.14_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/qcom: Correct interrupt enable register configuration
|
|
syzbot reports an issue that turns out to be caused by the fact that the
efivarfs PM notifier may be invoked before the efivarfs_fs_info::sb
field is populated, resulting in a NULL deference.
So defer the registration until efivarfs_fill_super() is invoked.
Reported-by: syzbot+00d13e505ef530a45100@syzkaller.appspotmail.com
Tested-by: syzbot+00d13e505ef530a45100@syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
According to the UEFI Common Platform Error Record appendix, the
processor context information structure is a variable length structure,
but "is padded with zeros if the size is not a multiple of 16 bytes".
Currently this isn't honoured, causing all but the first structure to
be garbage when printed. Thus align the size to be a multiple of 16.
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
According to the UEFI Common Platform Error Record appendix, the
IA32/X64 Processor Context Information Structure is a variable length
structure, but "is padded with zeros if the size is not a multiple
of 16 bytes".
Currently this isn't honoured, causing all but the first structure to
be garbage when printed. Thus align the size to be a multiple of 16.
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
While using nvme target with use_srq on, below kernel panic is noticed.
[ 549.698111] bnxt_en 0000:41:00.0 enp65s0np0: FEC autoneg off encoding: Clause 91 RS(544,514)
[ 566.393619] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
..
[ 566.393799] <TASK>
[ 566.393807] ? __die_body+0x1a/0x60
[ 566.393823] ? die+0x38/0x60
[ 566.393835] ? do_trap+0xe4/0x110
[ 566.393847] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[ 566.393867] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[ 566.393881] ? do_error_trap+0x7c/0x120
[ 566.393890] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[ 566.393911] ? exc_divide_error+0x34/0x50
[ 566.393923] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[ 566.393939] ? asm_exc_divide_error+0x16/0x20
[ 566.393966] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[ 566.393997] bnxt_qplib_create_srq+0xc9/0x340 [bnxt_re]
[ 566.394040] bnxt_re_create_srq+0x335/0x3b0 [bnxt_re]
[ 566.394057] ? srso_return_thunk+0x5/0x5f
[ 566.394068] ? __init_swait_queue_head+0x4a/0x60
[ 566.394090] ib_create_srq_user+0xa7/0x150 [ib_core]
[ 566.394147] nvmet_rdma_queue_connect+0x7d0/0xbe0 [nvmet_rdma]
[ 566.394174] ? lock_release+0x22c/0x3f0
[ 566.394187] ? srso_return_thunk+0x5/0x5f
Page size and shift info is set only for the user space SRQs.
Set page size and page shift for kernel space SRQs also.
Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1740237621-29291-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Change POOL_NEXT_SIZE define value from 0 to BIT(30), since this define
is used to request the available maximum sized flow table, and zero doesn't
make sense for it, whereas some places in the driver use zero explicitly
expecting the smallest table size possible but instead due to this
define they end up allocating the biggest table size unawarely.
In addition move the definition to "include/linux/mlx5/fs.h" to expose the
define to IB driver as well, while appropriately renaming it.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add new error value for trust lockdown in health syndrome enum.
Also, include the offset for crr bit in the health buffer layout.
These changes prepare for downstream patches that update health
event handling.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-2-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When there is a failure during bind QP, the cleanup flow destroys the
counter regardless if it is the one that created it or not, which is
problematic since if it isn't the one that created it, that counter could
still be in use.
Fix that by destroying the counter only if it was created during this call.
Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://patch.msgid.link/25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
- Fix potential null pointer dereference
* tag 'v6.14-rc3-smb3-client-fix-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Add check for next_buffer in receive_encrypted_standard()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix AVX-VNNI CPU feature dependency bug triggered via the 'noxsave'
boot option
- Fix typos in the SVA documentation
- Add Tony Luck as RDT co-maintainer and remove Fenghua Yu
* tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
docs: arch/x86/sva: Fix two grammar errors under Background and FAQ
x86/cpufeatures: Make AVX-VNNI depend on AVX
MAINTAINERS: Change maintainer for RDT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
- Fix overly spread-out RSEQ concurrency ID allocation pattern that
regressed certain workloads
- Fix RSEQ registration syscall behavior on -EFAULT errors when
CONFIG_DEBUG_RSEQ=y (This debug option is disabled on most
distributions)
* tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Fix rseq registration with CONFIG_DEBUG_RSEQ
sched: Compact RSEQ concurrency IDs with reduced threads and affinity
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Fix x86 Intel Lion Cove CPU event constraints, and fix uprobes
debug/error printk output pointer-value verbosity"
* tag 'perf-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix event constraints for LNC
uprobes: Don't use %pK through printk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Fix miscellaneous irqchip bugs"
* tag 'irq-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/qcom-pdc: Workaround hardware register bug on X1E80100
irqchip/jcore-aic, clocksource/drivers/jcore: Fix jcore-pit interrupt request
irqchip/gic-v3: Fix rk3399 workaround when secure interrupts are enabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix inline asm constraint in cmma_test_essa() to avoid potential ESSA
detection miscompilation
- Fix build failure with CONFIG_GENDWARFKSYMS by disabling purgatory
symbol exports with -D__DISABLE_EXPORTS
- Update defconfigs
* tag 's390-6.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/boot: Fix ESSA detection
s390/purgatory: Use -D__DISABLE_EXPORTS
s390: Update defconfigs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Function graph accounting fixes:
- Fix the manage ops hashes
The function graph registers a "manager ops" and "sub-ops" to
ftrace. The manager ops does not have any callback but calls the
sub-ops callbacks. The manage ops hashes (what is used to tell
ftrace what functions to attach to) is built on the sub-ops it
manages.
There was an error in the way it built the hash. An empty hash
means to attach to all functions. When the manager ops had one
sub-ops it properly copied its hash. But when the manager ops had
more than one sub-ops, it went into a loop to make a set of all
functions it needed to add to the hash. If any of the subops hashes
was empty, that would mean to attach to all functions. The error
was that the first iteration of the loop passed in an empty hash to
start with in order to add the other hashes. That starting hash was
mistaken as to attach to all functions. This made the manage ops
attach to all functions whenever it had two or more sub-ops, even
if each sub-op was attached to only a single function.
- Do not add duplicate entries to the manager ops hash
If two or more subops hashes trace the same function, an entry for
that function will be added to the manager ops for each subops.
This causes waste and extra overhead.
Fprobe accounting fixes:
- Remove last function from fprobe hash
Fprobes has a ftrace hash to manage which functions an fprobe is
attached to. It also has a counter of how many fprobes are
attached. When the last fprobe is removed, it unregisters the
fprobe from ftrace but does not remove the functions the last
fprobe was attached to from the hash. This leaves the old functions
attached. When a new fprobe is added, the fprobe infrastructure
attaches to not only the functions of the new fprobe, but also to
the functions of the last fprobe.
- Fix accounting of the fprobe counter
When a fprobe is added, it updates a counter. If the counter goes
from zero to one, it attaches its ops to ftrace. When an fprobe is
removed, the counter is decremented. If the counter goes from 1 to
zero, it removes the fprobes ops from ftrace.
There was an issue where if two fprobes trace the same function,
the addition of each fprobe would increment the counter. But when
removing the first of the fprobes, it would notice that another
fprobe is still attached to one of its functions no it does not
remove the functions from the ftrace ops.
But it also did not decrement the counter, so when the last fprobe
is removed, the counter is still one. This leaves the fprobes
callback still registered with ftrace and it being called by the
functions defined by the fprobes ops hash. Worse yet, because all
the functions from the fprobe ops hash have been removed, that
tells ftrace that it wants to trace all functions.
Thus, this puts the state of the system where every function is
calling the fprobe callback handler (which does nothing as there
are no registered fprobes), but this causes a good 13% slow down of
the entire system.
Other updates:
- Add a selftest to test the above issues to prevent regressions.
- Fix preempt count accounting in function tracing
Better recursion protection was added to function tracing which
added another layer of preempt disable. As the preempt_count gets
traced in the event, it needs to subtract the amount of preempt
disabling the tracer does to record what the preempt_count was when
the trace was triggered.
- Fix memory leak in output of set_event
A variable is passed by the seq_file functions in the location that
is set by the return of the next() function. The start() function
allocates it and the stop() function frees it. But when the last
item is found, the next() returns NULL which leaks the data that
was allocated in start(). The m->private is used for something
else, so have next() free the data when it returns NULL, as stop()
will then just receive NULL in that case"
* tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix memory leak when reading set_event file
ftrace: Correct preemption accounting for function tracing.
selftests/ftrace: Update fprobe test to check enabled_functions file
fprobe: Fix accounting of when to unregister from function graph
fprobe: Always unregister fgraph function from ops
ftrace: Do not add duplicate entries in subops manager ops
ftrace: Fix accounting of adding subops to a manager ops
|
|
Load patches for which the driver carries a SHA256 checksum of the patch
blob.
This can be disabled by adding "microcode.amd_sha_check=off" on the
kernel cmdline. But it is highly NOT recommended.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
drivers/i2c/i2c-core-base.c: In function ‘i2c_detect.isra’:
drivers/i2c/i2c-core-base.c:2544:1: warning: the frame size of 1312 bytes is larger than 1024 bytes [-Wframe-larger-than=]
2544 | }
| ^
Fix this by allocating the temporary client structure dynamically, as it
is a rather large structure (1216 bytes, depending on kernel config).
This is basically a revert of the to-be-fixed commit with some
checkpatch improvements.
Fixes: 735668f8e5c9 ("i2c: core: Allocate temp client on the stack in i2c_detect")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
[wsa: updated commit message, merged tags from similar patch]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
Jeremy Kerr says:
====================
mctp: Add MCTP-over-USB hardware transport binding
Add an implementation of the DMTF standard DSP0283, providing an MCTP
channel over high-speed USB.
This is a fairly trivial first implementation, in that we only submit
one tx and one rx URB at a time. We do accept multi-packet transfers,
but do not yet generate them on transmit.
Of course, questions and comments are most welcome, particularly on the
USB interfaces.
v2: https://lore.kernel.org/20250212-dev-mctp-usb-v2-0-76e67025d764@codeconstruct.com.au
v1: https://lore.kernel.org/20250206-dev-mctp-usb-v1-0-81453fe26a61@codeconstruct.com.au
====================
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-0-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an implementation for DMTF DSP0283, which defines a MCTP-over-USB
transport. As per that spec, we're restricted to full speed mode,
requiring 512-byte transfers.
Each MCTP-over-USB interface is a peer-to-peer link to a single MCTP
endpoint, so no physical addressing is required (of course, that MCTP
endpoint may then bridge to further MCTP endpoints). Consequently,
interfaces will report with no lladdr data:
# mctp link
dev lo index 1 address 00:00:00:00:00:00 net 1 mtu 65536 up
dev mctpusb0 index 6 address none net 1 mtu 68 up
This is a simple initial implementation, with single rx & tx urbs, and
no multi-packet tx transfers - although we do accept multi-packet rx
from the device.
Includes suggested fixes from Santosh Puranik <spuranik@nvidia.com>.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Cc: Santosh Puranik <spuranik@nvidia.com>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-2-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Upcoming changes will add a USB host (and later gadget) driver for the
MCTP-over-USB protocol. Add a header that provides common definitions
for protocol support: the packet header format and a few framing
definitions. Add a define for the MCTP class code, as per
https://usb.org/defined-class-codes.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using L: with more than a bare email address causes getmaintainer.pl
to be unable to parse the entry. Fix this by doing as other entries
that use this email address and convert it to an R: entry.
Link: https://patch.msgid.link/20250221005012.1051897-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement byte queue limits to allow queuing disciplines to account for
packets enqueued in the ring buffer but not yet transmitted. There are a
separate set of transmit functions for AT91 that I haven't touched since
I don't have hardware to test on.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250220164257.96859-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Stats calculations involve a RMW to add the stat update to the existing
value. This is currently not protected by any synchronization mechanism,
so data races are possible. Add a spinlock to protect the update. The
reader side could be protected using u64_stats, but we would still need
a spinlock for the update side anyway. And we always do an update
immediately before reading the stats anyway.
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250220162950.95941-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found that ipvlan_process_v6_outbound() was assuming
the IPv6 network header isis present in skb->head [1]
Add the needed pskb_network_may_pull() calls for both
IPv4 and IPv6 handlers.
[1]
BUG: KMSAN: uninit-value in __ipv6_addr_type+0xa2/0x490 net/ipv6/addrconf_core.c:47
__ipv6_addr_type+0xa2/0x490 net/ipv6/addrconf_core.c:47
ipv6_addr_type include/net/ipv6.h:555 [inline]
ip6_route_output_flags_noref net/ipv6/route.c:2616 [inline]
ip6_route_output_flags+0x51/0x720 net/ipv6/route.c:2651
ip6_route_output include/net/ip6_route.h:93 [inline]
ipvlan_route_v6_outbound+0x24e/0x520 drivers/net/ipvlan/ipvlan_core.c:476
ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:491 [inline]
ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:541 [inline]
ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:605 [inline]
ipvlan_queue_xmit+0xd72/0x1780 drivers/net/ipvlan/ipvlan_core.c:671
ipvlan_start_xmit+0x5b/0x210 drivers/net/ipvlan/ipvlan_main.c:223
__netdev_start_xmit include/linux/netdevice.h:5150 [inline]
netdev_start_xmit include/linux/netdevice.h:5159 [inline]
xmit_one net/core/dev.c:3735 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3751
sch_direct_xmit+0x399/0xd40 net/sched/sch_generic.c:343
qdisc_restart net/sched/sch_generic.c:408 [inline]
__qdisc_run+0x14da/0x35d0 net/sched/sch_generic.c:416
qdisc_run+0x141/0x4d0 include/net/pkt_sched.h:127
net_tx_action+0x78b/0x940 net/core/dev.c:5484
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
__do_softirq+0x14/0x1a kernel/softirq.c:595
do_softirq+0x9a/0x100 kernel/softirq.c:462
__local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:389
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0x2758/0x57d0 net/core/dev.c:4611
dev_queue_xmit include/linux/netdevice.h:3311 [inline]
packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3132 [inline]
packet_sendmsg+0x93e0/0xa7e0 net/packet/af_packet.c:3164
sock_sendmsg_nosec net/socket.c:718 [inline]
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Reported-by: syzbot+93ab4a777bafb9d9f960@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67b74f01.050a0220.14d86d.02d8.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Link: https://patch.msgid.link/20250220155336.61884-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
stmmac_init_dma_engine() uses dev_err() which leads to errors being
reported as e.g:
dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
stmmac_init_dma_engine() is only called from stmmac_hw_setup() which
itself uses netdev_err(), and we will have a net_device setup. So,
change the dev_err() to netdev_err() to give consistent error
messages.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tl5y1-004UgG-8X@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs
to enable softirq tuning") added a possibility to set
net_hotdata.netdev_budget_usecs, but added no lower bound checking.
Commit a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies")
made the *initial* value HZ-dependent, so the initial value is at least
2 jiffies even for lower HZ values (2 ms for 1000 Hz, 8ms for 250 Hz, 20
ms for 100 Hz).
But a user still can set improper values by a sysctl. Set .extra1
(the lower bound) for net_hotdata.netdev_budget_usecs to the same value
as in the latter commit. That is to 2 jiffies.
Fixes: a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies")
Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Link: https://patch.msgid.link/20250220110752.137639-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current test marks all unexpected return values as failed and sets ret
to 1. If a test is skipped, the entire test also returns 1, incorrectly
indicating failure.
To fix this, add a skipped variable and set ret to 4 if it was previously
0. Otherwise, keep ret set to 1.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250220085326.1512814-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ido Schimmel says:
====================
net: fib_rules: Add DSCP mask support
In some deployments users would like to encode path information into
certain bits of the IPv6 flow label, the UDP source port and the DSCP
field and use this information to route packets accordingly.
Redirecting traffic to a routing table based on specific bits in the
DSCP field is not currently possible. Only exact match is currently
supported by FIB rules.
This patchset extends FIB rules to match on the DSCP field with an
optional mask.
Patches #1-#5 gradually extend FIB rules to match on the DSCP field with
an optional mask.
Patch #6 adds test cases for the new functionality.
iproute2 support can be found here [1].
[1] https://github.com/idosch/iproute2/tree/submit/fib_rule_mask_v1
====================
Link: https://patch.msgid.link/20250220080525.831924-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add tests for FIB rules that match on DSCP with a mask. Test both good
and bad flows and both the input and output paths.
# ./fib_rule_tests.sh
IPv6 FIB rule tests
[...]
TEST: rule6 check: dscp redirect to table [ OK ]
TEST: rule6 check: dscp no redirect to table [ OK ]
TEST: rule6 del by pref: dscp redirect to table [ OK ]
TEST: rule6 check: iif dscp redirect to table [ OK ]
TEST: rule6 check: iif dscp no redirect to table [ OK ]
TEST: rule6 del by pref: iif dscp redirect to table [ OK ]
TEST: rule6 check: dscp masked redirect to table [ OK ]
TEST: rule6 check: dscp masked no redirect to table [ OK ]
TEST: rule6 del by pref: dscp masked redirect to table [ OK ]
TEST: rule6 check: iif dscp masked redirect to table [ OK ]
TEST: rule6 check: iif dscp masked no redirect to table [ OK ]
TEST: rule6 del by pref: iif dscp masked redirect to table [ OK ]
[...]
Tests passed: 316
Tests failed: 0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new DSCP mask attribute to the spec. Example:
# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--do newrule \
--json '{"family": 2, "dscp": 10, "dscp-mask": 63, "action": 1, "table": 1}'
None
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--dump getrule --json '{"family": 2}' --output-json | jq '.[]'
[...]
{
"table": 1,
"suppress-prefixlen": "0xffffffff",
"protocol": 0,
"priority": 32765,
"dscp": 10,
"dscp-mask": "0x3f",
"family": 2,
"dst-len": 0,
"src-len": 0,
"tos": 0,
"action": "to-tbl",
"flags": 0
}
[...]
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow user space to configure FIB rules that match on DSCP with a mask,
now that support has been added to the IPv4 and IPv6 address families.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend IPv6 FIB rules to match on DSCP using a mask. Unlike IPv4, also
initialize the DSCP mask when a non-zero 'tos' is specified as there is
no difference in matching between 'tos' and 'dscp'. As a side effect,
this makes it possible to match on 'dscp 0', like in IPv4.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend IPv4 FIB rules to match on DSCP using a mask. The mask is only
set in rules that match on DSCP (not TOS) and initialized to cover the
entire DSCP field if the mask attribute is not specified.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an attribute that allows matching on DSCP with a mask. Matching on
DSCP with a mask is needed in deployments where users encode path
information into certain bits of the DSCP field.
Temporarily set the type of the attribute to 'NLA_REJECT' while support
is being added.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
IPv4 neighbors can be constructed on the blackhole net device, but they
are constructed with an output function (neigh_direct_output()) that
simply calls dev_queue_xmit(). The latter will transmit packets via
'skb->dev' which might not be the blackhole net device if dst_dev_put()
switched 'dst->dev' to the blackhole net device while another CPU was
using the dst entry in ip_output(), but after it already initialized
'skb->dev' from 'dst->dev'.
Specifically, the following can happen:
CPU1 CPU2
udp_sendmsg(sk1) udp_sendmsg(sk2)
udp_send_skb() [...]
ip_output()
skb->dev = skb_dst(skb)->dev
dst_dev_put()
dst->dev = blackhole_netdev
ip_finish_output2()
resolves neigh on dst->dev
neigh_output()
neigh_direct_output()
dev_queue_xmit()
This will result in IPv4 packets being sent without an Ethernet header
via a valid net device:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown
(0x58c6), length 68:
0x0000: 8dda 74ca f1ae ca6c ca6c 0098 969c 0400 ..t....l.l......
0x0010: 0000 4730 3f18 6800 0000 0000 0000 9971 ..G0?.h........q
0x0020: c4c9 9055 a157 0a70 9ead bf83 38ca ab38 ...U.W.p....8..8
0x0030: 8add ab96 e052 .....R
Fix by making sure that neighbors are constructed on top of the
blackhole net device with an output function that simply consumes the
packets, in a similar fashion to dst_discard_out() and
blackhole_netdev_xmit().
Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
Reported-by: Florian Meister <fmei@sfs.com>
Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250220072559.782296-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While kernel sockets are dismantled during pernet_operations->exit(),
their freeing can be delayed by any tx packets still held in qdisc
or device queues, due to skb_set_owner_w() prior calls.
This then trigger the following warning from ref_tracker_dir_exit() [1]
To fix this, make sure that kernel sockets own a reference on net->passive.
Add sk_net_refcnt_upgrade() helper, used whenever a kernel socket
is converted to a refcounted one.
[1]
[ 136.263918][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at
[ 136.263918][ T35] sk_alloc+0x2b3/0x370
[ 136.263918][ T35] inet6_create+0x6ce/0x10f0
[ 136.263918][ T35] __sock_create+0x4c0/0xa30
[ 136.263918][ T35] inet_ctl_sock_create+0xc2/0x250
[ 136.263918][ T35] igmp6_net_init+0x39/0x390
[ 136.263918][ T35] ops_init+0x31e/0x590
[ 136.263918][ T35] setup_net+0x287/0x9e0
[ 136.263918][ T35] copy_net_ns+0x33f/0x570
[ 136.263918][ T35] create_new_namespaces+0x425/0x7b0
[ 136.263918][ T35] unshare_nsproxy_namespaces+0x124/0x180
[ 136.263918][ T35] ksys_unshare+0x57d/0xa70
[ 136.263918][ T35] __x64_sys_unshare+0x38/0x40
[ 136.263918][ T35] do_syscall_64+0xf3/0x230
[ 136.263918][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 136.263918][ T35]
[ 136.343488][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at
[ 136.343488][ T35] sk_alloc+0x2b3/0x370
[ 136.343488][ T35] inet6_create+0x6ce/0x10f0
[ 136.343488][ T35] __sock_create+0x4c0/0xa30
[ 136.343488][ T35] inet_ctl_sock_create+0xc2/0x250
[ 136.343488][ T35] ndisc_net_init+0xa7/0x2b0
[ 136.343488][ T35] ops_init+0x31e/0x590
[ 136.343488][ T35] setup_net+0x287/0x9e0
[ 136.343488][ T35] copy_net_ns+0x33f/0x570
[ 136.343488][ T35] create_new_namespaces+0x425/0x7b0
[ 136.343488][ T35] unshare_nsproxy_namespaces+0x124/0x180
[ 136.343488][ T35] ksys_unshare+0x57d/0xa70
[ 136.343488][ T35] __x64_sys_unshare+0x38/0x40
[ 136.343488][ T35] do_syscall_64+0xf3/0x230
[ 136.343488][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 0cafd77dcd03 ("net: add a refcount tracker for kernel sockets")
Reported-by: syzbot+30a19e01a97420719891@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67b72aeb.050a0220.14d86d.0283.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250220131854.4048077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-02-20
We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).
The main changes are:
1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing
2) Add network TX timestamping support to BPF sock_ops, from Jason Xing
3) Add TX metadata Launch Time support, from Song Yoong Siang
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
igc: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
net: stmmac: Add launch time support to XDP ZC
selftests/bpf: Add launch time request to xdp_hw_metadata
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
bpf: Support selective sampling for bpf timestamping
bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
bpf: Disable unsafe helpers in TX timestamping callbacks
bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
bpf: Add networking timestamping support to bpf_get/setsockopt()
selftests/bpf: Add rto max for bpf_setsockopt test
bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================
Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btusb: Always allow SCO packets for user channel
- L2CAP: Fix L2CAP_ECRED_CONN_RSP response
* tag 'for-net-2025-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP response
Bluetooth: Always allow SCO packets for user channel
====================
Link: https://patch.msgid.link/20250221154941.2139043-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Not all the devices have the capability for the driver to query for the
registered RSS configuration. The driver can discover this by checking
the relevant device option during setup. If it cannot, the driver needs
to store the RSS config cache and directly return such cache when
queried by the ethtool. RSS config is inited when driver probes. Also the
default RSS config will be adjusted when there is RX queue count change.
At this point, only keys of GVE_RSS_KEY_SIZE and indirection tables of
GVE_RSS_INDIR_SIZE are supported.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250219200451.3348166-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|