Age | Commit message (Collapse) | Author |
|
Amery Hung says:
====================
Fix bpf qdisc bugs and clean up
This patchset fixes the following bugs in bpf qdisc and clean up the
selftest.
- A null-pointer dereference can happen in qdisc_watchdog_cancel() if
the timer is not initialized when 1) .init is not defined by user so
init prologue is not generated. 2) .init fails and qdisc_create()
calls .destroy
- bpf qdisc fails to attach to mq/mqprio when being set as the default
qdisc due to failed qdisc_lookup() in init prologue
v2
- Rebase to bpf-next/net
- Fix erroneous commit messages
- Fix and simplify selftests cleanup
v1: https://lore.kernel.org/bpf/20250501223025.569020-1-ameryhung@gmail.com/
====================
Link: https://patch.msgid.link/20250502201624.3663079-1-ameryhung@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Some cleanups:
- Remove unnecessary kfuncs declaration
- Use _ns in the test name to run tests in a separate net namespace
- Call skeleton __attach() instead of bpf_map__attach_struct_ops() to
simplify tests.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Implement .destroy in bpf_fq and bpf_fifo as it is now mandatory.
Test attaching a bpf qdisc with a missing operator .init. This is not
allowed as bpf qdisc qdisc_watchdog_cancel() could have been called with
an uninitialized timer.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
The patch makes all currently supported Qdisc_ops (i.e., .enqueue,
.dequeue, .init, .reset, and .destroy) mandatory.
Make .init, .reset and .destroy mandatory as bpf qdisc relies on prologue
and epilogue to check attach points and correctly initialize/cleanup
resources. The prologue/epilogue will only be generated for an struct_ops
operator only if users implement the operator.
Make .enqueue and .dequeue mandatory as bpf qdisc infra does not provide
a default data path.
Fixes: c8240344956e ("bpf: net_sched: Support implementation of Qdisc_ops in bpf")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
First, test that bpf qdisc can be set as default qdisc. Then, attach
an mq qdisc to see if bpf qdisc can be successfully created and grafted.
The test is a sequential test as net.core.default_qdisc is global.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Allow .init to proceed if qdisc_lookup() returns NULL as it only happens
when called by qdisc_create_dflt() in mq/mqprio_init and the parent qdisc
has not been added to qdisc_hash yet. In qdisc_create(), the caller,
__tc_modify_qdisc(), would have made sure the parent qdisc already exist.
In addition, call qdisc_watchdog_init() whether .init succeeds or not to
prevent null-pointer dereference. In qdisc_create() and
qdisc_create_dflt(), if .init fails, .destroy will be called. As a
result, the destroy epilogue could call qdisc_watchdog_cancel() with an
uninitialized timer, causing null-pointer deference in hrtimer_cancel().
Fixes: c8240344956e ("bpf: net_sched: Support implementation of Qdisc_ops in bpf")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix three recent regressions, two in cpufreq and one in the
Intel Soundwire driver, and an unchecked MSR access in the
intel_pstate driver:
- Fix a recent regression causing systems where frequency tables are
used by cpufreq to have issues with setting frequency limits
(Rafael Wysocki)
- Fix a recent regressions causing frequency boost settings to become
out-of-sync if platform firmware updates the registers associated
with frequency boost during system resume (Viresh Kumar)
- Fix a recent regression causing resume failures to occur in the
Intel Soundwire driver if the device handled by it is in runtime
suspend before a system-wide suspend (Rafael Wysocki)
- Fix an unchecked MSR aceess in the intel_pstate driver occurring
when CPUID indicates no turbo, but the driver attempts to enable
turbo frequencies due to a misleading value read from an MSR
(Srinivas Pandruvada)"
* tag 'pm-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode
soundwire: intel_auxdevice: Fix system suspend/resume handling
cpufreq: Fix setting policy limits when frequency tables are used
cpufreq: ACPI: Re-sync CPU boost state on system resume
|
|
Pull smb client fixes from Steve French:
- fix posix mkdir error to ksmbd (also avoids crash in
cifs_destroy_request_bufs)
- two smb1 fixes: fixing querypath info and setpathinfo to old servers
- fix rsize/wsize when not multiple of page size to address DIO
reads/writes
* tag '6.15-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: ensure aligned IO sizes
cifs: Fix changing times and read-only attr over SMB1 smb_set_file_info() function
cifs: Fix and improve cifs_query_path_info() and cifs_query_file_info()
smb: client: fix zero length for mkdir POSIX create context
|
|
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, amdgpu and xe as usual, the new adp driver has a
bunch of vblank fixes, then a bunch of small fixes across the board.
Seems about the right level for this time in the release cycle.
ttm:
- docs warning fix
kunit
- fix leak in shmem tests
fdinfo:
- driver unbind race fix
amdgpu:
- Fix possible UAF in HDCP
- XGMI dma-buf fix
- NBIO 7.11 fix
- VCN 5.0.1 fix
xe:
- EU stall locking fix and disabling on VF
- Documentation fix kernel version supporting hwmon entries
- SVM fixes on error handling
i915:
- Fix build for CONFIG_DRM_I915_PXP=n
nouveau:
- fix race condition in fence handling
ivpu:
- interrupt handling fix
- D0i2 test mode fix
adp:
- vblank fixes
mipi-dbi:
- timing fix"
* tag 'drm-fixes-2025-05-03' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
drm/gpusvm: set has_dma_mapping inside mapping loop
drm/xe/hwmon: Fix kernel version documentation for temperature
drm/xe/eustall: Do not support EU stall on SRIOV VF
drm/xe/eustall: Resolve a possible circular locking dependency
drm/amdgpu: Add DPG pause for VCN v5.0.1
drm/amdgpu: Fix offset for HDP remap in nbio v7.11
drm/amdgpu: Fail DMABUF map of XGMI-accessible memory
drm/amd/display: Fix slab-use-after-free in hdcp
drm/mipi-dbi: Fix blanking for non-16 bit formats
drm/tests: shmem: Fix memleak
drm/xe/guc: Fix capture of steering registers
drm/xe/svm: fix dereferencing error pointer in drm_gpusvm_range_alloc()
drm: Select DRM_KMS_HELPER from DRM_DEBUG_DP_MST_TOPOLOGY_REFS
drm: adp: Remove pointless irq_lock spin lock
drm: adp: Enable vblank interrupts in crtc's .atomic_enable
drm: adp: Handle drm_crtc_vblank_get() errors
drm: adp: Use spin_lock_irqsave for drm device event_lock
drm/fdinfo: Protect against driver unbind
drm/ttm: fix the warning for hit_low and evict_low
accel/ivpu: Fix the D0i2 disable test mode
...
|
|
When changing memory attributes on a subset of a potential hugepage, add
the hugepage to the invalidation range tracking to prevent installing a
hugepage until the attributes are fully updated. Like the actual hugepage
tracking updates in kvm_arch_post_set_memory_attributes(), process only
the head and tail pages, as any potential hugepages that are entirely
covered by the range will already be tracked.
Note, only hugepage chunks whose current attributes are NOT mixed need to
be added to the invalidation set, as mixed attributes already prevent
installing a hugepage, and it's perfectly safe to install a smaller
mapping for a gfn whose attributes aren't changing.
Fixes: 8dd2eee9d526 ("KVM: x86/mmu: Handle page fault for private memory")
Cc: stable@vger.kernel.org
Reported-by: Michael Roth <michael.roth@amd.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Link: https://lore.kernel.org/r/20250430220954.522672-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Commit 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
updated the SEV code to take a snapshot of the GHCB before using it. But
the dump_ghcb() function wasn't updated to use the snapshot locations.
This results in incorrect output from dump_ghcb() for the "is_valid" and
"valid_bitmap" fields.
Update dump_ghcb() to use the proper locations.
Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://lore.kernel.org/r/8f03878443681496008b1b37b7c4bf77a342b459.1745866531.git.thomas.lendacky@amd.com
[sean: add comment and snapshot qualifier]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Merge cpufreq fixes for 6.15-rc5:
- Fix a recent regression causing systems where frequency tables are
used by cpufreq to have issues with setting frequency limits (Rafael
Wysocki).
- Fix a recent regressions causing frequency boost settings to become
out-of-sync if platform firmware updates the registers associated
with them during system resume (Viresh Kumar).
- Fix an unchecked MSR aceess in the intel_pstate driver occurring when
CPUID indicates no turbo, but the driver attempts to enable turbo
frequencies due to a misleading value read from an MSR (Srinivas
Pandruvada).
* pm-cpufreq:
cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode
cpufreq: Fix setting policy limits when frequency tables are used
cpufreq: ACPI: Re-sync CPU boost state on system resume
|
|
When a CL/CSD job times out, we check if the GPU has made any progress
since the last timeout. If so, instead of resetting the hardware, we skip
the reset and let the timer get rearmed. This gives long-running jobs a
chance to complete.
However, when `timedout_job()` is called, the job in question is removed
from the pending list, which means it won't be automatically freed through
`free_job()`. Consequently, when we skip the reset and keep the job
running, the job won't be freed when it finally completes.
This situation leads to a memory leak, as exposed in [1] and [2].
Similarly to commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when
GPU is still active"), this patch ensures the job is put back on the
pending list when extending the timeout.
Cc: stable@vger.kernel.org # 6.0
Reported-by: Daivik Bhatia <dtgs1208@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227 [1]
Closes: https://github.com/raspberrypi/linux/issues/6817 [2]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250430210643.57924-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
|
|
Jordan Rife says:
====================
bpf: udp: Exactly-once socket iteration
Both UDP and TCP socket iterators use iter->offset to track progress
through a bucket, which is a measure of the number of matching sockets
from the current bucket that have been seen or processed by the
iterator. On subsequent iterations, if the current bucket has
unprocessed items, we skip at least iter->offset matching items in the
bucket before adding any remaining items to the next batch. However,
iter->offset isn't always an accurate measure of "things already seen"
when the underlying bucket changes between reads which can lead to
repeated or skipped sockets. Instead, this series remembers the cookies
of the sockets we haven't seen yet in the current bucket and resumes
from the first cookie in that list that we can find on the next
iteration. This series focuses on UDP socket iterators, but a later
series will apply a similar approach to TCP socket iterators.
To be more specific, this series replaces struct sock **batch inside
struct bpf_udp_iter_state with union bpf_udp_iter_batch_item *batch,
where union bpf_udp_iter_batch_item can contain either a pointer to a
socket or a socket cookie. During reads, batch contains pointers to all
sockets in the current batch while between reads batch contains all the
cookies of the sockets in the current bucket that have yet to be
processed. On subsequent reads, when iteration resumes,
bpf_iter_udp_batch finds the first saved cookie that matches a socket in
the bucket's socket list and picks up from there to construct the next
batch. On average, assuming it's rare that the next socket disappears
before the next read occurs, we should only need to scan as much as we
did with the offset-based approach to find the starting point. In the
case that the next socket is no longer there, we keep scanning through
the saved cookies list until we find a match. The worst case is when
none of the sockets from last time exist anymore, but again, this should
be rare.
CHANGES
=======
v6 -> v7:
* Move initialization of iter->state.bucket to -1 from patch five ("bpf:
udp: Avoid socket skips and repeats during iteration") to patch three
("bpf: udp: Get rid of st_bucket_done") to avoid skipping the first
bucket in the patch three and four (Martin).
* Rename sock to sk in bpf_iter_batch_item (Martin).
* Use ASSERT_OK_PTR in do_resume_test to check if counts is NULL
(Martin).
* goto done in do_resume_test when calloc or sock_iter_batch__open fails
to make sure things are cleaned up properly, and initialize pointers
to NULL explicitly to silence warnings from llvm 20 in CI.
v5 -> v6:
* Rework the logic in patch two ("bpf: udp: Make sure iter->batch
always contains a full bucket snapshot") again to simplify it:
* Only try realloc with GFP_USER one time instead of two (Alexei).
* v5 introduced a second call to bpf_iter_udp_realloc_batch inside the
loop to handle the GFP_ATOMIC case. In v6, move the GFP_USER case
inside the loop as well, so it's all in once place. This, I feel,
makes it a bit easier to understand the control flow. Consequently,
it also simplifies the logic outside the loop.
* Use GFP_NOWAIT instead of GFP_ATOMIC to avoid depleting memory
reserves, since iterators are not critical operation (Alexei). Alexei
suggested using __GFP_NOWARN as well with GFP_NOWAIT, but this is
already set inside bpf_iter_udp_realloc_batch, so no change was needed
there.
* Introduce patch three ("bpf: udp: Get rid of st_bucket_done") to
simplify things further, since with patch two, st_bucket_done == true
is equivalent to iter->cur_sk == iter->end_sk.
* In patch five ("bpf: udp: Avoid socket skips and repeats during
iteration"), initialize iter->state.bucket to -1 so that on the first
call to bpf_iter_udp_batch, the resume_bucket condition is not hit.
This avoids adding a special case to the condition around
bpf_iter_udp_resume for bucket zero.
v4 -> v5:
* Rework the logic from patch two ("bpf: udp: Make sure iter->batch
always contains a full bucket snapshot") to move the handling of the
GFP_ATOMIC case inside the main loop and get rid of the extra lock
variable. This makes the logic clearer and makes it clearer that the
bucket lock is always released (Martin).
* Introduce udp_portaddr_for_each_entry_from in patch two instead of
patch four ("bpf: udp: Avoid socket skips and repeats during
iteration"), since patch two now needs to be able to resume list
iteration from an arbitrary point in the GFP_ATOMIC case.
* Similarly, introduce the memcpy inside bpf_iter_udp_realloc_batch in
patch two instead of patch four, since in the GFP_ATOMIC case the new
batch needs to remember the sockets from the old batch.
* Use sock_gen_cookie instead of __sock_gen_cookie inside
bpf_iter_udp_put_batch, since it can be called from a preemptible
context (Martin).
v3 -> v4:
* Explicitly assign sk = NULL on !iter->end_sk exit condition
(Kuniyuki).
* Reword the commit message of patch two ("bpf: udp: Make sure
iter->batch always contains a full bucket snapshot") to make the
reasoning for GFP_ATOMIC more clear.
v2 -> v3:
* Guarantee that iter->batch is always a full snapshot of a bucket to
prevent socket repeat scenarios [3]. This supercedes the patch from v2
that simply propagated ENOMEM up from bpf_iter_udp_batch and covers
the scenario where the batch size is still too small after a realloc.
* Fix up self tests (Martin)
* ASSERT_EQ(nread, sizeof(out), "nread") instead of
ASSERT_GE(nread, 1, "nread) in read_n.
* Use ASSERT_OK and ASSERT_OK_FD in several places.
* Add missing free(counts) to do_resume_test.
* Move int local_port declaration to the top of do_resume_test.
* Remove unnecessary guards before close and free.
v1 -> v2:
* Drop WARN_ON_ONCE from bpf_iter_udp_realloc_batch (Kuniyuki).
* Fixed memcpy size parameter in bpf_iter_udp_realloc_batch; before it
was missing sizeof(elem) * (Kuniyuki).
* Move "bpf: udp: Propagate ENOMEM up from bpf_iter_udp_batch" to patch
two in the series (Kuniyuki).
rfc [1] -> v1:
* Use hlist_entry_safe directly to retrieve the first socket in the
current bucket's linked list instead of immediately breaking from
udp_portaddr_for_each_entry (Martin).
* Cancel iteration if bpf_iter_udp_realloc_batch() can't grab enough
memory to contain a full snapshot of the current bucket to prevent
unwanted skips or repeats [2].
[1]: https://lore.kernel.org/bpf/20250404220221.1665428-1-jordan@jrife.io/
[2]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/
[3]: https://lore.kernel.org/bpf/d323d417-3e8b-48af-ae94-bc28469ac0c1@linux.dev/
====================
Link: https://patch.msgid.link/20250502161528.264630-1-jordan@jrife.io
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Introduce a set of tests that exercise various bucket resume scenarios:
* remove_seen resumes iteration after removing a socket from the bucket
that we've already processed. Before, with the offset-based approach,
this test would have skipped an unseen socket after resuming
iteration. With the cookie-based approach, we now see all sockets
exactly once.
* remove_unseen exercises the condition where the next socket that we
would have seen is removed from the bucket before we resume iteration.
This tests the scenario where we need to scan past the first cookie in
our remembered cookies list to find the socket from which to resume
iteration.
* remove_all exercises the condition where all sockets we remembered
were removed from the bucket to make sure iteration terminates and
returns no more results.
* add_some exercises the condition where a few, but not enough to
trigger a realloc, sockets are added to the head of the current bucket
between reads. Before, with the offset-based approach, this test would
have repeated sockets we've already seen. With the cookie-based
approach, we now see all sockets exactly once.
* force_realloc exercises the condition that we need to realloc the
batch on a subsequent read, since more sockets than can be held in the
current batch array were added to the current bucket. This exercies
the logic inside bpf_iter_udp_realloc_batch that copies cookies into
the new batch to make sure nothing is skipped or repeated.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the
cookie of the current socket, so that we can track the identity of the
sockets that the iterator has seen so far. Update the existing do_test
function to account for this change to the iterator program output. At
the same time, teach both programs to work with AF_INET as well.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Replace the offset-based approach for tracking progress through a bucket
in the UDP table with one based on socket cookies. Remember the cookies
of unprocessed sockets from the last batch and use this list to
pick up where we left off or, in the case that the next socket
disappears between reads, find the first socket after that point that
still exists in the bucket and resume from there.
This approach guarantees that all sockets that existed when iteration
began and continue to exist throughout will be visited exactly once.
Sockets that are added to the table during iteration may or may not be
seen, but if they are they will be seen exactly once.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
On Qualcomm chipsets not all GPIOs are wakeup capable. Those GPIOs do not
have a corresponding MPM pin and should not be handled inside the MPM
driver. The IRQ domain hierarchy is always applied, so it's required to
explicitly disconnect the hierarchy for those. The pinctrl-msm driver marks
these with GPIO_NO_WAKE_IRQ. qcom-pdc has a check for this, but
irq-qcom-mpm is currently missing the check. This is causing crashes when
setting up interrupts for non-wake GPIOs:
root@rb1:~# gpiomon -c gpiochip1 10
irq: IRQ159: trimming hierarchy from :soc@0:interrupt-controller@f200000-1
Unable to handle kernel paging request at virtual address ffff8000a1dc3820
Hardware name: Qualcomm Technologies, Inc. Robotics RB1 (DT)
pc : mpm_set_type+0x80/0xcc
lr : mpm_set_type+0x5c/0xcc
Call trace:
mpm_set_type+0x80/0xcc (P)
qcom_mpm_set_type+0x64/0x158
irq_chip_set_type_parent+0x20/0x38
msm_gpio_irq_set_type+0x50/0x530
__irq_set_trigger+0x60/0x184
__setup_irq+0x304/0x6bc
request_threaded_irq+0xc8/0x19c
edge_detector_setup+0x260/0x364
linereq_create+0x420/0x5a8
gpio_ioctl+0x2d4/0x6c0
Fix this by copying the check for GPIO_NO_WAKE_IRQ from qcom-pdc.c, so that
MPM is removed entirely from the hierarchy for non-wake GPIOs.
Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver")
Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexey Klimov <alexey.klimov@linaro.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250502-irq-qcom-mpm-fix-no-wake-v1-1-8a1eafcd28d4@linaro.org
|
|
All vDSO code needs to be completely position independent. Symbol
references are marked as hidden so the compiler emits PC-relative
relocations.
However GCC emits absolute relocations for symbol-relative references with
an offset >= 64KiB. After recent refactorings in the vDSO code this is the
case in __arch_get_vdso_u_timens_data() with a page size of 64KiB.
Work around the issue by preventing the optimizer from seeing the offsets.
Fixes: 83a2a6b8cfc5 ("vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock")
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250430-vdso-absolute-reloc-v2-1-5efcc3bc4b26@linutronix.de
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120002
Closes: https://lore.kernel.org/lkml/aApGPAoctq_eoE2g@t14ultra/
|
|
Prepare for the next patch that tracks cookies between iterations by
converting struct sock **batch to union bpf_udp_iter_batch_item *batch
inside struct bpf_udp_iter_state.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor updates, both in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove redundant query_complete trace
scsi: myrb: Fix spelling mistake "statux" -> "status"
|
|
Get rid of the st_bucket_done field to simplify UDP iterator state and
logic. Before, st_bucket_done could be false if bpf_iter_udp_batch
returned a partial batch; however, with the last patch ("bpf: udp: Make
sure iter->batch always contains a full bucket snapshot"),
st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Require that iter->batch always contains a full bucket snapshot. This
invariant is important to avoid skipping or repeating sockets during
iteration when combined with the next few patches. Before, there were
two cases where a call to bpf_iter_udp_batch may only capture part of a
bucket:
1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
2. When more sockets are added to the bucket while calling
bpf_iter_udp_realloc_batch(), making the updated batch size
insufficient [2].
In cases where the batch size only covers part of a bucket, it is
possible to forget which sockets were already visited, especially if we
have to process a bucket in more than two batches. This forces us to
choose between repeating or skipping sockets, so don't allow this:
1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
fails instead of continuing with a partial batch.
2. Try bpf_iter_udp_realloc_batch() with GFP_USER just as before, but if
we still aren't able to capture the full bucket, call
bpf_iter_udp_realloc_batch() again while holding the bucket lock to
guarantee the bucket does not change. On the second attempt use
GFP_NOWAIT since we hold onto the spin lock.
Introduce the udp_portaddr_for_each_entry_from macro and use it instead
of udp_portaddr_for_each_entry to make it possible to continue iteration
from an arbitrary socket. This is required for this patch in the
GFP_NOWAIT case to allow us to fill the rest of a batch starting from
the middle of a bucket and the later patch which skips sockets that were
already seen.
Testing all scenarios directly is a bit difficult, but I did some manual
testing to exercise the code paths where GFP_NOWAIT is used and where
ERR_PTR(err) is returned. I used the realloc test case included later
in this series to trigger a scenario where a realloc happens inside
bpf_iter_udp_batch and made a small code tweak to force the first
realloc attempt to allocate a too-small batch, thus requiring
another attempt with GFP_NOWAIT. Some printks showed both reallocs with
the tests passing:
Apr 25 23:16:24 crow kernel: go again GFP_USER
Apr 25 23:16:24 crow kernel: go again GFP_NOWAIT
With this setup, I also forced each of the bpf_iter_udp_realloc_batch
calls to return -ENOMEM to ensure that iteration ends and that the
read() in userspace fails.
[1]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/7ed28273-a716-4638-912d-f86f965e54bb@linux.dev/
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Prepare for the next patch which needs to be able to choose either
GFP_USER or GFP_NOWAIT for calls to bpf_iter_udp_realloc_batch.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- fix queue unquiesce check on PCI slot_reset (Keith Busch)
- fix premature queue removal and I/O failover in nvme-tcp (Michael
Liang)
- don't restore null sk_state_change (Alistair Francis)
- select CONFIG_TLS where needed (Alistair Francis)
- always free derived key data (Hannes Reinecke)
- more quirks (Wentao Guan)
- ublk zero copy fix
- ublk selftest fix for UBLK_F_NEED_GET_DATA
* tag 'block-6.15-20250502' of git://git.kernel.dk/linux:
nvmet-auth: always free derived key data
nvmet-tcp: don't restore null sk_state_change
nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLS
nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLS
nvme-tcp: fix premature queue removal and I/O failover
nvme-pci: add quirks for WDC Blue SN550 15b7:5009
nvme-pci: add quirks for device 126f:1001
nvme-pci: fix queue unquiesce check on slot_reset
ublk: remove the check of ublk_need_req_ref() from __ublk_check_and_get_req
ublk: enhance check for register/unregister io buffer command
ublk: decouple zero copy from user copy
selftests: ublk: fix UBLK_F_NEED_GET_DATA
|
|
Pull io_uring fix from Jens Axboe:
"Just a single fix, annotating the fdinfo side SQ/CQ head/tail reads
with data_race() as they are known racy.
Only serves to silence syzbot testing, by definition these debug
outputs are going to be racy as they may change as soon as we've read
them"
* tag 'io_uring-6.15-20250502' of git://git.kernel.dk/linux:
io_uring/fdinfo: annotate racy sq/cq head/tail reads
|
|
Pull bcachefs fixes from Kent Overstreet:
"Lots of assorted small fixes...
- Some repair path fixes, a fix for -ENOMEM when reconstructing lots
of alloc info on large filesystems, upgrade for ancient 0.14
filesystems, etc.
- Various assert tweaks; assert -> ERO, ERO -> log the error in the
superblock and continue
- casefolding now uses d_ops like on other casefolding filesystems
- fix device label create on device add, fix bucket array resize on
filesystem resize
- fix xattrs with FORTIFY_SOURCE builds with gcc-15/clang"
* tag 'bcachefs-2025-05-01' of git://evilpiepirate.org/bcachefs: (22 commits)
bcachefs: Remove incorrect __counted_by annotation
bcachefs: add missing sched_annotate_sleep()
bcachefs: Fix __bch2_dev_group_set()
bcachefs: Kill ERO for i_blocks check in truncate
bcachefs: check for inode.bi_sectors underflow
bcachefs: Kill ERO in __bch2_i_sectors_acct()
bcachefs: readdir fixes
bcachefs: improve missing journal write device error message
bcachefs: Topology error after insert is now an ERO
bcachefs: Use bch2_kvmalloc() for journal keys array
bcachefs: More informative error message when shutting down due to error
bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btrees
bcachefs: btree_node_data_missing is now autofix
bcachefs: Don't generate alloc updates to invalid buckets
bcachefs: Improve bch2_dev_bucket_missing()
bcachefs: fix bch2_dev_buckets_resize()
bcachefs: Add upgrade table entry from 0.14
bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshot
bcachefs: Add missing utf8_unload()
bcachefs: Emit unicode version message on startup
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Fix potential NULL dereference in the i.MX driver
- Fix the pull up/down resistor values in the Meson driver
- Fix the mapping of the PHY LED pins in the Airhoa driver
- Fix EINT interrupts on older controllers and a debounce value issue
in the Mediatek driver
- Fix an erronoeus PINGROUP define in the Qualcomm driver
* tag 'pinctrl-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: Fix PINGROUP definition for sm8750
pinctrl: mediatek: common-v1: Fix error checking in mtk_eint_init()
pinctrl: mediatek: Fix new design debounce issue
pinctrl: mediatek: common-v1: Fix EINT breakage on older controllers
pinctrl: airoha: fix wrong PHY LED mapping and PHY2 LED defines
pinctrl: meson: define the pull up/down resistor value as 60 kOhm
pinctrl: imx: Return NULL if no group is matched and found
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"ARM-SMMU fixes:
- Fix broken detection of the S2FWB feature
- Ensure page-size bitmap is initialised for SVA domains
- Fix handling of SMMU client devices with duplicate Stream IDs
- Don't fail SMMU probe if Stream IDs are aliased across clients
Intel VT-d fixes:
- Add quirk for IGFX device
- Revert an ATS change to fix a boot failure
AMD IOMMU:
- Fix potential buffer overflow
Core:
- Fix for iommu_copy_struct_from_user()"
* tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57)
iommu/vt-d: Revert ATS timing change to fix boot failure
iommu: Fix two issues in iommu_copy_struct_from_user()
iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid
iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully
iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids
iommu/arm-smmu-v3: Fix pgsize_bit for sva domains
iommu/arm-smmu-v3: Add missing S2FWB feature detection
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Stable fix to avoid bugs due to leftover obj_ext after allocation
profiling is disabled at runtime (Zhenhua Huang)
* tag 'slab-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm, slab: clean up slab->obj_exts always
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.15-rc5
- imx-lpi2c: fix error handling sequence in probe
|
|
When calling scnprintf() to append recovery method to event_string,
the second argument should be `sizeof(event_string) - len`, otherwise
there is a potential overflow problem.
Fixes: b7cf9f4ac1b8 ("drm: Introduce device wedged event")
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250409014633.31303-1-jiangfeng@kylinos.cn
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
After commit 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in
the future"), the following program would immediately enter a busy
loop in the kernel:
```
int main() {
int e = epoll_create1(0);
struct epoll_event event = {.events = EPOLLIN};
epoll_ctl(e, EPOLL_CTL_ADD, 0, &event);
const struct timespec timeout = {.tv_nsec = 1};
epoll_pwait2(e, &event, 1, &timeout, 0);
}
```
This happens because the given (non-zero) timeout of 1 nanosecond
usually expires before ep_poll() is entered and then
ep_schedule_timeout() returns false, but `timed_out` is never set
because the code line that sets it is skipped. This quickly turns
into a soft lockup, RCU stalls and deadlocks, inflicting severe
headaches to the whole system.
When the timeout has expired, we don't need to schedule a hrtimer, but
we should set the `timed_out` variable. Therefore, I suggest moving
the ep_schedule_timeout() check into the `timed_out` expression
instead of skipping it.
brauner: Note that there was an earlier fix by Joe Damato in response to
my bug report in [1].
Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")
Cc: Joe Damato <jdamato@fastly.com>
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/20250429153419.94723-1-jdamato@fastly.com [1]
Link: https://lore.kernel.org/20250429185827.3564438-1-max.kellermann@ionos.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The ring buffer size varies across VMBus channels. The size of sysfs
node for the ring buffer is currently hardcoded to 4 MB. Userspace
clients either use fstat() or hardcode this size for doing mmap().
To address this, make the sysfs node size dynamic to reflect the
actual ring buffer size for each channel. This will ensure that
fstat() on ring sysfs node always returns the correct size of
ring buffer.
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250502074811.2022-3-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On regular bootup, devices get registered to VMBus first, so when
uio_hv_generic driver for a particular device type is probed,
the device is already initialized and added, so sysfs creation in
hv_uio_probe() works fine. However, when the device is removed
and brought back, the channel gets rescinded and the device again gets
registered to VMBus. However this time, the uio_hv_generic driver is
already registered to probe for that device and in this case sysfs
creation is tried before the device's kobject gets initialized
completely.
Fix this by moving the core logic of sysfs creation of ring buffer,
from uio_hv_generic to HyperV's VMBus driver, where the rest of the
sysfs attributes for the channels are defined. While doing that, make
use of attribute groups and macros, instead of creating sysfs
directly, to ensure better error handling and code flow.
Problematic path:
vmbus_process_offer (A new offer comes for the VMBus device)
vmbus_add_channel_work
vmbus_device_register
|-> device_register
| |...
| |-> hv_uio_probe
| |...
| |-> sysfs_create_bin_file (leads to a warning as
| the primary channel's kobject, which is used to
| create the sysfs file, is not yet initialized)
|-> kset_create_and_add
|-> vmbus_add_channel_kobj (initialization of the primary
channel's kobject happens later)
Above code flow is sequential and the warning is always reproducible in
this path.
Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
Cc: stable@kernel.org
Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Suggested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The folio_index() helper is only needed for mixed usage of page cache
and swap cache, for pure page cache usage, the caller can just use
folio->index instead.
It can't be a swap cache folio here. Swap mapping may only call into fs
through 'swap_rw' but btrfs does not use that method for swap.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This reverts commit 7e06de7c83a746e58d4701e013182af133395188.
Commit 7e06de7c83a7 ("btrfs: canonicalize the device path before adding
it") tries to make btrfs to use "/dev/mapper/*" name first, then any
filename inside "/dev/" as the device path.
This is mostly fine when there is only the root namespace involved, but
when multiple namespace are involved, things can easily go wrong for the
d_path() usage.
As d_path() returns a file path that is namespace dependent, the
resulted string may not make any sense in another namespace.
Furthermore, the "/dev/" prefix checks itself is not reliable, one can
still make a valid initramfs without devtmpfs, and fill all needed
device nodes manually.
Overall the userspace has all its might to pass whatever device path for
mount, and we are not going to win the war trying to cover every corner
case.
So just revert that commit, and do no extra d_path() based file path
sanity check.
CC: stable@vger.kernel.org # 6.12+
Link: https://lore.kernel.org/linux-fsdevel/20250115185608.GA2223535@zen.localdomain/
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
When trying read-only scrub on a btrfs with rescue=idatacsums mount
option, it will crash with the following call trace:
BUG: kernel NULL pointer dereference, address: 0000000000000208
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G O 6.15.0-rc3-custom+ #236 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs]
Call Trace:
<TASK>
scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs]
scrub_simple_mirror+0x175/0x290 [btrfs]
scrub_stripe+0x5f7/0x6f0 [btrfs]
scrub_chunk+0x9a/0x150 [btrfs]
scrub_enumerate_chunks+0x333/0x660 [btrfs]
btrfs_scrub_dev+0x23e/0x600 [btrfs]
btrfs_ioctl+0x1dcf/0x2f80 [btrfs]
__x64_sys_ioctl+0x97/0xc0
do_syscall_64+0x4f/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[CAUSE]
Mount option "rescue=idatacsums" will completely skip loading the csum
tree, so that any data read will not find any data csum thus we will
ignore data checksum verification.
Normally call sites utilizing csum tree will check the fs state flag
NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all.
This results in scrub to call btrfs_search_slot() on a NULL pointer
and triggered above crash.
[FIX]
Check both extent and csum tree root before doing any tree search.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
num_extent_folios() unconditionally calls folio_order() on
eb->folios[0]. If that is NULL this will be a segfault. It is reasonable
for it to return 0 as the number of folios in the eb when the first
entry is NULL, so do that instead.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_prelim_ref() calls the old and new reference variables in the
incorrect order. This causes a NULL pointer dereference because oldref
is passed as NULL to trace_btrfs_prelim_ref_insert().
Note, trace_btrfs_prelim_ref_insert() is being called with newref as
oldref (and oldref as NULL) on purpose in order to print out
the values of newref.
To reproduce:
echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable
Perform some writeback operations.
Backtrace:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ #47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130
Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88
RSP: 0018:ffffce44820077a0 EFLAGS: 00010286
RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b
RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010
R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000
R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540
FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
prelim_ref_insert+0x1c1/0x270
find_parent_nodes+0x12a6/0x1ee0
? __entry_text_end+0x101f06/0x101f09
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
btrfs_is_data_extent_shared+0x167/0x640
? fiemap_process_hole+0xd0/0x2c0
extent_fiemap+0xa5c/0xbc0
? __entry_text_end+0x101f05/0x101f09
btrfs_fiemap+0x7e/0xd0
do_vfs_ioctl+0x425/0x9d0
__x64_sys_ioctl+0x75/0xc0
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In preparation for making the kmalloc() family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct folio **" but the returned type will be
"struct page **". These are the same allocation size (pointer size), but
the types don't match. Adjust the allocation type to match the assignment.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Jakub Kicinski says:
====================
tools: ynl-gen: additional C types and classic netlink handling
This series is a bit of a random grab bag adding things we need
to generate code for rt-link.
First two patches are pretty random code cleanups.
Patch 3 adds default values if the spec is missing them.
Patch 4 adds support for setting Netlink request flags
(NLM_F_CREATE, NLM_F_REPLACE etc.). Classic netlink uses those
quite a bit.
Patches 5 and 6 extend the notification handling for variations
used in classic netlink. Patch 6 adds support for when notification
ID is the same as the ID of the response message to GET.
Next 4 patches add support for handling a couple of complex types.
These are supported by the schema and Python but C code gen wasn't
there.
Patch 11 is a bit of a hack, it skips code related to kernel
policy generation, since we don't need it for classic netlink.
Patch 12 adds support for having different fixed headers per op.
Something we could avoid in previous rtnetlink specs but some
specs do mix.
v2: https://lore.kernel.org/20250425024311.1589323-1-kuba@kernel.org
v1: https://lore.kernel.org/20250424021207.1167791-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250429154704.2613851-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
rtnetlink has variety of ops with different fixed headers.
Detect that op fixed header is not the same as family one,
and use sizeof() directly. For reverse parsing we need to
pass the fixed header len along the policy (in the socket
state).
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-13-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
rt-link has a vlan-protocols enum with:
name: 8021q value: 33024
name: 8021ad value: 34984
It's nice to have, since it converts the values to strings in Python.
For C, however, the codegen is trying to use enums to generate strict
policy checks. Parsing such sparse enums is not possible via policies.
Since for classic netlink we don't support kernel codegen and policy
generation - skip the auto-generation of checks from enums.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-12-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
IPv6 addresses are expressed as binary arrays since we don't have u128.
Since they are not variable length, however, they are relatively
easy to represent as an array of known size.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-11-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
C codegen supports ArrayNest AKA indexed-array carrying scalars,
but only for the netlink -> struct parsing. Support rendering
from struct to netlink.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-10-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Binary types with struct are fixed size, relatively easy to
handle for multi attr. Declare the member as a pointer.
Count the members, allocate an array, copy in the data.
Allow the netlink attr to be smaller or larger than our view
of the struct in case the build headers are newer or older
than the running kernel.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-9-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for multi attr strings (needed for link alt_names).
We record the length individual strings in a len member, to do
the same for multi-attr create a struct ynl_string in ynl.h
and use it as a layer holding both the string and its length.
Since strings may be arbitrary length dynamically allocate each
individual one.
Adjust arg_member and struct member to avoid spacing the double
pointers to get "type **name;" rather than "type * *name;"
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-8-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Allow CRUD-style notification where the notification is more
like the response to the request, which can optionally be
looped back onto the requesting socket. Since the notification
and request are different ops in the spec, for example:
-
name: delrule
doc: Remove an existing FIB rule
attribute-set: fib-rule-attrs
do:
request:
value: 33
attributes: *fib-rule-all
-
name: delrule-ntf
doc: Notify a rule deletion
value: 33
notify: getrule
We need to find the request by ID. Ideally we'd detect this model
from the spec properties, rather than assume that its what all
classic netlink families do. But maybe that'd cause this model
to spread and its easy to get wrong. For now assume CRUD == classic.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-7-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Classic Netlink has GET callbacks with no doit support, just dumps.
Support using their responses in notifications. If notification points
at a type which only has a dump - use the dump's type.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-6-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|