Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix objpool overrun case on memory/cache access delay
especially on the big.LITTLE SoC. The objpool uses a copy of object
slot index internal loop, but the slot index can be changed on
another processor in parallel. In that case, the difference of 'head'
local copy and the 'slot->last' index will be bigger than local slot
size. In that case, we need to re-read the slot::head to update it.
- kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since
kretprobe_holder::rp is RCU managed, it should use
rcu_assign_pointer() and rcu_dereference_check() correctly. Also
adding __rcu tag for finding wrong usage by sparse.
- rethook: Fix to use appropriate rcu API for rethook::handler. The
same as kretprobe, rethook::handler is RCU managed and it should use
rcu_assign_pointer() and rcu_dereference_check(). This also adds
__rcu tag for finding wrong usage by sparse.
* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rethook: Use __rcu pointer for rethook::handler
kprobes: consistent rcu api usage for kretprobe holder
lib: objpool: fix head overrun on RK3588 SBC
|
|
Set up build time warnings to safeguard against future header changes of
organized structs.
Warning includes:
1) whether all variables are still in the same cache group
2) whether all the cache groups have the sum of the members size (in the
maximum condition, including all members defined in configs)
The __cache_group* variables are ignored in kernel-doc check in the
various header files they appear in to enforce the cache groups.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of relying on potentially imprecise tnum representation of
expected return value range for callbacks and subprogs, validate that
smin/smax range satisfy exact expected range of return values.
E.g., if callback would need to return [0, 2] range, tnum can't
represent this precisely and instead will allow [0, 3] range. By
checking smin/smax range, we can make sure that subprog/callback indeed
returns only valid [0, 2] range.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
It's a trivial rearrangement saving 8 bytes. We have 4 bytes of padding
at the end which can be filled with another field without increasing
struct bpf_func_state.
copy_func_state() logic remains correct without any further changes.
BEFORE
======
struct bpf_func_state {
struct bpf_reg_state regs[11]; /* 0 1320 */
/* --- cacheline 20 boundary (1280 bytes) was 40 bytes ago --- */
int callsite; /* 1320 4 */
u32 frameno; /* 1324 4 */
u32 subprogno; /* 1328 4 */
u32 async_entry_cnt; /* 1332 4 */
bool in_callback_fn; /* 1336 1 */
/* XXX 7 bytes hole, try to pack */
/* --- cacheline 21 boundary (1344 bytes) --- */
struct tnum callback_ret_range; /* 1344 16 */
bool in_async_callback_fn; /* 1360 1 */
bool in_exception_callback_fn; /* 1361 1 */
/* XXX 2 bytes hole, try to pack */
int acquired_refs; /* 1364 4 */
struct bpf_reference_state * refs; /* 1368 8 */
int allocated_stack; /* 1376 4 */
/* XXX 4 bytes hole, try to pack */
struct bpf_stack_state * stack; /* 1384 8 */
/* size: 1392, cachelines: 22, members: 13 */
/* sum members: 1379, holes: 3, sum holes: 13 */
/* last cacheline: 48 bytes */
};
AFTER
=====
struct bpf_func_state {
struct bpf_reg_state regs[11]; /* 0 1320 */
/* --- cacheline 20 boundary (1280 bytes) was 40 bytes ago --- */
int callsite; /* 1320 4 */
u32 frameno; /* 1324 4 */
u32 subprogno; /* 1328 4 */
u32 async_entry_cnt; /* 1332 4 */
struct tnum callback_ret_range; /* 1336 16 */
/* --- cacheline 21 boundary (1344 bytes) was 8 bytes ago --- */
bool in_callback_fn; /* 1352 1 */
bool in_async_callback_fn; /* 1353 1 */
bool in_exception_callback_fn; /* 1354 1 */
/* XXX 1 byte hole, try to pack */
int acquired_refs; /* 1356 4 */
struct bpf_reference_state * refs; /* 1360 8 */
struct bpf_stack_state * stack; /* 1368 8 */
int allocated_stack; /* 1376 4 */
/* size: 1384, cachelines: 22, members: 13 */
/* sum members: 1379, holes: 1, sum holes: 1 */
/* padding: 4 */
/* last cacheline: 40 bytes */
};
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
No more users of this field.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20231130215309.2923568-5-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No more users of this flag.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20231130215309.2923568-4-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Passthrough commands that utilize metadata currently need to bounce the
user space buffer through the kernel. Add support for mapping user space
directly so that we can avoid this costly overhead. This is similar to
how the normal bio data payload utilizes user addresses with
bio_map_user_iov().
If the user address can't directly be used for reason, like too many
segments or address unalignement, fallback to a copy of the user vec
while keeping the user address pinned for the IO duration so that it
can safely be copied on completion in any process context.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20231130215309.2923568-2-kbusch@meta.com
[axboe: fold in fix from Kanchan Joshi]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix issues in two cpufreq drivers, in the AMD P-state driver and
in the power-capping DTPM framework.
Specifics:
- Fix the AMD P-state driver's EPP sysfs interface in the cases when
the performance governor is in use (Ayush Jain)
- Make the ->fast_switch() callback in the AMD P-state driver return
the target frequency as expected (Gautham R. Shenoy)
- Allow user space to control the range of frequencies to use via
scaling_min_freq and scaling_max_freq when AMD P-state driver is in
use (Wyes Karny)
- Prevent power domains needed for wakeup signaling from being turned
off during system suspend on Qualcomm systems and prevent
performance states votes from runtime-suspended devices from being
lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
(Stephan Gerhold)
- Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the
i.MX6ULL types that can run at that frequency (Christoph
Niedermaier)
- Eliminate unnecessary and harmful conversions to uW from the DTPM
(dynamic thermal and power management) framework (Lukasz Luba)"
* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Only print supported EPP values for performance governor
cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
powercap: DTPM: Fix unneeded conversions to micro-Watts
cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
cpufreq: qcom-nvmem: Enable virtual power domain devices
cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"This fixes a recently introduced build issue on ARM32 and a NULL
pointer dereference in the ACPI backlight driver due to a design issue
exposed by a recent change in the ACPI bus type code.
Specifics:
- Fix a recently introduced build issue on ARM32 platforms caused by
an inadvertent header file breakage (Dave Jiang)
- Eliminate questionable usage of acpi_driver_data() in the ACPI
backlight cooling device code that leads to NULL pointer
dereferences after recent ACPI core changes (Hans de Goede)"
* tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Use acpi_video_device for cooling-dev driver data
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix race conditions in device probe path
- Handle ERR_PTR() returns in __iommu_domain_alloc() path
- Update MAINTAINERS entry for Qualcom IOMMUs
- Printk argument fix in device tree specific code
- Several Intel VT-d fixes from Lu Baolu:
- Do not support enforcing cache coherency for non-empty domains
- Avoid devTLB invalidation if iommu is off
- Disable PCI ATS in legacy passthrough mode
- Support non-PCI devices when clearing context
- Fix incorrect cache invalidation for mm notification
- Add MTL to quirk list to skip TE disabling
- Set variable intel_dirty_ops to static
* tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix printk arg in of_iommu_get_resv_regions()
iommu/vt-d: Set variable intel_dirty_ops to static
iommu/vt-d: Fix incorrect cache invalidation for mm notification
iommu/vt-d: Add MTL to quirk list to skip TE disabling
iommu/vt-d: Make context clearing consistent with context mapping
iommu/vt-d: Disable PCI ATS in legacy passthrough mode
iommu/vt-d: Omit devTLB invalidation requests when TES=0
iommu/vt-d: Support enforce_cache_coherency only for empty domains
iommu: Avoid more races around device probe
MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry
iommu: Flow ERR_PTR out from __iommu_domain_alloc()
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, mostly amdgpu fixes with a scattering of nouveau, i915,
and a couple of reverts. Hopefully it will quieten down in coming
weeks.
drm:
- Revert unexport of prime helpers for fd/handle conversion
dma_resv:
- Do not double add fences in dma_resv_add_fence.
gpuvm:
- Fix GPUVM license identifier.
i915:
- Mark internal GSC engine with reserved uabi class
- Take VGA converters into account in eDP probe
- Fix intel_pre_plane_updates() call to ensure workarounds get applied
panel:
- Revert panel fixes as they require exporting device_is_dependent.
nouveau:
- fix oversized allocations in new vm path
- fix zero-length array
- remove a stray lock
nt36523:
- Fix error check for nt36523.
amdgpu:
- DMUB fix
- DCN 3.5 fixes
- XGMI fix
- DCN 3.2 fixes
- Vangogh suspend fix
- NBIO 7.9 fix
- GFX11 golden register fix
- Backlight fix
- NBIO 7.11 fix
- IB test overflow fix
- DCN 3.1.4 fixes
- fix a runtime pm ref count
- Retimer fix
- ABM fix
- DCN 3.1.5 fix
- Fix AGP addressing
- Fix possible memory leak in SMU error path
- Make sure PME is enabled in D3
- Fix possible NULL pointer dereference in debugfs
- EEPROM fix
- GC 9.4.3 fix
amdkfd:
- IP version check fix
- Fix memory leak in pqm_uninit()"
* tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm: (53 commits)
Revert "drm/prime: Unexport helpers for fd/handle conversion"
drm/amdgpu: Use another offset for GC 9.4.3 remap
drm/amd/display: Fix some HostVM parameters in DML
drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit
drm/amdgpu: Update EEPROM I2C address for smu v13_0_0
drm/amd/display: Allow DTBCLK disable for DCN35
drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
drm/amd: Enable PCIe PME from D3
drm/amd/pm: fix a memleak in aldebaran_tables_init
drm/amdgpu: fix AGP addressing when GART is not at 0
drm/amd/display: update dcn315 lpddr pstate latency
drm/amd/display: fix ABM disablement
drm/amd/display: Fix black screen on video playback with embedded panel
drm/amd/display: Fix conversions between bytes and KB
drm/amdkfd: Use common function for IP version check
drm/amd/display: Remove config update
drm/amd/display: Update DCN35 clock table policy
drm/amd/display: force toggle rate wa for first link training for a retimer
drm/amdgpu: correct the amdgpu runtime dereference usage count
drm/amd/display: Update min Z8 residency time to 2100 for DCN314
...
|
|
Pull io_uring fixes from Jens Axboe:
- Fix an issue with discontig page checking for IORING_SETUP_NO_MMAP
- Fix an issue with not allowing IORING_SETUP_NO_MMAP also disallowing
mmap'ed buffer rings
- Fix an issue with deferred release of memory mapped pages
- Fix a lockdep issue with IORING_SETUP_NO_MMAP
- Use fget/fput consistently, even from our sync system calls. No real
issue here, but if we were ever to allow closing io_uring descriptors
it would be required. Let's play it safe and just use the full ref
counted versions upfront. Most uses of io_uring are threaded anyway,
and hence already doing the full version underneath.
* tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux:
io_uring: use fget/fput consistently
io_uring: free io_buffer_list entries via RCU
io_uring/kbuf: prune deferred locked cache when tearing down
io_uring/kbuf: recycle freed mapped buffer ring entries
io_uring/kbuf: defer release of mapped buffer rings
io_uring: enable io_mem_alloc/free to be used in other parts
io_uring: don't guard IORING_OFF_PBUF_RING with SETUP_NO_MMAP
io_uring: don't allow discontig pages for IORING_SETUP_NO_MMAP
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Invalid namespace identification error handling (Marizio Ewan,
Keith)
- Fabrics keep-alive tuning (Mark)
- Fix for a bad error check regression in bcache (Markus)
- Fix for a performance regression with O_DIRECT (Ming)
- Fix for a flush related deadlock (Ming)
- Make the read-only warn on per-partition (Yu)
* tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux:
nvme-core: check for too small lba shift
blk-mq: don't count completed flush data request as inflight in case of quiesce
block: Document the role of the two attribute groups
block: warn once for each partition in bio_check_ro()
block: move .bd_inode into 1st cacheline of block_device
nvme: check for valid nvme_identify_ns() before using it
nvme-core: fix a memory leak in nvme_ns_info_from_identify()
nvme: fine-tune sending of first keep-alive
bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
|
|
Pull more bcachefs bugfixes from Kent Overstreet:
- bcache & bcachefs were broken with CFI enabled; patch for closures to
fix type punning
- mark erasure coding as extra-experimental; there are incompatible
disk space accounting changes coming for erasure coding, and I'm
still seeing checksum errors in some tests
- several fixes for durability-related issues (durability is a device
specific setting where we can tell bcachefs that data on a given
device should be counted as replicated x times)
- a fix for a rare livelock when a btree node merge then updates a
parent node that is almost full
- fix a race in the device removal path, where dropping a pointer in a
btree node to a device would be clobbered by an in flight btree write
updating the btree node key on completion
- fix one SRCU lock hold time warning in the btree gc code - ther's
still a bunch more of these to fix
- fix a rare race where we'd start copygc before initializing the "are
we rw" percpu refcount; copygc would think we were already ro and die
immediately
* tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits)
bcachefs: Extra kthread_should_stop() calls for copygc
bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
bcachefs: Fix race between btree writes and metadata drop
bcachefs: move journal seq assertion
bcachefs: -EROFS doesn't count as move_extent_start_fail
bcachefs: trace_move_extent_start_fail() now includes errcode
bcachefs: Fix split_race livelock
bcachefs: Fix bucket data type for stripe buckets
bcachefs: Add missing validation for jset_entry_data_usage
bcachefs: Fix zstd compress workspace size
bcachefs: bpos is misaligned on big endian
bcachefs: Fix ec + durability calculation
bcachefs: Data update path won't accidentaly grow replicas
bcachefs: deallocate_extra_replicas()
bcachefs: Proper refcounting for journal_keys
bcachefs: preserve device path as device name
bcachefs: Fix an endianness conversion
bcachefs: Start gc, copygc, rebalance threads after initing writes ref
bcachefs: Don't stop copygc thread on device resize
bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops
...
|
|
Merge a fix for a recently introduced build issue on ARM32 platforms
caused by an inadvertent header file breakage (Dave Jiang).
* acpi-tables:
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
|
|
Introduce a new type for the callback to parse an unknown argument.
This unifies function prototypes which takes that as a parameter.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20231120151419.1661807-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Add the ability to allocate memory from kfence and trigger a read after
free on that memory to validate that kfence is working properly. This is
used by ChromeOS integration tests to validate that kfence errors can be
collected on user devices and parsed properly.
Cc: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kasan-dev@googlegroups.com
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20231129214413.3156334-1-swboyd@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The rstat_cpu and also rstat_css_list of the cgroup structure are read
mostly variables. However, they may share the same cacheline as the
subsequent rstat_flush_next and *bstat variables which can be updated
frequently. That will slow down the cgroup_rstat_cpu() call which is
called pretty frequently in the rstat code. Add a CACHELINE_PADDING()
line in between them to avoid false cacheline sharing.
A parallel kernel build on a 2-socket x86-64 server is used as the
benchmarking tool for measuring the lock hold time. Below were the lock
hold time frequency distribution before and after the patch:
Run time Before patch After patch
-------- ------------ -----------
0-01 us 9,928,562 9,820,428
01-05 us 110,151 50,935
05-10 us 270 93
10-15 us 273 146
15-20 us 135 76
20-25 us 0 2
25-30 us 1 0
It can be seen that the patch further pushes the lock hold time towards
the lower end.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
In preparation of calling do_splice_direct() without file_start_write()
held, create a new helper splice_file_range(), to be called from context
of ->copy_file_range() methods instead of do_splice_direct().
Currently, the only difference is that splice_file_range() does not take
flags argument and that it asserts that file_start_write() is held, but
we factor out a common helper do_splice_direct_actor() that will be used
later.
Use the new helper from __ceph_copy_file_range(), that was incorrectly
passing to do_splice_direct() the copy flags argument as splice flags.
The value of copy flags in ceph is always 0, so it is a smenatic bug fix.
Move the declaration of both helpers to linux/splice.h.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231130141624.3338942-2-amir73il@gmail.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The 'QM_INIT' and 'QM_CLOSE' status of qm and 'QP_INIT'
and 'QP_CLOSE' status of queue are not actually used. Currently,
driver only needs to switch status when the device or queue
is enabled or stopped, Therefore, remove unneeded status to
simplify driver. In addition, rename'QM_START to'QM_WORK' for
ease to understand.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
So far, fanotify returns -ENODEV or -EXDEV when trying to set a mark
on a filesystem with a "weak" fsid, namely, zero fsid (e.g. fuse), or
non-uniform fsid (e.g. btrfs non-root subvol).
When group is watching inodes all from the same filesystem (or subvol),
allow adding inode marks with "weak" fsid, because there is no ambiguity
regarding which filesystem reports the event.
The first mark added to a group determines if this group is single or
multi filesystem, depending on the fsid at the path of the added mark.
If the first mark added has a "strong" fsid, marks with "weak" fsid
cannot be added and vice versa.
If the first mark added has a "weak" fsid, following marks must have
the same "weak" fsid and the same sb as the first mark.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20231130165619.3386452-3-amir73il@gmail.com>
|
|
Some filesystems like fuse and nfs have zero or non-unique fsid.
We would like to avoid reporting ambiguous fsid in events, so we need
to avoid marking objects with same fsid and different sb.
To make this easier to enforce, store the fsid in the marks of the group
instead of in the shared conenctor.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20231130165619.3386452-2-amir73il@gmail.com>
|
|
Currently the phy reset sequence is as shown below for a
devicetree described mdio phy on boot:
1. Assert the phy_device's reset as part of registering
2. Deassert the phy_device's reset as part of registering
3. Deassert the phy_device's reset as part of phy_probe
4. Deassert the phy_device's reset as part of phy_hw_init
The extra two deasserts include waiting the deassert delay afterwards,
which is adding unnecessary delay.
This applies to both possible types of resets (reset controller
reference and a reset gpio) that can be used.
Here's some snipped tracing output using the following command line
params "trace_event=gpio:* trace_options=stacktrace" illustrating
the reset handling and where its coming from:
/* Assert */
systemd-udevd-283 [002] ..... 6.780434: gpio_value: 544 set 0
systemd-udevd-283 [002] ..... 6.783849: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> mdiobus_register_device
=> phy_device_register
=> fwnode_mdiobus_phy_device_register
=> fwnode_mdiobus_register_phy
=> __of_mdiobus_register
=> stmmac_mdio_register
=> stmmac_dvr_probe
=> stmmac_pltfr_probe
=> devm_stmmac_pltfr_probe
=> qcom_ethqos_probe
=> platform_probe
/* Deassert */
systemd-udevd-283 [002] ..... 6.802480: gpio_value: 544 set 1
systemd-udevd-283 [002] ..... 6.805886: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> phy_device_register
=> fwnode_mdiobus_phy_device_register
=> fwnode_mdiobus_register_phy
=> __of_mdiobus_register
=> stmmac_mdio_register
=> stmmac_dvr_probe
=> stmmac_pltfr_probe
=> devm_stmmac_pltfr_probe
=> qcom_ethqos_probe
=> platform_probe
/* Deassert */
systemd-udevd-283 [002] ..... 6.882601: gpio_value: 544 set 1
systemd-udevd-283 [002] ..... 6.886014: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> phy_probe
=> really_probe
=> __driver_probe_device
=> driver_probe_device
=> __device_attach_driver
=> bus_for_each_drv
=> __device_attach
=> device_initial_probe
=> bus_probe_device
=> device_add
=> phy_device_register
=> fwnode_mdiobus_phy_device_register
=> fwnode_mdiobus_register_phy
=> __of_mdiobus_register
=> stmmac_mdio_register
=> stmmac_dvr_probe
=> stmmac_pltfr_probe
=> devm_stmmac_pltfr_probe
=> qcom_ethqos_probe
=> platform_probe
/* Deassert */
NetworkManager-477 [000] ..... 7.023144: gpio_value: 544 set 1
NetworkManager-477 [000] ..... 7.026596: <stack trace>
=> gpiod_set_raw_value_commit
=> gpiod_set_value_nocheck
=> gpiod_set_value_cansleep
=> mdio_device_reset
=> phy_init_hw
=> phy_attach_direct
=> phylink_fwnode_phy_connect
=> __stmmac_open
=> stmmac_open
There's a lot of paths where the device is getting its reset
asserted and deasserted. Let's track the state and only actually
do the assert/deassert when it changes.
Reported-by: Sagar Cheluvegowda <quic_scheluve@quicinc.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231127-net-phy-reset-once-v2-1-448e8658779e@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the rethook::handler is an RCU-maganged pointer so that it will
notice readers the rethook is stopped (unregistered) or not, it should
be an __rcu pointer and use appropriate functions to be accessed. This
will use appropriate memory barrier when accessing it. OTOH,
rethook::data is never changed, so we don't need to check it in
get_kretprobe().
NOTE: To avoid sparse warning, rethook::handler is defined by a raw
function pointer type with __rcu instead of rethook_handler_t.
Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311241808.rv9ceuAh-lkp@intel.com/
Tested-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is
RCU-managed, based on the (non-rethook) implementation of get_kretprobe().
The thought behind this patch is to make use of the RCU API where possible
when accessing this pointer so that the needed barriers are always in place
and to self-document the code.
The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes
done to the "rp" pointer are changed to make use of the RCU macro for
assignment. For the single read, the implementation of get_kretprobe()
is simplified by making use of an RCU macro which accomplishes the same,
but note that the log warning text will be more generic.
I did find that there is a difference in assembly generated between the
usage of the RCU macros vs without. For example, on arm64, when using
rcu_assign_pointer(), the corresponding store instruction is a
store-release (STLR) which has an implicit barrier. When normal assignment
is done, a regular store (STR) is found. In the macro case, this seems to
be a result of rcu_assign_pointer() using smp_store_release() when the
value to write is not NULL.
Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: stable@vger.kernel.org
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Current jbd2 only add REQ_SYNC for descriptor block, metadata log
buffer, commit buffer and superblock buffer, the submitted IO could be
throttled by writeback throttle in block layer, that could lead to
priority inversion in some cases. The log IO looks like a kind of high
priority metadata IO, so it should not be throttled by WBT like QOS
policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from
writeback throttle, and also add REQ_META together indicates it's a
metadata IO.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231129114740.2686201-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-11-30
We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).
The main changes are:
1) Add initial TX metadata implementation for AF_XDP with support in mlx5
and stmmac drivers. Two types of offloads are supported right now, that
is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
stmmac implementation from Song Yoong Siang.
2) Change BPF verifier logic to validate global subprograms lazily instead
of unconditionally before the main program, so they can be guarded using
BPF CO-RE techniques, from Andrii Nakryiko.
3) Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter, from Jiri Olsa.
4) Use pkg-config in BPF selftests to determine ld flags which is
in particular needed for linking statically, from Akihiko Odaki.
5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
bpf/tests: Remove duplicate JSGT tests
selftests/bpf: Add TX side to xdp_hw_metadata
selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
selftests/bpf: Add TX side to xdp_metadata
selftests/bpf: Add csum helpers
selftests/xsk: Support tx_metadata_len
xsk: Add option to calculate TX checksum in SW
xsk: Validate xsk_tx_metadata flags
xsk: Document tx_metadata_len layout
net: stmmac: Add Tx HWTS support to XDP ZC
net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
tools: ynl: Print xsk-features from the sample
xsk: Add TX timestamp and TX checksum offload support
xsk: Support tx_metadata_len
selftests/bpf: Use pkg-config for libelf
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Choose pkg-config for the target
bpftool: Add support to display uprobe_multi links
selftests/bpf: Add link_info test for uprobe_multi link
selftests/bpf: Use bpf_link__destroy in fill_link_info tests
...
====================
Conflicts:
Documentation/netlink/specs/netdev.yaml:
839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools")
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/
While at it also regen, tree is dirty after:
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.
Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and wifi.
Current release - regressions:
- neighbour: fix __randomize_layout crash in struct neighbour
- r8169: fix deadlock on RTL8125 in jumbo mtu mode
Previous releases - regressions:
- wifi:
- mac80211: fix warning at station removal time
- cfg80211: fix CQM for non-range use
- tools: ynl-gen: fix unexpected response handling
- octeontx2-af: fix possible buffer overflow
- dpaa2: recycle the RX buffer only after all processing done
- rswitch: fix missing dev_kfree_skb_any() in error path
Previous releases - always broken:
- ipv4: fix uaf issue when receiving igmp query packet
- wifi: mac80211: fix debugfs deadlock at device removal time
- bpf:
- sockmap: af_unix stream sockets need to hold ref for pair sock
- netdevsim: don't accept device bound programs
- selftests: fix a char signedness issue
- dsa: mv88e6xxx: fix marvell 6350 probe crash
- octeontx2-pf: restore TC ingress police rules when interface is up
- wangxun: fix memory leak on msix entry
- ravb: keep reverse order of operations in ravb_remove()"
* tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
net: ravb: Keep reverse order of operations in ravb_remove()
net: ravb: Stop DMA in case of failures on ravb_open()
net: ravb: Start TX queues after HW initialization succeeded
net: ravb: Make write access to CXR35 first before accessing other EMAC registers
net: ravb: Use pm_runtime_resume_and_get()
net: ravb: Check return value of reset_control_deassert()
net: libwx: fix memory leak on msix entry
ice: Fix VF Reset paths when interface in a failed over aggregate
bpf, sockmap: Add af_unix test with both sockets in map
bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
tools: ynl-gen: always construct struct ynl_req_state
ethtool: don't propagate EOPNOTSUPP from dumps
ravb: Fix races between ravb_tx_timeout_work() and net related ops
r8169: prevent potential deadlock in rtl8169_close
r8169: fix deadlock on RTL8125 in jumbo mtu mode
neighbour: Fix __randomize_layout crash in struct neighbour
octeontx2-pf: Restore TC ingress police rules when interface is up
octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64
net: stmmac: xgmac: Disable FPE MMC interrupts
octeontx2-af: Fix possible buffer overflow
...
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Fixes for v6.7-rc4:
- Revert panel fixes as they require exporting device_is_dependent.
- Do not double add fences in dma_resv_add_fence.
- Fix GPUVM license identifier.
- Assorted nouveau fixes.
- Fix error check for nt36523.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/561f807e-f9d3-43c1-80d3-8b41ba83c9ec@linux.intel.com
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct i40e_qvlist_info.
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Gurucharan G <gurucharanx.g@intel.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20231003231838.work.510-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Some SoCs have a separate dedicated wake-up interrupt controller that can
be used to wake up the system from deeper idle states. We already support
configuring a separate interrupt for a gpio-keys button to be used with a
gpio line. However, we are lacking support system suspend for cases where
a separate interrupt needs to be used in deeper sleep modes.
Because of it's nature, gpio-keys does not know about the runtime PM state
of the button gpios, and may have several gpio buttons configured for each
gpio-keys device instance. Implementing runtime PM support for gpio-keys
does not help, and we cannot use drivers/base/power/wakeirq.c support. We
need to implement custom wakeirq support for gpio-keys.
For handling a dedicated wakeirq for system suspend, we enable and disable
it with gpio_keys_enable_wakeup() and gpio_keys_disable_wakeup() that we
already use based on device_may_wakeup().
Some systems may have a dedicated wakeirq that can also be used as the
main interrupt, this is already working for gpio-keys. Let's add some
wakeirq related comments while at it as the usage with a gpio line and
separate interrupt line may not be obvious.
Tested-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20231129110618.27551-2-tony@atomide.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Drop the vfio_file_iommu_group() stub and instead unconditionally declare
the function to fudge around a KVM wart where KVM tries to do symbol_get()
on vfio_file_iommu_group() (and other VFIO symbols) even if CONFIG_VFIO=n.
Ensuring the symbol is always declared fixes a PPC build error when
modules are also disabled, in which case symbol_get() simply points at the
address of the symbol (with some attributes shenanigans). Because KVM
does symbol_get() instead of directly depending on VFIO, the lack of a
fully defined symbol is not problematic (ugly, but "fine").
arch/powerpc/kvm/../../../virt/kvm/vfio.c:89:7:
error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
fn = symbol_get(vfio_file_iommu_group);
^
include/linux/module.h:805:60: note: expanded from macro 'symbol_get'
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
^
include/linux/vfio.h:294:35: note: previous definition is here
static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
^
arch/powerpc/kvm/../../../virt/kvm/vfio.c:89:7:
error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
fn = symbol_get(vfio_file_iommu_group);
^
include/linux/module.h:805:65: note: expanded from macro 'symbol_get'
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
^
include/linux/vfio.h:294:35: note: previous definition is here
static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
^
2 errors generated.
Although KVM is firmly in the wrong (there is zero reason for KVM to build
virt/kvm/vfio.c when VFIO is disabled), fudge around the error in VFIO as
the stub is unnecessary and doesn't serve its intended purpose (KVM is the
only external user of vfio_file_iommu_group()), and there is an in-flight
series to clean up the entire KVM<->VFIO interaction, i.e. fixing this in
KVM would result in more churn in the long run, and the stub needs to go
away regardless.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308251949.5IiaV0sz-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309030741.82aLACDG-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309110914.QLH0LU6L-lkp@intel.com
Link: https://lore.kernel.org/all/0-v1-08396538817d+13c5-vfio_kvm_kconfig_jgg@nvidia.com
Link: https://lore.kernel.org/all/20230916003118.2540661-1-seanjc@google.com
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: c1cce6d079b8 ("vfio: Compile vfio_group infrastructure optionally")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20231130001000.543240-1-seanjc@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Fix the documentation of struct dma_buf members name and exp_name
as to how these members are to be used and accessed.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231122160556.24948-1-Ramesh.Errabolu@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
This commit updates the SPI subsystem, particularly affecting "SPI MEM"
drivers and core parts, by replacing the -ENOTSUPP error code with
-EOPNOTSUPP.
The key motivations for this change are as follows:
1. The spi-nor driver currently uses EOPNOTSUPP, whereas calls to spi-mem
might return ENOTSUPP. This update aims to unify the error reporting
within the SPI subsystem for clarity and consistency.
2. The use of ENOTSUPP has been flagged by checkpatch as inappropriate,
mainly being reserved for NFS-related errors. To align with kernel coding
standards and recommendations, this change is being made.
3. By using EOPNOTSUPP, we provide more specific context to the error,
indicating that a particular operation is not supported. This helps
differentiate from the more generic ENOTSUPP error, allowing drivers to
better handle and respond to different error scenarios.
Risks and Considerations:
While this change is primarily intended as a code cleanup and error code
unification, there is a minor risk of breaking user-space applications
that rely on specific return codes for unsupported operations. However,
this risk is considered low, as such use-cases are unlikely to be common
or critical. Nevertheless, developers and users should be aware of this
change, especially if they have scripts or tools that specifically handle
SPI error codes.
This commit does not introduce any functional changes to the SPI subsystem
or the affected drivers.
Signed-off-by: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
Acked-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20231129064311.272422-1-acelan.kao@canonical.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The default message transfer implementation - spi_transfer_one_message -
invokes the specific device driver's transfer_one(), then waits for
completion. However, there is no mechanism for the device driver to
report failure in the middle of the transfer.
Introduce SPI_TRANS_FAIL_IO for drivers to report transfer failure.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/4b420dac528e60f122adde16851da88e4798c1ea.1701274975.git.namcao@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
tcp_hdr(skb) and SYN Cookie are passed to __cookie_v[46]_check(), but
none of the callers passes cookie other than ntohl(th->ack_seq) - 1.
Let's fetch it in __cookie_v[46]_check() instead of passing the cookie
over and over.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
wireless fixes:
- debugfs had a deadlock (removal vs. use of files),
fixes going through wireless ACKed by Greg
- support for HT STAs on 320 MHz channels, even if it's
not clear that should ever happen (that's 6 GHz), best
not to WARN()
- fix for the previous CQM fix that broke most cases
- various wiphy locking fixes
- various small driver fixes
* tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: use wiphy locked debugfs for sdata/link
wifi: mac80211: use wiphy locked debugfs helpers for agg_status
wifi: cfg80211: add locked debugfs wrappers
debugfs: add API to allow debugfs operations cancellation
debugfs: annotate debugfs handlers vs. removal with lockdep
debugfs: fix automount d_fsdata usage
wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap
wifi: avoid offset calculation on NULL pointer
wifi: cfg80211: hold wiphy mutex for send_interface
wifi: cfg80211: lock wiphy mutex for rfkill poll
wifi: cfg80211: fix CQM for non-range use
wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush
wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta()
wifi: mt76: mt7925: fix typo in mt7925_init_he_caps
wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config
====================
Link: https://lore.kernel.org/r/20231129150809.31083-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
AF_UNIX stream sockets are a paired socket. So sending on one of the pairs
will lookup the paired socket as part of the send operation. It is possible
however to put just one of the pairs in a BPF map. This currently increments
the refcnt on the sock in the sockmap to ensure it is not free'd by the
stack before sockmap cleans up its state and stops any skbs being sent/recv'd
to that socket.
But we missed a case. If the peer socket is closed it will be free'd by the
stack. However, the paired socket can still be referenced from BPF sockmap
side because we hold a reference there. Then if we are sending traffic through
BPF sockmap to that socket it will try to dereference the free'd pair in its
send logic creating a use after free. And following splat:
[59.900375] BUG: KASAN: slab-use-after-free in sk_wake_async+0x31/0x1b0
[59.901211] Read of size 8 at addr ffff88811acbf060 by task kworker/1:2/954
[...]
[59.905468] Call Trace:
[59.905787] <TASK>
[59.906066] dump_stack_lvl+0x130/0x1d0
[59.908877] print_report+0x16f/0x740
[59.910629] kasan_report+0x118/0x160
[59.912576] sk_wake_async+0x31/0x1b0
[59.913554] sock_def_readable+0x156/0x2a0
[59.914060] unix_stream_sendmsg+0x3f9/0x12a0
[59.916398] sock_sendmsg+0x20e/0x250
[59.916854] skb_send_sock+0x236/0xac0
[59.920527] sk_psock_backlog+0x287/0xaa0
To fix let BPF sockmap hold a refcnt on both the socket in the sockmap and its
paired socket. It wasn't obvious how to contain the fix to bpf_unix logic. The
primarily problem with keeping this logic in bpf_unix was: In the sock close()
we could handle the deref by having a close handler. But, when we are destroying
the psock through a map delete operation we wouldn't have gotten any signal
thorugh the proto struct other than it being replaced. If we do the deref from
the proto replace its too early because we need to deref the sk_pair after the
backlog worker has been stopped.
Given all this it seems best to just cache it at the end of the psock and eat 8B
for the af_unix and vsock users. Notice dgram sockets are OK because they handle
locking already.
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20231129012557.95371-2-john.fastabend@gmail.com
|
|
This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).
The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,_complete}
are static inline, we don't incur any extra overhead doing
indirect calls.
The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
offloads are implemented; if the callback is implemented - the offload
is supported (used by netlink reporting code)
Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
area upon completion (tx_timestamp field)
XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).
The struct is forward-compatible and can be extended in the future
by appending more fields.
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When amd_pstate is running, writing to scaling_min_freq and
scaling_max_freq has no effect. These values are only passed to the
policy level, but not to the platform level. This means that the
platform does not know about the frequency limits set by the user.
To fix this, update the min_perf and max_perf values at the platform
level whenever the user changes the scaling_min_freq and scaling_max_freq
values.
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Created as testing for the conditional guard infrastructure.
Specifically this makes use of the following form:
scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
&task->signal->cred_guard_mutex) {
...
}
...
return 0;
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20231102110706.568467727%40infradead.org
|
|
Drop the default implementations for file read, write and mmap
operations. Each fbdev driver must now provide an implementation
and select any necessary helpers. If no implementation has been
set, fbdev returns an errno code to user space. The code is the
same as if the operation had not been set in the file_operations
struct.
This change makes the fbdev helpers for I/O memory optional. Most
systems only use system-memory framebuffers via DRM's fbdev emulation.
v2:
* warn once if I/O callbacks are missing (Javier)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231127131655.4020-33-tzimmermann@suse.de
|
|
Test in framebuffer read, write and drawing helpers if FBINFO_VIRTFB
has been set correctly. Framebuffers in I/O memory should only be
accessed with the architecture's respective helpers. Framebuffers
in system memory should be accessed with the regular load and
store operations. Presumably not all drivers get this right, so we
now warn about it.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231127131655.4020-32-tzimmermann@suse.de
|
|
Move the default fb_mmap code for I/O address spaces into the helper
function fb_io_mmap(). The helper can either be called via struct
fb_ops.fb_mmap or as the default if no fb_mmap has been set. Also
set the new helper in __FB_DEFAULT_IOMEM_OPS_MMAP.
In the mid-term, fb_io_mmap() is supposed to become optional. Fbdev
drivers will initialize their struct fb_ops.fb_mmap to the helper
and select a corresponding Kconfig token. The helper can then be made
optional at compile time.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231127131655.4020-31-tzimmermann@suse.de
|
|
The function device_is_dependent() is only called by the driver core
internally and should not, at this time, be called by anyone else
outside of it, so mark it as static so as not to give driver authors the
wrong idea.
Cc: Saravana Kannan <saravanak@google.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/2023112815-faculty-thud-add8@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
After commit 87888fb9ac0c ("tty: Remove baudrate dead code & make
ktermios params const"), the 'tty' parameter is only read in
tty_get_baud_rate(). Therefore, we can make 'tty' accepted in the
function 'const' for clarity.
The "the terminal bit flags may be updated." part of the
tty_get_baud_rate()'s kernel-doc is dropped as it is no longer true.
Because of the same commit above. And it was misplaced anyway.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20231127123713.14504-1-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 1b0a151c10a6 ("blk-core: use pr_warn_ratelimited() in
bio_check_ro()") fix message storm by limit the rate, however, there
will still be lots of message in the long term. Fix it better by warn
once for each partition.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231128123027.971610-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The .bd_inode field of block_device is used in IO fast path of
blkdev_write_iter() and blkdev_llseek(), so it is more efficient to keep
it into the 1st cacheline.
.bd_openers is only touched in open()/close(), and .bd_size_lock is only
for updating bdev capacity, which is in slow path too.
So swap .bd_inode layout with .bd_openers & .bd_size_lock to move
.bd_inode into the 1st cache line.
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231128123027.971610-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
AMD MI300A models use HBM3 (High Bandwidth Memory Gen 3) memory. HBM is
a high-speed computer memory interface for 3D-stacked synchronous
dynamic random-access memory (SDRAM).
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102114225.2006878-4-muralimk@amd.com
|