Age | Commit message (Collapse) | Author |
|
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-4-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-3-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add first test for premature thread-group leader exit.
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-2-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
polling
This is another attempt trying to make pidfd polling for multi-threaded
exec and premature thread-group leader exit consistent.
A quick recap of these two cases:
(1) During a multi-threaded exec by a subthread, i.e., non-thread-group
leader thread, all other threads in the thread-group including the
thread-group leader are killed and the struct pid of the
thread-group leader will be taken over by the subthread that called
exec. IOW, two tasks change their TIDs.
(2) A premature thread-group leader exit means that the thread-group
leader exited before all of the other subthreads in the thread-group
have exited.
Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD.
Any caller that holds a PIDFD_THREAD pidfd to the current thread-group
leader may or may not see an exit notification on the file descriptor
depending on when poll is performed. If the poll is performed before the
exec of the subthread has concluded an exit notification is generated
for the old thread-group leader. If the poll is performed after the exec
of the subthread has concluded no exit notification is generated for the
old thread-group leader.
The correct behavior would be to simply not generate an exit
notification on the struct pid of a subhthread exec because the struct
pid is taken over by the subthread and thus remains alive.
But this is difficult to handle because a thread-group may exit
prematurely as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus also may exec at some point.
So far there was no way to distinguish between (1) and (2) internally.
This tiny series tries to address this problem by discarding
PIDFD_THREAD notification on premature thread-group leader exit.
If that works correctly then no exit notifications are generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads have
been reaped. If a subthread should exec aftewards no exit notification
will be generated until that task exits or it creates subthreads and
repeates the cycle.
Co-Developed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-1-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is batman-adv bugfix:
- Ignore own maximum aggregation size during RX, Sven Eckelmann
* tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge:
batman-adv: Ignore own maximum aggregation size during RX
====================
Link: https://patch.msgid.link/20250318150035.35356-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.
Fixes: 8b5c171bb3dc ("neigh: new unresolved queue limits")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
fd_install() has a questionable comment above it.
While it correctly points out a possible race against dup2(), it states:
> We need to detect this and fput() the struct file we are about to
> overwrite in this case.
>
> It should never happen - if we allow dup2() do it, _really_ bad things
> will follow.
I have difficulty parsing the above. The first sentence would suggest
fd_install() tries to detect and recover from the race (it does not),
the next one claims the race needs to be dealt with (it is, by dup2()).
Given that fd_install() does not suffer the burden, this patch removes
the above and instead expands on the race in dup2() commentary.
While here tidy up the docs around fd_install().
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250320102637.1924183-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
John Garry <john.g.garry@oracle.com> says:
These iomap changes are spun-off the XFS large atomic writes series at
https://lore.kernel.org/linux-xfs/86a64256-497a-453b-bbba-a5ac6b4cb056@oracle.com/T/#ma99c763221de9d49ea2ccfca9ff9b8d71c8b2677
The XFS parts there are not ready yet, but it is worth having the iomap
changes queued in advance.
Some much earlier changes from that same series were already queued in the
vfs tree, and these patches rework those changes - specifically the
first patch in this series does.
The most other significant change is the patch to rework how the bio flags
are set in the DIO patch.
* patches from https://lore.kernel.org/r/20250320120250.4087011-1-john.g.garry@oracle.com:
iomap: rework IOMAP atomic flags
iomap: comment on atomic write checks in iomap_dio_bio_iter()
iomap: inline iomap_dio_bio_opflags()
Link: https://lore.kernel.org/r/20250320120250.4087011-1-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
is that the FS ->iomap_begin callback could check if this flag is set to
decide whether to do a SW (FS-based) atomic write. But the FS can set
which ->iomap_begin callback it wants when deciding to do a FS-based
atomic write.
Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
the block driver can use SW-methods to emulate an atomic write. So change
back to IOMAP_ATOMIC.
The ->iomap_begin callback needs though to indicate to iomap core that
REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.
These changes were suggested by Christoph Hellwig and Dave Chinner.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Help explain the code.
Also clarify the comment for bio size check.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-3-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is neater to build blk_opf_t fully in one place, so inline
iomap_dio_bio_opflags() in iomap_dio_bio_iter().
Also tidy up the logic in dealing with IOMAP_DIO_CALLER_COMP, in generally
separate the logic in dealing with flags associated with reads and writes.
Originally-from: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-2-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This also fixes a wrong definitions for SCM_TS_OPT_ID & SO_RCVPRIORITY.
Accidentally found while working on another patchset.
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>
Cc: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: a89568e9be75 ("selftests: txtimestamp: add SCM_TS_OPT_ID test")
Fixes: e45469e594b2 ("sock: Introduce SO_RCVPRIORITY socket option")
Link: https://lore.kernel.org/netdev/20250314195257.34854-1-kuniyu@amazon.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250314214155.16046-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Because of the size restriction in the TCP options space, the MPTCP
ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones.
For this reason, in the linked mptcp_out_options structure, group of
fields linked to different options are part of the same union.
There is a case where the mptcp_pm_add_addr_signal() function can modify
opts->addr, but not ended up sending an ADD_ADDR. Later on, back in
mptcp_established_options, other options will be sent, but with
unexpected data written in other fields due to the union, e.g. in
opts->ext_copy. This could lead to a data stream corruption in the next
packet.
Using an intermediate variable, prevents from corrupting previously
established DSS option. The assignment of the ADD_ADDR option
parameters is now done once we are sure this ADD_ADDR option can be set
in the packet, e.g. after having dropped other suboptions.
Fixes: 1bff1e43a30e ("mptcp: optimize out option generation")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: the commit message has been updated: long lines splits and some
clarifications. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The SD spec version 6.0 section 6.4.1.5 requires that Vdd must be
lowered to less than 0.5V for a minimum of 1 ms when powering off a
card. Increase wait to 15 ms so that voltage has time to drain down
to 0.5V and cards can power off correctly. Issues with voltage drain
time were only observed on Apollo Lake and Bay Trail host controllers
so this fix is limited to those devices.
Signed-off-by: Erick Shepherd <erick.shepherd@ni.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20250314195021.1588090-1-erick.shepherd@ni.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
irq allocated with devm_request_irq() will be freed in devm_irq_release(),
using free_irq() in ->remove() will causes a dangling pointer, and a
subsequent double free. So remove the free_irq() in the error path and
remove path.
Fixes: 969864efae78 ("i2c: amd-mp2: use msix/msi if the hardware supports")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20221103121146.99836-1-yangyingliang@huawei.com
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
|
|
There is an issue in the kernel:
In tmpfs, when using the "ls" command to list the contents
of a directory with a large number of files, glibc performs
the getdents call in multiple rounds. If a concurrent unlink
occurs between these getdents calls, it may lead to duplicate
directory entries in the ls output. One possible reproduction
scenario is as follows:
Create 1026 files and execute ls and rm concurrently:
for i in {1..1026}; do
echo "This is file $i" > /tmp/dir/file$i
done
ls /tmp/dir rm /tmp/dir/file4
->getdents(file1026-file5)
->unlink(file4)
->getdents(file5,file3,file2,file1)
It is expected that the second getdents call to return file3
through file1, but instead it returns an extra file5.
The root cause of this problem is in the offset_dir_lookup
function. It uses mas_find to determine the starting position
for the current getdents call. Since mas_find locates the first
position that is greater than or equal to mas->index, when file4
is deleted, it ends up returning file5.
It can be fixed by replacing mas_find with mas_find_rev, which
finds the first position that is less than or equal to mas->index.
Fixes: b9b588f22a0c ("libfs: Use d_children list to iterate simple_offset directories")
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Link: https://lore.kernel.org/r/20250320034417.555810-1-sunyongjian@huaweicloud.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The WM8904 codec supports both ADC and DMIC inputs.
Get input pin functionality from the platform data and add the necessary
controls depending on the possible additional routing.
The ADC and DMIC share the IN1L/DMICDAT1 and IN1R/DMICDAT2 pins.
This leads to a few scenarios requiring different DAPM routing:
- When both are connected to an analog input, only the ADC is used.
- When one line is a DMIC and the other an analog input, the DMIC source
is set from the platform data and a mux is added to select whether to
use the ADC or DMIC.
- When both are connected to a DMIC, another mux is added to this to
select the DMIC source. Note that we still need to be able to select
the ADC system for use with the IN2L, IN2R, IN3L and IN3R pins.
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319142059.46692-6-francesco@dolcini.it
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Read in optional codec-specific properties from the device tree.
The platform_data structure is not populated when using device trees.
This change parses optional dts properties to populate it.
- wlf,in1l-as-dmicdat1
- wlf,in1r-as-dmicdat2
- wlf,gpio-cfg
- wlf,micbias-cfg
- wlf,drc-cfg-regs
- wlf,drc-cfg-names
- wlf,retune-mobile-cfg-regs
- wlf,retune-mobile-cfg-names
- wlf,retune-mobile-cfg-hz
Datasheet: https://statics.cirrus.com/pubs/proDatasheet/WM8904_Rev4.1.pdf
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319142059.46692-5-francesco@dolcini.it
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add two properties to select the IN1L/DMICDAT1 and IN2R/DMICDAT2
functionality:
- wlf,in1l-as-dmicdat1
- wlf,in1r-as-dmicdat2
Add a property to describe the GPIO configuration registers, that can be
used to set the four multifunction pins:
- wlf,gpio-cfg
Add a property to describe the mic bias control registers:
- wlf,micbias-cfg
Add two properties to describe the Dynamic Range Controller (DRC),
allowing multiple named configurations where each config sets the 4 DRC
registers (R40-R43):
- wlf,drc-cfg-regs
- wlf,drc-cfg-names
Add three properties to describe the equalizer (ReTune Mobile), allowing
multiple named configurations (associated with a samplerate) that set
the 24 (R134-R157) EQ registers:
- wlf,retune-mobile-cfg-regs
- wlf,retune-mobile-cfg-hz
- wlf,retune-mobile-cfg-rates
The set of names and configurations for DRC and ReTune Mobile are
specified by system integrators. The names are exposed directly to
userspace as options that can be selected at runtime.
Adding the DRC and ReTune Mobile data to the DT eases the transition
from pdata, which has handled them this way for over a decade. The
parameters filled in here are almost certainly specific tuning for the
hardware so it makes sense to ship them with the hardware description.
Datasheet: https://statics.cirrus.com/pubs/proDatasheet/WM8904_Rev4.1.pdf
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250319142059.46692-4-francesco@dolcini.it
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When updating the GPIO registers, do nothing for all fields of gpio_cfg
that are "0xFFFF".
This "do nothing" flag used to be 0 to easily check whether the gpio_cfg
field was actually set inside pdata or left empty (default).
However, 0 is a valid configuration for these registers, while 0xFFFF is
not.
With this change, users can explicitly set them to 0.
Not setting gpio_cfg in the platform data will now lead to setting all
GPIO registers to 0 instead of leaving them unset.
No one is using this platform data with this codec.
The change gets the driver ready to properly set gpio_cfg from the DT.
Datasheet: https://statics.cirrus.com/pubs/proDatasheet/WM8904_Rev4.1.pdf
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319142059.46692-3-francesco@dolcini.it
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
There is an of_property_read_u32_index and of_property_read_u64_index.
This patch adds a similar helper for u16.
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319142059.46692-2-francesco@dolcini.it
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Many spi-mem controller drivers have a very similar debug log at the
beginning of their ->exec_op() callback implementation. This debug log is
effectively useful, so let's create one that is complete and concise
enough, so developers no longer need to write their own. The verbosity
being high, VERBOSE_DEBUG will be required in this case.
Remove the debug log from individual drivers and propose a common one.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://patch.msgid.link/20250320115644.2231240-1-miquel.raynal@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
There are 5 mandatory peripheral properties. They are described in a
separate binding but not explicitly required. Make sure they are
correctly marked required and update the example to reflect this.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250319094651.1290509-4-miquel.raynal@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The initial SPI controller IP from Cadence has always been implemented
into controllers from various hardware manufacturers and because of
that, it has always been (rightfully) doubled with a more specific
compatible. There are likely no reasons to keep this compatible
legitimate, alone. Make sure people do not get mislead by officially
deprecating this compatible.
While at deprecating, let's update the examples to avoid documenting
deprecated properties.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250319094651.1290509-3-miquel.raynal@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
controller is
Despite being very common in commit logs, SPI NOR controllers simply do
not exist. At least, they are not as specific as the name implies. There
are SPI memory controllers which are indeed "specialized" and optimized
for handling "memories", but most of them are just generic and accept
almost any kind of opcode, address, dummy and data cycles, making them
as suitable for NANDs than NORs.
Furthermore, this controller supports any kind of bus, from single to
octal NAND, so make it clear.
Also add a comment to mention that the initial compatible naming is too
specific (but obviously kept for backward compatibility reasons).
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250319094651.1290509-2-miquel.raynal@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
As both locks are highly contended during significant inode churn,
holding the inode hash lock while waiting for the sb list lock
exacerbates the problem.
Why moving it out is safe: the inode at hand still has I_NEW set and
anyone who finds it through legitimate means waits for the bit to clear,
by which time inode_sb_list_add() is guaranteed to have finished.
This significantly drops hash lock contention for me when stating 20
separate trees in parallel, each with 1000 directories * 1000 files.
However, no speed up was observed as contention increased on the other
locks, notably dentry LRU.
Even so, removal of the lock ordering will help making this faster
later.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250320004643.1903287-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The /sys/block/<disk>/integrity fields are historically set
if T10 protection Information is enabled.
It is not set if some upper layer uses integrity metadata.
Document it.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Co-developed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250318154447.370786-1-gmazyland@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The conversion to EXPORT_SIMPLE_DEV_PM_OPS() also replaced the pm ops
assignment with pm_ptr() macro, but this made difference from the
original code; it had conditional with ifdef CONFIG_PM_SLEEP, while
we have now with CONFIG_PM, instead. This seems causing build errors
with randomconfig.
For fixing the inconsistency, replace pm_ptr() with pm_sleep_ptr().
Fixes: 5ea0a2206b58 ("ALSA: oxygen: Convert to EXPORT_SIMPLE_DEV_PM_OPS()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202503201853.7kB0BPRw-lkp@intel.com/
Link: https://patch.msgid.link/20250320105721.10789-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Justin Iurman says:
====================
net: fix lwtunnel reentry loops
When the destination is the same after the transformation, we enter a
lwtunnel loop. This is true for most of lwt users: ioam6, rpl, seg6,
seg6_local, ila_lwt, and lwt_bpf. It can happen in their input() and
output() handlers respectively, where either dst_input() or dst_output()
is called at the end. It can also happen in xmit() handlers.
Here is an example for rpl_input():
dump_stack_lvl+0x60/0x80
rpl_input+0x9d/0x320
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
[...]
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
ip6_sublist_rcv_finish+0x85/0x90
ip6_sublist_rcv+0x236/0x2f0
... until rpl_do_srh() fails, which means skb_cow_head() failed.
This series provides a fix at the core level of lwtunnel to catch such
loops when they're not caught by the respective lwtunnel users, and
handle the loop case in ioam6 which is one of the users. This series
also comes with a new selftest to detect some dst cache reference loops
in lwtunnel users.
====================
Link: https://patch.msgid.link/20250314120048.12569-1-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note
on selftest posting") in net-next, the selftest is therefore shipped in
this series. However, this selftest does not really test this series. It
needs this series to avoid crashing the kernel. What it really tests,
thanks to kmemleak, is what was fixed by the following commits:
- commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and
ioam6 lwtunnels")
- commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and
ioam6 lwtunnels")
- commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6
lwt")
- commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl
lwt")
- commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel")
- commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila
lwtunnel")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix the lwtunnel_output() reentry loop in ioam6_iptunnel when the
destination is the same after transformation. Note that a check on the
destination address was already performed, but it was not enough. This
is the example of a lwtunnel user taking care of loops without relying
only on the last resort detection offered by lwtunnel.
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-3-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch acts as a parachute, catch all solution, by detecting
recursion loops in lwtunnel users and taking care of them (e.g., a loop
between routes, a loop within the same route, etc). In general, such
loops are the consequence of pathological configurations. Each lwtunnel
user is still free to catch such loops early and do whatever they want
with them. It will be the case in a separate patch for, e.g., seg6 and
seg6_local, in order to provide drop reasons and update statistics.
Another example of a lwtunnel user taking care of loops is ioam6, which
has valid use cases that include loops (e.g., inline mode), and which is
addressed by the next patch in this series. Overall, this patch acts as
a last resort to catch loops and drop packets, since we don't want to
leak something unintentionally because of a pathological configuration
in lwtunnels.
The solution in this patch reuses dev_xmit_recursion(),
dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine
considering the context.
Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/
Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function")
Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input")
Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently the API emac_update_hardware_stats() reads different ICSSG
stats without any lock protection.
This API gets called by .ndo_get_stats64() which is only under RCU
protection and nothing else. Add lock to this API so that the reading of
statistics happens during lock.
Fixes: c1e10d5dc7a1 ("net: ti: icssg-prueth: Add ICSSG Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314102721.1394366-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ->send() operation frees skb so save the length before calling
->send() to avoid a use after free.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c751531d-4af4-42fe-affe-6104b34b791d@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Otherwise gcc 13 generates conditional forward jumps (aka branch
mispredict by default) for build_open_flags() being succesfull.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250320092331.1921700-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Merge the slab feature branch kfree_rcu_tiny for 6.15:
- Move the TINY_RCU kvfree_rcu() implementation from RCU to SLAB
subsystem and cleanup its integration.
|
|
Matthieu Baerts says:
====================
mptcp: pm: prep work for new ops and sysctl knobs
Here are a few cleanups, preparation work for the new PM ops, and sysctl
knobs.
- Patch 1: reorg: move generic NL code used by all PMs to pm_netlink.c.
- Patch 2: use kmemdup() instead of kmalloc + copy.
- Patch 3: small cleanup to use pm var instead of msk->pm.
- Patch 4: reorg: id_avail_bitmap is only used by the in-kernel PM.
- Patch 5: use struct_group to easily reset a subset of PM data vars.
- Patch 6: introduce the minimal skeleton for the new PM ops.
- Patch 7: register in-kernel and userspace PM ops.
- Patch 8: new net.mptcp.path_manager sysctl knob, deprecating pm_type.
- Patch 9: map the new path_manager sysctl knob with pm_type.
- Patch 10: map the old pm_type sysctl knob with path_manager.
- Patch 11: new net.mptcp.available_path_managers sysctl knob.
- Patch 12: new test to validate path_manager and pm_type mapping.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-0-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch checks if the newly added net.mptcp.path_manager is mapped
successfully from or to the old net.mptcp.pm_type in userspace_pm.sh.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-12-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Similarly to net.mptcp.available_schedulers, this patch adds a new one
net.mptcp.available_path_managers to list the available path managers.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch adds a new proc_handler "proc_pm_type" for "pm_type" to
map old path manager sysctl "pm_type" to the newly added "path_manager".
path_manager pm_type
MPTCP_PM_TYPE_KERNEL -> "kernel"
MPTCP_PM_TYPE_USERSPACE -> "userspace"
It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-10-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch maps the newly added path manager sysctl "path_manager"
to the old one "pm_type".
path_manager pm_type
"kernel" -> MPTCP_PM_TYPE_KERNEL
"userspace" -> MPTCP_PM_TYPE_USERSPACE
others -> __MPTCP_PM_TYPE_NR
It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-9-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob
is added to determine which path manager will be used by each newly
created MPTCP socket by setting the name of it.
Dealing with an explicit name is easier than with a number, especially
when more PMs will be introduced.
This sysctl knob makes the old one "pm_type" deprecated.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch defines the original in-kernel netlink path manager as a
new struct mptcp_pm_ops named "mptcp_pm_kernel", and register it in
mptcp_pm_kernel_register(). And define the userspace path manager as
a new struct mptcp_pm_ops named "mptcp_pm_userspace", and register it
in mptcp_pm_init().
To ensure that there's always a valid path manager available, the default
path manager "mptcp_pm_kernel" will be skipped in mptcp_pm_unregister().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-7-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In order to allow users to develop their own BPF-based path manager,
this patch defines a struct ops "mptcp_pm_ops" for an MPTCP path
manager, which contains a set of interfaces. Currently only init()
and release() interfaces are included, subsequent patches will add
others step by step.
Add a set of functions to register, unregister, find and validate a
given path manager struct ops.
"list" is used to add this path manager to mptcp_pm_list list when
it is registered. "name" is used to identify this path manager.
mptcp_pm_find() uses "name" to find a path manager on the list.
mptcp_pm_unregister is not used in this set, but will be invoked in
.unreg of struct bpf_struct_ops. mptcp_pm_validate() will be invoked
in .validate of struct bpf_struct_ops. That's why they are exported.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-6-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch adds a "struct_group(reset, ...)" in struct mptcp_pm_data to
simplify the reset, and make sure we don't miss any.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-5-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
id_avail_bitmap of struct mptcp_pm_data is currently only used by the
in-kernel PM, so this patch moves its initialization operation under
the "if (pm_type == MPTCP_PM_TYPE_KERNEL)" condition.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-4-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The variable "pm" has been defined in mptcp_pm_fully_established()
and mptcp_pm_data_reset() as "msk->pm", so use "pm" directly instead
of using "msk->pm".
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-3-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Instead of using kmalloc() or kzalloc() to allocate an entry and
then immediately duplicate another entry to the newly allocated
one, kmemdup() helper can be used to simplify the code.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-2-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The registration of mptcp_genl_family is useful for both the in-kernel
and the userspace PM. It should then be done in pm_netlink.c.
On the other hand, the registration of the in-kernel pernet subsystem is
specific to the in-kernel PM, and should stay there in pm_kernel.c.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-1-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In architectures that use the polling bit, current_clr_polling() employs
smp_mb() to ensure that the clearing of the polling bit is visible to
other cores before checking TIF_NEED_RESCHED.
However, smp_mb() can be costly. Given that clear_bit() is an atomic
operation, replacing smp_mb() with smp_mb__after_atomic() is appropriate.
Many architectures implement smp_mb__after_atomic() as a lighter-weight
barrier compared to smp_mb(), leading to performance improvements.
For instance, on x86, smp_mb__after_atomic() is a no-op. This change
eliminates a smp_mb() instruction in the cpuidle wake-up path, saving
several CPU cycles and thereby reducing wake-up latency.
Architectures that do not use the polling bit will retain the original
smp_mb() behavior to ensure that existing dependencies remain unaffected.
Signed-off-by: Yujun Dong <yujundong@pascal-lab.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241230141624.155356-1-yujundong@pascal-lab.net
|