Age | Commit message (Collapse) | Author |
|
syzkaller hit a WARN_ON_ONCE(!scm->pid) in scm_pidfd_recv().
In unix_stream_read_generic(), if there is no skb in the queue, we could
bail out the do-while loop without calling scm_set_cred():
1. No skb in the queue
2. sk is non-blocking
or
shutdown(sk, RCV_SHUTDOWN) is called concurrently
or
peer calls close()
If the socket is configured with SO_PASSCRED or SO_PASSPIDFD, scm_recv()
would populate cmsg with garbage.
Let's not call scm_recv() unless there is skb to receive.
WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_pidfd_recv include/net/scm.h:138 [inline]
WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177
Modules linked in:
CPU: 1 PID: 3245 Comm: syz-executor.1 Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:scm_pidfd_recv include/net/scm.h:138 [inline]
RIP: 0010:scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177
Code: 67 fd e9 55 fd ff ff e8 4a 70 67 fd e9 7f fd ff ff e8 40 70 67 fd e9 3e fb ff ff e8 36 70 67 fd e9 02 fd ff ff e8 8c 3a 20 fd <0f> 0b e9 fe fb ff ff e8 50 70 67 fd e9 2e f9 ff ff e8 46 70 67 fd
RSP: 0018:ffffc90009af7660 EFLAGS: 00010216
RAX: 00000000000000a1 RBX: ffff888041e58a80 RCX: ffffc90003852000
RDX: 0000000000040000 RSI: ffffffff842675b4 RDI: 0000000000000007
RBP: ffffc90009af7810 R08: 0000000000000007 R09: 0000000000000013
R10: 00000000000000f8 R11: 0000000000000001 R12: ffffc90009af7db0
R13: 0000000000000000 R14: ffff888041e58a88 R15: 1ffff9200135eecc
FS: 00007f6b7113f640(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b7111de38 CR3: 0000000012a6e002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
<TASK>
unix_stream_read_generic+0x5fe/0x1f50 net/unix/af_unix.c:2830
unix_stream_recvmsg+0x194/0x1c0 net/unix/af_unix.c:2880
sock_recvmsg_nosec net/socket.c:1019 [inline]
sock_recvmsg+0x188/0x1d0 net/socket.c:1040
____sys_recvmsg+0x210/0x610 net/socket.c:2712
___sys_recvmsg+0xff/0x190 net/socket.c:2754
do_recvmmsg+0x25d/0x6c0 net/socket.c:2848
__sys_recvmmsg net/socket.c:2927 [inline]
__do_sys_recvmmsg net/socket.c:2950 [inline]
__se_sys_recvmmsg net/socket.c:2943 [inline]
__x64_sys_recvmmsg+0x224/0x290 net/socket.c:2943
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f6b71da2e5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007f6b7113ecc8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007f6b71da2e5d
RDX: 0000000000000007 RSI: 0000000020006600 RDI: 000000000000000b
RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000120 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f6b71e03530 R15: 0000000000000000
</TASK>
Fixes: 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Link: https://lore.kernel.org/r/20230622184351.91544-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In blamed commit, I missed that get_dist_table() was allocating
memory using GFP_KERNEL, and acquiring qdisc lock to perform
the swap of newly allocated table with current one.
In this patch, get_dist_table() is allocating memory and
copy user data before we acquire the qdisc lock.
Then we perform swap operations while being protected by the lock.
Note that after this patch netem_change() no longer can do partial changes.
If an error is returned, qdisc conf is left unchanged.
Fixes: 2174a08db80d ("sch_netem: acquire qdisc lock in netem_change()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230622181503.2327695-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-22 (iavf)
This series contains updates to iavf driver only.
Przemek defers removing, previous, primary MAC address until after
getting result of adding its replacement. He also does some cleanup by
removing unused functions and making applicable functions static.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: make functions static where possible
iavf: remove some unused functions and pointless wrappers
iavf: fix err handling for MAC replace
====================
Link: https://lore.kernel.org/r/20230622165914.2203081-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The referenced patch is causing build errors when ETHERNET=y and
FDDI=m. While we work out the preferred patch(es), revert this patch
to make the pain go away.
Fixes: 128272336120 ("s390/net: lcs: use IS_ENABLED() for kconfig detection")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202306202129.pl0AqK8G-lkp@intel.com
Cc: Alexandra Winter <wintera@linux.ibm.com>
Cc: Wenjia Zhang <wenjia@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230622155409.27311-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Linux provides phy_set_bits() helper so let's drop brcm_phy_setbits() and
use phy_set_bits() in its place.
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230622184721.24368-1-giulio.benetti@benettiengineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
igc: TX timestamping fixes
This is the fixes part of the series intended to add support for using
the 4 timestamp registers present in i225/i226.
Moving the timestamp handling to be inline with the interrupt handling
has the advantage of improving the TX timestamping retrieval latency,
here are some numbers using ntpperf:
Before:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
| responses | TX timestamp offset (ns)
rate clients | lost invalid basic xleave | min mean max stddev
1000 100 0.00% 0.00% 0.00% 100.00% -56 +9 +52 19
1500 150 0.00% 0.00% 0.00% 100.00% -40 +30 +75 22
2250 225 0.00% 0.00% 0.00% 100.00% -11 +29 +72 15
3375 337 0.00% 0.00% 0.00% 100.00% -18 +40 +88 22
5062 506 0.00% 0.00% 0.00% 100.00% -19 +23 +77 15
7593 759 0.00% 0.00% 0.00% 100.00% +7 +47 +5168 43
11389 1138 0.00% 0.00% 0.00% 100.00% -11 +41 +5240 39
17083 1708 0.00% 0.00% 0.00% 100.00% +19 +60 +5288 50
25624 2562 0.00% 0.00% 0.00% 100.00% +1 +56 +5368 58
38436 3843 0.00% 0.00% 0.00% 100.00% -84 +12 +8847 66
57654 5765 0.00% 0.00% 100.00% 0.00%
86481 8648 0.00% 0.00% 100.00% 0.00%
129721 12972 0.00% 0.00% 100.00% 0.00%
194581 16384 0.00% 0.00% 100.00% 0.00%
291871 16384 27.35% 0.00% 72.65% 0.00%
437806 16384 50.05% 0.00% 49.95% 0.00%
After:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
| responses | TX timestamp offset (ns)
rate clients | lost invalid basic xleave | min mean max stddev
1000 100 0.00% 0.00% 0.00% 100.00% -44 +0 +61 19
1500 150 0.00% 0.00% 0.00% 100.00% -6 +39 +81 16
2250 225 0.00% 0.00% 0.00% 100.00% -22 +25 +69 15
3375 337 0.00% 0.00% 0.00% 100.00% -28 +15 +56 14
5062 506 0.00% 0.00% 0.00% 100.00% +7 +78 +143 27
7593 759 0.00% 0.00% 0.00% 100.00% -54 +24 +144 47
11389 1138 0.00% 0.00% 0.00% 100.00% -90 -33 +28 21
17083 1708 0.00% 0.00% 0.00% 100.00% -50 -2 +35 14
25624 2562 0.00% 0.00% 0.00% 100.00% -62 +7 +66 23
38436 3843 0.00% 0.00% 0.00% 100.00% -33 +30 +5395 36
57654 5765 0.00% 0.00% 100.00% 0.00%
86481 8648 0.00% 0.00% 100.00% 0.00%
129721 12972 0.00% 0.00% 100.00% 0.00%
194581 16384 19.50% 0.00% 80.50% 0.00%
291871 16384 35.81% 0.00% 64.19% 0.00%
437806 16384 55.40% 0.00% 44.60% 0.00%
During this series, and to show that as is always the case, things are
never easy as they should be, a hardware issue was found, and it took
some time to find the workaround(s). The bug and workaround are better
explained in patch 4/4.
Note: the workaround has a simpler alternative, but it would involve
adding support for the other timestamp registers, and only using the
TXSTMP{H/L}_0 as a way to clear the interrupt. But I feel bad about
throwing this kind of resources away. Didn't test this extensively but
it should work.
Also, as Marc Kleine-Budde suggested, after some consensus is reached
on this series, most parts of it will be proposed for igb.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: Work around HW bug causing missing timestamps
igc: Retrieve TX timestamp during interrupt handling
igc: Check if hardware TX timestamping is enabled earlier
igc: Fix race condition in PTP tx code
====================
Link: https://lore.kernel.org/r/20230622165244.2202786-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-06-23
We've added 49 non-merge commits during the last 24 day(s) which contain
a total of 70 files changed, 1935 insertions(+), 442 deletions(-).
The main changes are:
1) Extend bpf_fib_lookup helper to allow passing the route table ID,
from Louis DeLosSantos.
2) Fix regsafe() in verifier to call check_ids() for scalar registers,
from Eduard Zingerman.
3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and()
and a rework of bpf_cpumask_any*() kfuncs. Additionally,
add selftests, from David Vernet.
4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings,
from Gilad Sever.
5) Change bpf_link_put() to use workqueue unconditionally to fix it
under PREEMPT_RT, from Sebastian Andrzej Siewior.
6) Follow-ups to address issues in the bpf_refcount shared ownership
implementation, from Dave Marchevsky.
7) A few general refactorings to BPF map and program creation permissions
checks which were part of the BPF token series, from Andrii Nakryiko.
8) Various fixes for benchmark framework and add a new benchmark
for BPF memory allocator to BPF selftests, from Hou Tao.
9) Documentation improvements around iterators and trusted pointers,
from Anton Protopopov.
10) Small cleanup in verifier to improve allocated object check,
from Daniel T. Lee.
11) Improve performance of bpf_xdp_pointer() by avoiding access
to shared_info when XDP packet does not have frags,
from Jesper Dangaard Brouer.
12) Silence a harmless syzbot-reported warning in btf_type_id_size(),
from Yonghong Song.
13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper,
from Jarkko Sakkinen.
14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
bpf, docs: Document existing macros instead of deprecated
bpf, docs: BPF Iterator Document
selftests/bpf: Fix compilation failure for prog vrf_socket_lookup
selftests/bpf: Add vrf_socket_lookup tests
bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint
bpf: Factor out socket lookup functions for the TC hookpoint.
selftests/bpf: Set the default value of consumer_cnt as 0
selftests/bpf: Ensure that next_cpu() returns a valid CPU number
selftests/bpf: Output the correct error code for pthread APIs
selftests/bpf: Use producer_cnt to allocate local counter array
xsk: Remove unused inline function xsk_buff_discard()
bpf: Keep BPF_PROG_LOAD permission checks clear of validations
bpf: Centralize permissions checks for all BPF map types
bpf: Inline map creation logic in map_create() function
bpf: Move unprivileged checks into map_create() and bpf_prog_load()
bpf: Remove in_atomic() from bpf_link_put().
selftests/bpf: Verify that check_ids() is used for scalars in regsafe()
bpf: Verify scalar ids mapping in regsafe() using check_ids()
selftests/bpf: Check if mark_chain_precision() follows scalar ids
...
====================
Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a small step towards a model where GUP itself would not expand
the stack, and any user that needs GUP to not look up existing mappings,
but actually expand on them, would have to do so manually before-hand,
and with the mm lock held for writing.
It turns out that execve() already did almost exactly that, except it
didn't take the mm lock at all (it's single-threaded so no locking
technically needed, but it could cause lockdep errors). And it only did
it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
obviously never expanded the stack downwards.
So just make that CONFIG_STACK_GROWSUP case do the right thing with
locking, and enable it generally. This will eventually help GUP, and in
the meantime avoids a special case and the lockdep issue.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.
To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked". That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.
Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is one of the simple cases, except there's no pt_regs pointer.
Which is fine, as lock_mm_and_find_vma() is set up to work fine with a
NULL pt_regs.
Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting,
so we can just use the helper without any extra work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This does the simple pattern conversion of alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma()
helper. They all have the regular fault handling pattern without odd
special cases.
The remaining architectures all have something that keeps us from a
straightforward conversion: ia64 and parisc have stacks that can grow
both up as well as down (and ia64 has special address region checks).
And m68k, microblaze, openrisc, sparc64, and um end up having extra
rules about only expanding the stack down a limited amount below the
user space stack pointer. That is something that x86 used to do too
(long long ago), and it probably could just be skipped, but it still
makes the conversion less than trivial.
Note that this conversion was done manually and with the exception of
alpha without any build testing, because I have a fairly limited cross-
building environment. The cases are all simple, and I went through the
changes several times, but...
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
arm has an additional check for address < FIRST_USER_ADDRESS before
expanding the stack. Since FIRST_USER_ADDRESS is defined everywhere
(generally as 0), move that check to the generic expand_downwards().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This converts arm64 to use the new page fault helper. It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did. Thanks to Suren for the fix and testing.
Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com>
Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is done as a separate patch from introducing the new
lock_mm_and_find_vma() helper, because while it's an obvious change,
it's not what x86 used to do in this area.
We already abort the page fault on fatal signals anyway, so why should
we wait for the mmap lock only to then abort later? With the new helper
function that returns without the lock held on failure anyway, this is
particularly easy and straightforward.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
.. and make x86 use it.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a potential OOB read at fast_imageblit, for
"colortab[(*src >> 4)]" can become a negative value due to
"const char *s = image->data, *src".
This change makes sure the index for colortab always positive
or zero.
Similar commit:
https://patchwork.kernel.org/patch/11746067
Potential bug report:
https://groups.google.com/g/syzkaller-bugs/c/9ubBXKeKXf4/m/k-QXy4UgAAAJ
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
The ret variable in the vmd_enable_domain() function was used
uninitialized when printing a warning message upon failure of
the pci_reset_bus() function.
Thus, fix the issue by assigning ret with the value returned from
pci_reset_bus() before referencing it in the warning message.
This was detected by Smatch:
drivers/pci/controller/vmd.c:931 vmd_enable_domain() error: uninitialized symbol 'ret'.
[kwilczynski: drop the second patch from the series, add missing reported
by tag, commit log]
Fixes: 0a584655ef89 ("PCI: vmd: Fix secondary bus reset for Intel bridges")
Link: https://lore.kernel.org/all/202305270219.B96IiIfv-lkp@intel.com
Link: https://lore.kernel.org/linux-pci/20230420094332.1507900-2-korantwork@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
|
|
The series 2022/2023 reports slightly longer vendor/product strings
and shares USB ids. Technically the reply size is the USB HID packet
size (64 bytes) but all the supported commands do not use more than 8
bytes and replies reporting back strings do not use more then 24 bytes
(vendor and product are in one string in the newer devices now). The
rest of the reply is always filled with '\0'. Also update comments
and documentation accordingly.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
Link: https://lore.kernel.org/r/ZJbB72CAPmLflhHG@monster.localdomain
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-16-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-15-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-14-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-13-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-12-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-10-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks.
The iproc driver always returns 0, it's just a bit hidden. So make
iproc_pcie_remove() return void instead of always zero and convert the
platform driver to the alternative remove callback that returns void and
eventually replaces the int returning callback.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-9-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert the dwc drivers from always returning zero in the remove
callback to the void returning variant.
[kwilczynski: commit log]
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-7-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
[kwilczynski: commit log]
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-5-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Link: https://lore.kernel.org/linux-pci/20230321193208.366561-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
In the absence of other debug facilities dumping user code around the
unhandled exception address may help debugging the issue.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Renesas uses "R-Car" as the name for their product family and development
platform. Thus, correct other variants such as "rcar", "RCar", "Rcar",
etc., to the preferred spelling.
[kwilczynski: commit log]
Link: https://lore.kernel.org/linux-pci/20230607204750.27837-1-wsa+renesas@sang-engineering.com
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
Commit a3eb95484f27 ("spi: dt-bindings: atmel,at91rm9200-spi: add sam9x7
compatible") adding sam9x7 compatible did not make any sense as it added
new compatible into middle of existing compatible list. The intention
was probably to add new set of compatibles with sam9x7 as first one.
Fixes: a3eb95484f27 ("spi: dt-bindings: atmel,at91rm9200-spi: add sam9x7 compatible")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/Message-Id: <20230624082054.37697-1-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Emit a warning when the mod description is missed and only
when the W=1 is enabled.
Reported-by: Roland Kletzing <devzero@web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=10770
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Tested-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Josh Triplett reports that initramfs-tools needs modules.builtin and
modules.builtin.modinfo to create a working initramfs for a non-modular
kernel.
If this is a general tooling issue not limited to Debian, I think it
makes sense to change modules_install.
This commit changes the targets as follows when CONFIG_MODULES=n.
In-tree builds:
make modules -> no-op
make modules_install -> install modules.builtin(.modinfo)
External module builds:
make modules -> show error message like before
make modules_install -> show error message like before
Link: https://lore.kernel.org/lkml/36a4014c73a52af27d930d3ca31d362b60f4461c.1686356364.git.josh@joshtriplett.org/
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Tested-by: Josh Triplett <josh@joshtriplett.org>
|
|
Commit d4313a68ec91 ("fbdev/media: Use GPIO descriptors for VIA GPIO")
moves via-gpio.h from include/linux to drivers/video/fbdev/via, but misses
to adjust the file entry for the VIA UNICHROME(PRO)/CHROME9 FRAMEBUFFER
DRIVER section.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.
Remove the file entry in VIA UNICHROME(PRO)/CHROME9 FRAMEBUFFER DRIVER, as
the new location of the header is already covered by the file entry
drivers/video/fbdev/via/.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Fixes: d4313a68ec91 ("fbdev/media: Use GPIO descriptors for VIA GPIO")
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This test checks if the output of perf stat to match event names and
metrics. So it wants the output lines to have both event name and
metric. Otherwise it should skip the line.
On AMD machines, the instruction event has two metrics and they are printed
in separate lines. It makes the line without event name like below:
# perf stat -a sleep 1
Performance counter stats for 'system wide':
64,383.34 msec cpu-clock # 64.048 CPUs utilized
14,526 context-switches # 225.617 /sec
112 cpu-migrations # 1.740 /sec
190 page-faults # 2.951 /sec
807,558,652 cycles # 0.013 GHz (83.30%)
69,809,799 stalled-cycles-frontend # 8.64% frontend cycles idle (83.30%)
196,983,266 stalled-cycles-backend # 24.39% backend cycles idle (83.30%)
424,876,008 instructions # 0.53 insn per cycle
(here) ---> # 0.46 stalled cycles per insn (83.30%)
97,788,321 branches # 1.519 M/sec (83.34%)
4,147,377 branch-misses # 4.24% of all branches (83.46%)
1.005241409 seconds time elapsed
Also modern Intel machines have TopDown metrics which also don't have
event names.
# perf stat -a sleep 1
Performance counter stats for 'system wide':
8,015.39 msec cpu-clock # 7.996 CPUs utilized
5,823 context-switches # 726.477 /sec
189 cpu-migrations # 23.580 /sec
139 page-faults # 17.342 /sec
435,139,308 cycles # 0.054 GHz
193,891,345 instructions # 0.45 insn per cycle
42,773,028 branches # 5.336 M/sec
2,298,113 branch-misses # 5.37% of all branches
TopdownL1 # 25.5 % tma_backend_bound
/--> # 7.9 % tma_bad_speculation
(here) --+ # 55.7 % tma_frontend_bound
\--> # 10.9 % tma_retiring
1.002395924 seconds time elapsed
There is a check to skip TopdownL1 and TopdownL2 specifically but it
does not cover every affected lines.
So there is another check to skip the line if it has nothing on the left
side of # sign. Well.. it seems ok but that's not enough too.
When aggregation mode (like --per-socket or --per-thread) is used, it
adds some prefix (e.g. CPU socket, task name and PID) in the output
line. So the test code ignores them to normalize result.
A problem can happen for per-thread mode when task name contains one or
more spaces. It'd only ignore the first part of the task name, and it
thinks there's something more in the line so it would not skip.
# perf stat -a --perf-thread sleep 1
...
perf-21276 # 70.2 % tma_backend_bound
perf-21276 # 3.9 % tma_bad_speculation
perf-21276 # 10.5 % tma_frontend_bound
perf-21276 # 15.3 % tma_retiring
^^^^^^^^^^
(ignored)
my task-21328 # 70.2 % tma_backend_bound
my task-21328 # 3.9 % tma_bad_speculation
my task-21328 # 10.5 % tma_frontend_bound
my task-21328 # 15.3 % tma_retiring
^^
(ignored)
So I think it should look at the metric names instead. Add skip_metric
to hold the list of names to skip. It would contain 'stalled cycles per
insn' and metrics started by 'tma_'.
Fixes: 99a04a48f225 ("perf test: Add test case for the standard 'perf stat' output")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230623230139.985594-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
On AMD machines, the perf stat STD output test failed like below:
$ sudo ./perf test -v 98
98: perf stat STD output linter :
--- start ---
test child forked, pid 1841901
Checking STD output: no argswrong event metric.
expected 'GHz' in 108,121 stalled-cycles-frontend # 10.88% frontend cycles idle
test child finished with -1
---- end ----
perf stat STD output linter: FAILED!
This is because there are stalled-cycles-{frontend,backend} events are
used by default. The current logic checks the event_name array to find
which event it's running. But 'cycles' event comes before those stalled
cycles event and it matches first. So it tries to find 'GHz' metric
in the output (which is for the 'cycles') and fails.
Move the stalled-cycles-{frontend,backend} events before 'cycles' so
that it can find the stalled cycles events first.
Also add a space after 'no args' test name for consistency.
Fixes: 99a04a48f225 ("perf test: Add test case for the standard 'perf stat' output")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230623230139.985594-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Petr Machata says:
====================
mlxsw: Maintain candidate RIFs
The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.
As an example, take a front panel port, configure an IP address: it gets a
RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
port from the bridge again, but the RIF never comes back. There is a number
of similar situations, where changing the configuration there and back
utterly breaks the offload.
The situation is going to be made better by implementing a range of replays
and post-hoc offloads.
This patch set lays the ground for replay of next hops. The particular
issue that it deals with is that currently, driver-specific bookkeeping for
next hops is hooked off RIF objects, which come and go across the lifetime
of a netdevice. We would rather keep these objects at an entity that
mirrors the lifetime of the netdevice itself. That way they are at hand and
can be offloaded when a RIF is eventually created.
To that end, with this patchset, mlxsw keeps a hash table of CRIFs:
candidate RIFs, persistent handles for netdevices that mlxsw deems
potentially interesting. The lifetime of a CRIF matches that of the
underlying netdevice, and thus a RIF can always assume a CRIF exists. A
CRIF is where next hops are kept, and when RIF is created, these next hops
can be easily offloaded. (Previously only the next hops created after the
RIF was created were offloaded.)
- Patches #1 and #2 are minor adjustments.
- In patches #3 and #4, add CRIF bookkeeping.
- In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF,
the corresponding CRIF is easy to look up.
- Patch #6 is a clean-up allowed by the previous patches
- Patches #7 and #8 move next hop tracking to CRIFs
No observable effects are intended as of yet. This will be useful once
there is support for RIF creation for netdevices that become mlxsw uppers,
which will come in following patch sets.
====================
Link: https://lore.kernel.org/r/cover.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
reason is that eventually, next hops for mlxsw uppers should be offloaded
and unoffloaded on demand as a netdevice becomes an upper, or stops being
one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the
netdevice lifetime.
Correspondingly, track at each next hop not its RIF, but its CRIF (from
which a RIF can always be deduced).
Note that now that next hops are tracked at a CRIF, it is not necessary to
move each over to a new RIF when it is necessary to edit a RIF. Therefore
drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
call mlxsw_sp_nexthop_rif_update() directly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Nexthop finalization consists of two steps: the part where the offload is
removed, because the backing RIF is now gone; and the part where the
association to the RIF is severed.
Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
be called independently.
Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
historical happenstance than a conscious decision. The two cleanups do not
depend on each other, and this change should have no observable effects.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A previous patch added a pointer to loopback CRIF to the router data
structure. That makes the loopback RIF index redundant, as everything
necessary can be derived from the CRIF. Drop the field and adjust the code
accordingly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a RIF is about to be created, the registration of the netdevice that
it should be associated with must have been seen in the past, and a CRIF
created. Therefore make this a hard requirement by looking up the CRIF
during RIF creation, and complaining loudly when there isn't one.
This then allows to keep a link between a RIF and its corresponding
CRIF (and back, as the relationship is one-to-at-most-one), which do.
The CRIF will later be useful as the objects tracked there will be
offloaded lazily as a result of RIF creation.
CRIFs are created when an "interesting" netdevice is registered, and
destroyed after such device is unregistered. CRIFs are supposed to already
exist when a RIF creation request arises, and exist at least as long as
that RIF exists. This makes for a simple invariant: it is always safe to
dereference CRIF pointer from "its" RIF.
To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
event is delivered. The reason is that if a RIF's netdevices has an IPv6
address, removal of this address is notified in an atomic block. To remove
the RIF, the IPv6 removal handler schedules a work item. It must be safe
for this work item to access the associated CRIF as well.
Thus when a netdevice that backs the CRIF is removed, if it still has a
RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CRIFs are generally not maintained for loopback RIFs. However, the RIF for
the default VRF is used for offloading of blackhole nexthops. Nexthops
expect to have a valid CRIF. Therefore in this patch, add code to maintain
CRIF for the loopback RIF as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|