summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-11nfsd: don't destroy global nfs4_file table in per-net shutdownJeff Layton
The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net. Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin <jiyin@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-11perf/x86/intel/uncore: Add Meteor Lake supportKan Liang
The uncore subsystem for Meteor Lake is similar to the previous Alder Lake. The main difference is that MTL provides PMU support for different tiles, while ADL only provides PMU support for the whole package. On ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu which provides a fixed counter for clockticks. Also, new MSR addresses are introduced on MTL. The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for Meteor Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com
2023-02-11x86/perf/zhaoxin: Add stepping check for ZXCsilviazhao
Some of Nano series processors will lead GP when accessing PMC fixed counter. Meanwhile, their hardware support for PMC has not announced externally. So exclude Nano CPUs from ZXC by checking stepping information. This is an unambiguous way to differentiate between ZXC and Nano CPUs. Following are Nano and ZXC FMS information: Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D] ZXC FMS: Family=6, Model=F, Stepping=E-F OR Family=6, Model=0x19, Stepping=0-3 Fixes: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com> Reported-by: Kevin Brace <kevinbrace@gmx.com> Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
2023-02-11perf/x86/intel/ds: Fix the conversion from TSC to perf timeKan Liang
The time order is incorrect when the TSC in a PEBS record is used. $perf record -e cycles:upp dd if=/dev/zero of=/dev/null count=10000 $ perf script --show-task-events perf-exec 0 0.000000: PERF_RECORD_COMM: perf-exec:915/915 dd 915 106.479872: PERF_RECORD_COMM exec: dd:915/915 dd 915 106.483270: PERF_RECORD_EXIT(915:915):(914:914) dd 915 106.512429: 1 cycles:upp: ffffffff96c011b7 [unknown] ([unknown]) ... ... The perf time is from sched_clock_cpu(). The current PEBS code unconditionally convert the TSC to native_sched_clock(). There is a shift between the two clocks. If the TSC is stable, the shift is consistent, __sched_clock_offset. If the TSC is unstable, the shift has to be calculated at runtime. This patch doesn't support the conversion when the TSC is unstable. The TSC unstable case is a corner case and very unlikely to happen. If it happens, the TSC in a PEBS record will be dropped and fall back to perf_event_clock(). Fixes: 47a3aeb39e8d ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/
2023-02-11sched/rt: pick_next_rt_entity(): check list_entryPietro Borrello
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") removed any path which could make pick_next_rt_entity() return NULL. However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of pick_next_rt_entity()) still checks the error condition, which can never happen, since list_entry() never returns NULL. Remove the BUG_ON check, and instead emit a warning in the only possible error condition here: the queue being empty which should never happen. Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it
2023-02-11sched/deadline: Add more reschedule cases to prio_changed_dl()Valentin Schneider
I've been tracking down an issue on a ~5.17ish kernel where: CPUx CPUy <DL task p0 owns an rtmutex M> <p0 depletes its runtime, gets throttled> <rq switches to the idle task> <DL task p1 blocks on M, boost/replenish p0> <No call to resched_curr() happens here> [idle task keeps running here until *something* accidentally sets TIF_NEED_RESCHED] On that kernel, it is quite easy to trigger using rt-tests's deadline_test [1] with the test running on isolated CPUs (this reduces the chance of something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the issue even more obvious as the hung task detector chimes in). I haven't been able to reproduce this using a mainline kernel, even if I revert 2972e3050e35 ("tracing: Make trace_marker{,_raw} stream-like") which gets rid of the lock involved in the above test, *but* I cannot convince myself the issue isn't there from looking at the code. Make prio_changed_dl() issue a reschedule if the current task isn't a deadline one. While at it, ensure a reschedule is emitted when a queued-but-not-current task gets boosted with an earlier deadline that current's. [1]: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230206140612.701871-1-vschneid@redhat.com
2023-02-11sched/fair: sanitize vruntime of entity being placedZhang Qiao
When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Co-developed-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
2023-02-11sched/fair: Remove capacity inversion detectionVincent Guittot
Remove the capacity inversion detection which is now handled by util_fits_cpu() returning -1 when we need to continue to look for a potential CPU with better performance. This ends up almost reverting patches below except for some comments: commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection") commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()") commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org
2023-02-11sched/fair: unlink misfit task from cpu overutilizedVincent Guittot
By taking into account uclamp_min, the 1:1 relation between task misfit and cpu overutilized is no more true as a task with a small util_avg may not fit a high capacity cpu because of uclamp_min constraint. Add a new state in util_fits_cpu() to reflect the case that task would fit a CPU except for the uclamp_min hint which is a performance requirement. Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we can use this new value to take additional action to select the best CPU that doesn't match uclamp_min hint. When util_fits_cpu() returns -1, we will continue to look for a possible CPU with better performance, which replaces Capacity Inversion detection with capacity_orig_of() - thermal_load_avg to detect a capacity inversion. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Link: https://lore.kernel.org/r/20230201143628.270912-2-vincent.guittot@linaro.org
2023-02-11objtool: mem*() are not uaccess safePeter Zijlstra
For mysterious raisins I listed the new __asan_mem*() functions as being uaccess safe, this is giving objtool fails on KASAN builds because these functions call out to the actual __mem*() functions which are not marked uaccess safe. Removing it doesn't make the robots unhappy. Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Bisected-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230126182302.GA687063@paulmck-ThinkPad-P17-Gen-1
2023-02-11x86/tsc: Do feature check as the very first thingBorislav Petkov (AMD)
Do the feature check as the very first thing in the function. Everything else comes after that and is meaningless work if the TSC CPUID bit is not even set. Switch to cpu_feature_enabled() too, while at it. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
2023-02-11x86/tsc: Make recalibrate_cpu_khz() export GPL onlyBorislav Petkov (AMD)
A quick search doesn't reveal any use outside of the kernel - which would be questionable to begin with anyway - so make the export GPL only. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
2023-02-11x86/cacheinfo: Remove unused trace variableBorislav Petkov (AMD)
15cd8812ab2c ("x86: Remove the CPU cache size printk's") removed the last use of the trace local var. Remove it too and the useless trace cache case. No functional changes. Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230210234541.9694-1-bp@alien8.de Link: http://lore.kernel.org/r/20220705073349.1512-1-jiapeng.chong@linux.alibaba.com
2023-02-11ALSA: hda: make kobj_type structure constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230211-kobj_type-sound-v1-1-17107ceb25b7@weissschuh.net Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-11ALSA: hda: Fix codec device field initializanCezary Rojewski
Commit f2bd1c5ae2cb ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()") relocated initialization of several codec device fields. Due to differences between codec_exec_verb() and snd_hdac_bus_exec_bus() in how they handle VERB execution - the latter does not touch PM - assigning ->exec_verb to codec_exec_verb() causes PM to be engaged before it is configured for the device. Configuration of PM for the ASoC HDAudio sound card is done with snd_hda_set_power_save() during skl_hda_audio_probe() whereas the assignment happens early, in snd_hda_codec_device_init(). Revert to previous behavior to avoid problems caused by too early PM manipulation. Suggested-by: Jason Montleon <jmontleo@redhat.com> Link: https://lore.kernel.org/regressions/CALFERdzKUodLsm6=Ub3g2+PxpNpPtPq3bGBLbff=eZr9_S=YVA@mail.gmail.com Fixes: f2bd1c5ae2cb ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20230210165541.3543604-1-cezary.rojewski@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-10Merge branch 'sk-sk_forward_alloc-fixes'Jakub Kicinski
Kuniyuki Iwashima says: ==================== sk->sk_forward_alloc fixes. The first patch fixes a negative sk_forward_alloc by adding sk_rmem_schedule() before skb_set_owner_r(), and second patch removes an unnecessary WARN_ON_ONCE(). v2: https://lore.kernel.org/netdev/20230209013329.87879-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20230207183718.54520-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20230210002202.81442-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues().Kuniyuki Iwashima
Christoph Paasch reported that commit b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues(). [0 - 2] Also, we can reproduce it by a program in [3]. In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy() to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in inet_csk_destroy_sock(). The same check has been in inet_sock_destruct() from at least v2.6, we can just remove the WARN_ON_ONCE(). However, among the users of sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct(). Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor(). [0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/ [1]: https://github.com/multipath-tcp/mptcp_net-next/issues/341 [2]: WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0 Modules linked in: CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9 RSP: 0018:ffff88810570fc68 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005 RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488 R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458 FS: 00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0 Call Trace: <TASK> inet_csk_destroy_sock+0x1a1/0x320 __tcp_close+0xab6/0xe90 tcp_close+0x30/0xc0 inet_release+0xe9/0x1f0 inet6_release+0x4c/0x70 __sock_release+0xd2/0x280 sock_close+0x15/0x20 __fput+0x252/0xa20 task_work_run+0x169/0x250 exit_to_user_mode_prepare+0x113/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f7fdf7ae28d Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000 R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8 </TASK> [3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/ Fixes: b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Christoph Paasch <christophpaasch@icloud.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.Kuniyuki Iwashima
Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc. We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds. Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases. [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/ Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10i40e: Add checking for null for nlmsg_find_attr()Natalia Petrova
The result of nlmsg_find_attr() 'br_spec' is dereferenced in nla_for_each_nested(), but it can take NULL value in nla_find() function, which will result in an error. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Signed-off-by: Natalia Petrova <n.petrova@fintech.ru> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230209172833.3596034-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-09 (i40e) Jan removes i40e_status from the driver; replacing them with standard kernel error codes. Kees Cook replaces 0-length array with flexible array. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: net/i40e: Replace 0-length array with flexible array i40e: use ERR_PTR error print in i40e messages i40e: use int for i40e_status i40e: Remove string printing for i40e_status i40e: Remove unused i40e status codes ==================== Link: https://lore.kernel.org/r/20230209172536.3595838-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge branch 's390-net-updates-2023-02-06'Jakub Kicinski
Alexandra Winter says: ==================== s390/net: updates 2023-02-06 Just maintenance patches, no functional changes. If you disagree with patch 4, we can leave it out. We prefer scnprintf, but no strong opinion. ==================== Link: https://lore.kernel.org/r/20230209110424.1707501-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10s390/qeth: Convert sprintf/snprintf to scnprintfThorsten Winkler
This LWN article explains the rationale for this change https: //lwn.net/Articles/69419/ Ie. snprintf() returns what *would* be the resulting length, while scnprintf() returns the actual length. Reported-by: Jules Irenge <jbi.octave@gmail.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10s390/qeth: Convert sysfs sprintf to sysfs_emitThorsten Winkler
Following the advice of the Documentation/filesystems/sysfs.rst. All sysfs related show()-functions should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Reported-by: Jules Irenge <jbi.octave@gmail.com> Reported-by: Joe Perches <joe@perches.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10s390/qeth: Use constant for IP address buffersThorsten Winkler
Use INET6_ADDRSTRLEN constant with size of 48 which be used for char arrays storing ip addresses (for IPv4 and IPv6) Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10s390/ctcm: cleanup indentingAlexandra Winter
Get rid of multiple smatch warnings, like: warn: inconsistent indenting Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge branch 'net-renesas-rswitch-improve-tx-timestamp-accuracy'Jakub Kicinski
Yoshihiro Shimoda says: ==================== net: renesas: rswitch: Improve TX timestamp accuracy The patch [[123]/4] are minor refacoring for readability. The patch [4/4] is for improving TX timestamp accuracy. To improve the accuracy, it requires refactoring so that this is not a fixed patch. v2: https://lore.kernel.org/all/20230208235721.2336249-1-yoshihiro.shimoda.uh@renesas.com/ v1: https://lore.kernel.org/all/20230208073445.2317192-1-yoshihiro.shimoda.uh@renesas.com/ ==================== Link: https://lore.kernel.org/r/20230209081741.2536034-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: renesas: rswitch: Improve TX timestamp accuracyYoshihiro Shimoda
In the previous code, TX timestamp accuracy was bad because the irq handler got the timestamp from the timestamp register at that time. This hardware has "Timestamp capture" feature which can store each TX timestamp into the timestamp descriptors. To improve TX timestamp accuracy, implement timestamp descriptors' handling. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: renesas: rswitch: Remove gptp flag from rswitch_gwca_queueYoshihiro Shimoda
In the previous code, the gptp flag was completely related to the !dir_tx in struct rswitch_gwca_queue because rswitch_gwca_queue_alloc() was called below: < In rswitch_txdmac_alloc() > err = rswitch_gwca_queue_alloc(ndev, priv, rdev->tx_queue, true, false, TX_RING_SIZE); So, dir_tx = true, and gptp = false. < In rswitch_rxdmac_alloc() > err = rswitch_gwca_queue_alloc(ndev, priv, rdev->rx_queue, false, true, RX_RING_SIZE); So, dir_tx = false, and gptp = true. In the future, a new queue handling for timestamp will be implemented and this gptp flag is confusable. So, remove the gptp flag. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: renesas: rswitch: Move linkfix variables to rswitch_gwcaYoshihiro Shimoda
To improve readability, move linkfix related variables to struct rswitch_gwca. Also, rename function names "desc" with "linkfix". Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: renesas: rswitch: Rename rings in struct rswitch_gwca_queueYoshihiro Shimoda
To add a new ring which is really related to timestamp (ts_ring) in the future, rename the following members to improve readability: ring --> tx_ring ts_ring --> rx_ring Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10ice: xsk: Fix cleaning of XDP_TX framesLarysa Zaremba
Incrementation of xsk_frames inside the for-loop produces infinite loop, if we have both normal AF_XDP-TX and XDP_TXed buffers to complete. Split xsk_frames into 2 variables (xsk_frames and completed_frames) to eliminate this bug. Fixes: 29322791bc8b ("ice: xsk: change batched Tx descriptor cleaning") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230209160130.1779890-1-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net/sched: tcindex: update imperfect hash filters respecting rcuPedro Tammela
The imperfect hash area can be updated while packets are traversing, which will cause a use-after-free when 'tcf_exts_exec()' is called with the destroyed tcf_ext. CPU 0: CPU 1: tcindex_set_parms tcindex_classify tcindex_lookup tcindex_lookup tcf_exts_change tcf_exts_exec [UAF] Stop operating on the shared area directly, by using a local copy, and update the filter with 'rcu_replace_pointer()'. Delete the old filter version only after a rcu grace period elapsed. Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change") Reported-by: valis <sec@valis.email> Suggested-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: libwx: fix an error code in wx_alloc_page_pool()Dan Carpenter
This function always returns success. We need to preserve the error code before setting rx_ring->page_pool = NULL. Fixes: 850b971110b2 ("net: libwx: Allocate Rx and Tx resources") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y+T4aoefc1XWvGYb@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: dsa: ocelot: add PTP dependency for NET_DSA_MSCC_OCELOT_EXTArnd Bergmann
A new user of MSCC_OCELOT_SWITCH_LIB was added, bringing back an old link failure that was fixed with e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies"): x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_ptp_enable': ocelot_ptp.c:(.text+0x8ee): undefined reference to `ptp_find_pin' x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_get_ts_info': ocelot_ptp.c:(.text+0xd5d): undefined reference to `ptp_clock_index' x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_init_timestamp': ocelot_ptp.c:(.text+0x15ca): undefined reference to `ptp_clock_register' x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_deinit_timestamp': ocelot_ptp.c:(.text+0x16b7): undefined reference to `ptp_clock_unregister' Add the same PTP dependency here, as well as in the MSCC_OCELOT_SWITCH_LIB symbol itself to make it more obvious what is going on when the next driver selects it. Fixes: 3d7316ac81ac ("net: dsa: ocelot: add external ocelot switch control") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Colin Foster <colin.foster@in-advantage.com> Link: https://lore.kernel.org/r/20230209124435.1317781-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10sctp: sctp_sock_filter(): avoid list_entry() on possibly empty listPietro Borrello
Use list_is_first() to check whether tsp->asoc matches the first element of ep->asocs, as the list is not guaranteed to have an entry. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: ethernet: ti: am65-cpsw: Add RX DMA Channel Teardown QuirkSiddharth Vadapalli
In TI's AM62x/AM64x SoCs, successful teardown of RX DMA Channel raises an interrupt. The process of servicing this interrupt involves flushing all pending RX DMA descriptors and clearing the teardown completion marker (TDCM). The am65_cpsw_nuss_rx_packets() function invoked from the RX NAPI callback services the interrupt. Thus, it is necessary to wait for this handler to run, drain all packets and clear TDCM, before calling napi_disable() in am65_cpsw_nuss_common_stop() function post channel teardown. If napi_disable() executes before ensuring that TDCM is cleared, the TDCM remains set when the interfaces are down, resulting in an interrupt storm when the interfaces are brought up again. Since the interrupt raised to indicate the RX DMA Channel teardown is specific to the AM62x and AM64x SoCs, add a quirk for it. Fixes: 4f7cce272403 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g") Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20230209084432.189222-1-s-vadapalli@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge branch 'bridge-mcast-preparations-for-vxlan-mdb'Jakub Kicinski
Ido Schimmel says: ==================== bridge: mcast: Preparations for VXLAN MDB This patchset contains small preparations for VXLAN MDB that were split from this RFC [1]. Tested using existing bridge MDB forwarding selftests. [1] https://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/ ==================== Link: https://lore.kernel.org/r/20230209071852.613102-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10selftests: forwarding: Add MDB dump test casesIdo Schimmel
The kernel maintains three markers for the MDB dump: 1. The last bridge device from which the MDB was dumped. 2. The last MDB entry from which the MDB was dumped. 3. The last port-group entry that was dumped. Add test cases for large scale MDB dump to make sure that all the configured entries are dumped and that the markers are used correctly. Specifically, create 2 bridges with 32 ports and add 256 MDB entries in which all the ports are member of. Test that each bridge reports 8192 (256 * 32) permanent entries. Do that with IPv4, IPv6 and L2 MDB entries. On my system, MDB dump of the above is contained in about 50 netlink messages. Example output: # ./bridge_mdb.sh [...] INFO: # Large scale dump tests TEST: IPv4 large scale dump tests [ OK ] TEST: IPv6 large scale dump tests [ OK ] TEST: L2 large scale dump tests [ OK ] [...] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10bridge: mcast: Move validation to a policyIdo Schimmel
Future patches are going to move parts of the bridge MDB code to the common rtnetlink code in preparation for VXLAN MDB support. To facilitate code sharing between both drivers, move the validation of the top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will eventually be moved to the rtnetlink code. Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of NLA_POLICY_NESTED() as this attribute is going to be validated using different policies in the underlying drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10bridge: mcast: Remove pointless sequence generation counter assignmentIdo Schimmel
The purpose of the sequence generation counter in the netlink callback is to identify if a multipart dump is consistent or not by calling nl_dump_check_consistent() whenever a message is generated. The function is not invoked by the MDB code, rendering the sequence generation counter assignment pointless. Remove it. Note that even if the function was invoked, we still could not accurately determine if the dump is consistent or not, as there is no sequence generation counter for MDB entries, unlike nexthop objects, for example. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10bridge: mcast: Use correct define in MDB dumpIdo Schimmel
'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same value, but the latter is uAPI and cannot change, so use it when dumping an MDB entry. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge tag 'for-net-next-2023-02-09' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== pull-request: bluetooth-next - Add new PID/VID 0489:e0f2 for MT7921 - Add VID:PID 13d3:3529 for Realtek RTL8821CE - Add CIS feature bits to controller information - Set Per Platform Antenna Gain(PPAG) for Intel controllers * tag 'for-net-next-2023-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG) Bluetooth: Make sure LE create conn cancel is sent when timeout Bluetooth: Free potentially unfreed SCO connection Bluetooth: hci_qca: get wakeup status from serdev device handle Bluetooth: L2CAP: Fix potential user-after-free Bluetooth: MGMT: add CIS feature bits to controller information Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds Bluetooth: HCI: Replace zero-length arrays with flexible-array members Bluetooth: qca: Fix sparse warnings Bluetooth: btusb: Add VID:PID 13d3:3529 for Realtek RTL8821CE Bluetooth: btusb: Add new PID/VID 0489:e0f2 for MT7921 Bluetooth: Fix issue with Actions Semi ATS2851 based devices ==================== Link: https://lore.kernel.org/r/20230209234922.3756173-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge branch 'bpf, mm: introduce cgroup.memory=nobpf'Alexei Starovoitov
Yafang Shao says: ==================== The bpf memory accouting has some known problems in contianer environment, - The container memory usage is not consistent if there's pinned bpf program After the container restart, the leftover bpf programs won't account to the new generation, so the memory usage of the container is not consistent. This issue can be resolved by introducing selectable memcg, but we don't have an agreement on the solution yet. See also the discussions at https://lwn.net/Articles/905150/ . - The leftover non-preallocated bpf map can't be limited The leftover bpf map will be reparented, and thus it will be limited by the parent, rather than the container itself. Furthermore, if the parent is destroyed, it be will limited by its parent's parent, and so on. It can also be resolved by introducing selectable memcg. - The memory dynamically allocated in bpf prog is charged into root memcg only Nowdays the bpf prog can dynamically allocate memory, for example via bpf_obj_new(), but it only allocate from the global bpf_mem_alloc pool, so it will charge into root memcg only. That needs to be addressed by a new proposal. So let's give the container user an option to disable bpf memory accouting. The idea of "cgroup.memory=nobpf" is originally by Tejun[1]. [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/ Changes, v1->v2: - squash patches (Roman) - commit log improvement in patch #2. (Johannes) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10bpf: allow to disable bpf prog memory accountingYafang Shao
We can simply disable the bpf prog memory accouting by not setting the GFP_ACCOUNT. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-5-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10bpf: allow to disable bpf map memory accountingYafang Shao
We can simply set root memcg as the map's memcg to disable bpf memory accounting. bpf_map_area_alloc is a little special as it gets the memcg from current rather than from the map, so we need to disable GFP_ACCOUNT specifically for it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-4-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10bpf: use bpf_map_kvcalloc in bpf_local_storageYafang Shao
Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10mm: memcontrol: add new kernel parameter cgroup.memory=nobpfYafang Shao
Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf memory accounting. This is a preparation for the followup patch. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10docs, bpf: Ensure IETF's BPF mailing list gets copied for ISA doc changesDaniel Borkmann
Given BPF is increasingly being used beyond just the Linux kernel, with implementations in NICs and other hardware, Windows, etc, there is an ongoing effort to document and standardize parts of the existing BPF infrastructure such as its ISA. As "source of truth" we decided some time ago to rely on the in-tree documentation, in particular, starting out with the Documentation/bpf/instruction-set.rst as a base for later RFC drafts on the ISA. Therefore, we want to ensure that changes to that document have bpf@ietf.org in Cc, so add a MAINTAINERS file entry with a section on documents related to standardization efforts. For now, this only relates to instruction-set.rst, and later additional files will be added. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Thaler <dthaler@microsoft.com> Cc: bpf@ietf.org Link: https://datatracker.ietf.org/doc/bofreq-thaler-bpf-ebpf/ Link: https://lore.kernel.org/r/57619c0dd8e354d82bf38745f99405e3babdc970.1676068387.git.daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10Daniel Borkmann says:Jakub Kicinski
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Documentation: isdn: correct spellingRandy Dunlap
Correct spelling problems for Documentation/isdn/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230209071400.31476-9-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>