Age | Commit message (Collapse) | Author |
|
Next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software when
they were already forwarded by hardware.
ARP packets are not flooded by hardware to VXLAN, so software should handle
such flooding. When hardware domain of VXLAN device will be changed, ARP
packets which are trapped and marked with offload_fwd_mark will not be
flooded to VXLAN also in software, which will break VXLAN traffic.
To prevent such breaking, trap ARP packets at layer 2 and don't mark them
as L2-forwarded in hardware, then flooding ARP packets will be done only
in software, and VXLAN will send ARP packets.
Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are
trapped when they enter the device.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Marek Behún says:
====================
Fixes for mv88e6xxx (mainly 6320 family)
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20250313134146.27087-1-kabel@kernel.org/
====================
Link: https://patch.msgid.link/20250317173250.28780-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the workaround for erratum
3.3 RGMII timing may be out of spec when transmit delay is enabled
for the 6320 family, which says:
When transmit delay is enabled via Port register 1 bit 14 = 1, duty
cycle may be out of spec. Under very rare conditions this may cause
the attached device receive CRC errors.
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: <stable@vger.kernel.org> # 5.4.x
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-8-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix internal PHYs definition for the 6320 family, which has only 2
internal PHYs (on ports 3 and 4).
Fixes: bc3931557d1d ("net: dsa: mv88e6xxx: Add number of internal PHYs")
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: <stable@vger.kernel.org> # 6.6.x
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-7-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit c050f5e91b47 ("net: dsa: mv88e6xxx: Fill in STU support for all
supported chips") introduced STU methods, but did not add them to the
6320 family. Fix it.
Fixes: c050f5e91b47 ("net: dsa: mv88e6xxx: Fill in STU support for all supported chips")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-6-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
did not add the .port_set_policy() method for the 6320 family. Fix it.
Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-5-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit f36456522168 ("net: dsa: mv88e6xxx: move PVT description in
info") did not enable PVT for 6321 switch. Fix it.
Fixes: f36456522168 ("net: dsa: mv88e6xxx: move PVT description in info")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-4-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The atu_move_port_mask for 6341 family (Topaz) is 0xf, not 0x1f. The
PortVec field is 8 bits wide, not 11 as in 6390 family. Fix this.
Fixes: e606ca36bbf2 ("net: dsa: mv88e6xxx: rework ATU Remove")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-3-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The VTU registers of the 6320 family use the 6352 semantics, not 6185.
Fix it.
Fixes: b8fee9571063 ("net: dsa: mv88e6xxx: add VLAN Get Next support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: <stable@vger.kernel.org> # 5.15.x
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-2-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- elf: Define and use note name macros (Akihiko Odaki)
- elf: add remaining SHF_ flag macros (Timur Tabi)
- binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt)
- binfmt_elf_fdpic: fix variable set but not used warning (sunliming)
* tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix variable set but not used warning
elf: add remaining SHF_ flag macros
binfmt: Remove loader from linux_binprm struct
crash: Remove KEXEC_CORE_NOTE_NAME
s390/crash: Use note name macros
crash: Use note name macros
powerpc/crash: Use note name macros
binfmt_elf: Use note name macros
elf: Define note name macros
|
|
A change to the clk driver broke configurations that enable DA830
but not DA850:
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o: in function `__da850_pll0_of_clk_init_declare':
pll.c:(.init.text+0x30): undefined reference to `of_da850_pll0_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o:(.rodata.davinci_pll_id_table+0x14): undefined reference to `da850_pll0_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o:(.rodata.davinci_pll_id_table+0x2c): undefined reference to `da850_pll1_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o:(.rodata.davinci_pll_of_match+0xc0): undefined reference to `of_da850_pll1_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/psc.o:(.rodata.davinci_psc_id_table+0x14): undefined reference to `da850_psc0_init_data'
arm-linux-gnueabi-ld: drivers/clk/davinci/psc.o:(.rodata.davinci_psc_id_table+0x2c): undefined reference to `da850_psc1_init_data'
Select ARCH_DAVINCI_DA850 unconditionally to ensure the driver can still
build.
Fixes: a31b4dcf188c ("clk: davinci: remove support for da830")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add possibility to supply the container name to rv list:
# rv list sched
mon1
mon2
mon3
This lists only monitors in sched, without indentation.
Supplying -h, any option (string starting with -) or more than 1
argument will still print the usage.
Passing a non-existent container prints nothing and passing no container
continues to print all monitors, showing indentation for nested
monitors, reported after their container.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-10-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add man page and kernel documentation for the sched monitors, as sched
is a container of other monitors, document all in the same page.
sched is the first nested monitor, also explain what is a nested monitor
and how enabling containers or children monitors work.
To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-9-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
RV now supports nested monitors, this functionality requires a container
monitor, which has virtually no functionality besides holding other
monitors, and nested monitors, that have a container as parent.
Add the -p flag to pass a parent to a monitor, this sets it up while
registering the monitor and adds necessary includes and configurations.
Add the -c flag to create a container, since containers are empty, we
don't allow supplying a dot model or a monitor type, the template is
also different since functions to enable and disable the monitor are not
defined, nor any tracepoint. The generated header file only allows to
include the rv_monitor structure in children monitors.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
RV now supports nested monitors, this functionality requires a container
monitor, which has virtually no functionality besides holding other
monitors, and nested monitors, that have a container as parent.
Nested monitors' sysfs folders are physically nested in the container's
folder, and they are listed in the available_monitors file with the
notation container:monitor.
These changes go against the assumption that each line in the
available_monitors file correspond to a folder in the rv directory,
breaking the functionality of the rv tool.
Add support for nested containers in the rv userspace tool, indenting
nested monitors while listed and allowing both the notation with and
without container name, which are equivalent:
# rv list
mon1
mon2
container:
- nested1
- nested2
## notation with container name
# rv mon container:nested1
## notation without container name
# rv mon nested1
Either way, enabling a nested monitor is the same as enabling any other
non-nested monitor.
Selecting the container with rv mon enables all the nested monitors, if
-t is passed, the trace also includes the monitor name next to the
event:
# rv mon nested1 -t
<idle>-0 [004] event state1 x event -> state2
<idle>-0 [004] error event not expected in state2
# rv mon sched -t
<idle>-0 [004] event_nested1 state1 x event -> state2
<idle>-0 [004] error_nested1 event not expected in state2
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add 3 per-cpu monitors as part of the sched model:
* scpd: schedule called with preemption disabled
Monitor to ensure schedule is called with preemption disabled
* snep: schedule does not enable preempt
Monitor to ensure schedule does not enable preempt
* sncid: schedule not called with interrupt disabled
Monitor to ensure schedule is not called with interrupt disabled
To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-6-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a per-task monitor as part of the sched model:
* snroc: set non runnable on its own context
Monitor to ensure set_state happens only in the respective task's context
To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-5-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add 2 per-cpu monitors as part of the sched model:
* sco: scheduling context operations
Monitor to ensure sched_set_state happens only in thread context
* tss: task switch while scheduling
Monitor to ensure sched_switch happens only in scheduling context
To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-4-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Monitors describing complex systems, such as the scheduler, can easily
grow to the point where they are just hard to understand because of the
many possible state transitions.
Often it is possible to break such descriptions into smaller monitors,
sharing some or all events. Enabling those smaller monitors concurrently
is, in fact, testing the system as if we had one single larger monitor.
Splitting models into multiple specification is not only easier to
understand, but gives some more clues when we see errors.
Add the possibility to create container monitors, whose only purpose is
to host other nested monitors. Enabling a container monitor enables all
nested ones, but it's still possible to enable nested monitors
independently.
Add the sched monitor as first container, for now empty.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-3-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the following tracepoints:
* sched_entry(bool preempt, ip)
Called while entering __schedule
* sched_exit(bool is_switch, ip)
Called while exiting __schedule
* sched_set_state(task, curr_state, state)
Called when a task changes its state (to and from running)
These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Michael Chan says:
====================
bnxt_en: Fix MAX_SKB_FRAGS > 30
The driver was written with the assumption that MAX_SKB_FRAGS could
not exceed what the NIC can support. About 2 years ago,
CONFIG_MAX_SKB_FRAGS was added. The value can exceed what the NIC
can support and it may cause TX timeout. These 2 patches will fix
the issue.
====================
Link: https://patch.msgid.link/20250321211639.3812992-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If skb_shinfo(skb)->nr_frags excceds what the chip can support,
linearize the SKB and warn once to let the user know.
net.core.max_skb_frags can be lowered, for example, to avoid the
issue.
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321211639.3812992-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bd_cnt field in the TX BD specifies the total number of BDs for
the TX packet. The bd_cnt field has 5 bits and the maximum number
supported is 32 with the value 0.
CONFIG_MAX_SKB_FRAGS can be modified and the total number of SKB
fragments can approach or exceed the maximum supported by the chip.
Add a macro to properly mask the bd_cnt field so that the value 32
will be properly masked and set to 0 in the bd_cnd field.
Without this patch, the out-of-range bd_cnt value will corrupt the
TX BD and may cause TX timeout.
The next patch will check for values exceeding 32.
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321211639.3812992-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An update was made to up the module ref count when a synthetic event is
registered for both trace and perf events. But if perf is not configured
in, the perf enums used will cause the kernel to fail to build.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Link: https://lore.kernel.org/20250323152151.528b5ced@batman.local.home
Fixes: 21581dd4e7ff ("tracing: Ensure module defining synth event cannot be unloaded while tracing")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202503232230.TeREVy8R-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently network taps unbound to any interface are linked in the
global ptype_all list, affecting the performance in all the network
namespaces.
Add per netns ptypes chains, so that in the mentioned case only
the netns owning the packet socket(s) is affected.
While at that drop the global ptype_all list: no in kernel user
registers a tap on "any" type without specifying either the target
device or the target namespace (and IMHO doing that would not make
any sense).
Note that this adds a conditional in the fast path (to check for
per netns ptype_specific list) and increases the dataset size by
a cacheline (owing the per netns lists).
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumaze@google.com>
Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove debugfs_tx() which was added when the caif driver was added in
commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)")
but it has never been used.
Flagged by LLVM 19.1.7 W=1 builds.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In linux-next
commit c760174401f6 ("perf cpumap: Reduce cpu size from int to int16_t")
causes the perf tests 100 126 to fail on s390:
Output before:
# ./perf test 100
100: perf trace BTF general tests : FAILED!
#
The root cause is the change from int to int16_t for the
cpu maps. The size of the CPU key value pair changes from
four bytes to two bytes. However a two byte key size is
not supported for bpf_map__update_elem().
Note: validate_map_op() in libbpf.c emits warning
libbpf: map '__augmented_syscalls__': \
unexpected key size 2 provided, expected 4
when key size is set to int16_t.
Therefore change to variable size back to 4 bytes for
invocation of bpf_map__update_elem().
Output after:
# ./perf test 100
100: perf trace BTF general tests : Ok
#
Fixes: c760174401f6 ("perf cpumap: Reduce cpu size from int to int16_t")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250324152756.3879571-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
of_gpio.h is deprecated. Since there is no of_gpio_x API, drop
unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if
no user in driver.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The net fixed phy driver does not require the creation of a platform
device. Originally, this approach was chosen for simplicity when the
driver was first implemented.
With the introduction of the lightweight faux device interface, we now
have a more appropriate alternative. Migrate the device to utilize the
faux bus, given that the platform device it previously created was not
a real one anyway. This will get rid of the fake platform device.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tariq Toukan says:
====================
mlx5 cleanups 2025-03-19
This series contains small cleanups to the mlx5 core and Eth drivers.
====================
Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Always set PAGE_POOL_STATS in mlx5 Eth driver.
Cleanup the corresponding #ifdefs.
Page pool stats are essential to monitor and analyze RX performance.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
From what I can tell the .get_fixed_state pointer in the phylink structure
hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate
phylink_fixed_state_cb()") . Since I can't find any users for it we might
as well just drop the pointer.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The assignment of zero to udph->check is unnecessary as it is
immediately overwritten in the subsequent line. Remove the redundant
assignment.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull tasklist_lock optimizations from Christian Brauner:
"According to the performance testbots this brings a 23% performance
increase when creating new processes:
- Reduce tasklist_lock hold time on exit:
- Perform add_device_randomness() without tasklist_lock
- Perform free_pid() calls outside of tasklist_lock
- Drop irq disablement around pidmap_lock
- Add some tasklist_lock asserts
- Call flush_sigqueue() lockless by changing release_task()
- Don't pointlessly clear TIF_SIGPENDING in __exit_signal() ->
clear_tsk_thread_flag()"
* tag 'kernel-6.15-rc1.tasklist_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pid: drop irq disablement around pidmap_lock
pid: perform free_pid() calls outside of tasklist_lock
pid: sprinkle tasklist_lock asserts
exit: hoist get_pid() in release_task() outside of tasklist_lock
exit: perform add_device_randomness() without tasklist_lock
exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING)
exit: change the release_task() paths to call flush_sigqueue() lockless
|
|
There commands can be used to add an RSS context and steer some traffic
into it:
# ethtool -X eth0 context new
New RSS context is 1
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
Added rule with ID 1023
However, the second command fails with EINVAL on mlx5e:
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
It happens when flow_get_tirn calls flow_type_to_traffic_type with
flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles
IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases which are
common for hash and spec, IPv4 and IPv6 defines different contants for
hash and for spec:
#define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */
#define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */
...
#define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */
#define IP_USER_FLOW IPV4_USER_FLOW
#define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */
#define IPV4_FLOW 0x10 /* hash only */
#define IPV6_FLOW 0x11 /* hash only */
Extend the switch in flow_type_to_traffic_type to support both, which
fixes the failing ethtool -N command with flow-type ip4 or ip6.
Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the "struct" keyword in kernel-doc when describing struct
ipefs_file. Add kernel-doc for the struct members also.
Don't use kernel-doc notation for 'policy_subdir'. kernel-doc does
not support documentation comments for data definitions.
This eliminates multiple kernel-doc warnings:
security/ipe/policy_fs.c:21: warning: cannot understand function prototype: 'struct ipefs_file '
security/ipe/policy_fs.c:407: warning: cannot understand function prototype: 'const struct ipefs_file policy_subdir[] = '
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Fan Wu <wufan@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Fan Wu <wufan@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rust updates from Christian Brauner:
"This contains minor fixes and improvements to rust file bindings:
- Optimize rust symbol generation for FileDescriptorReservation
- Optimize rust symbol generation for SeqFile"
* tag 'vfs-6.15-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
rust: optimize rust symbol generation for SeqFile
rust: file: optimize rust symbol generation for FileDescriptorReservation
|
|
Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs.
Don't call xpcs_config_eee_mult_fact() in this case, as this causes a
crash at init :
Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write
[...]
Call trace:
xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c
stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244
stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc
socfpga_dwmac_probe from platform_probe+0x5c/0xb0
Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file handling updates from Christian Brauner:
"This contains performance improvements for struct file's new refcount
mechanism and various other performance work:
- The stock kernel transitioning the file to no refs held penalizes
the caller with an extra atomic to block any increments. For cases
where the file is highly likely to be going away this is easily
avoidable.
Add file_ref_put_close() to better handle the common case where
closing a file descriptor also operates on the last reference and
build fput_close_sync() and fput_close() on top of it. This brings
about 1% performance improvement by eliding one atomic in the
common case.
- Predict no error in close() since the vast majority of the time
system call returns 0.
- Reduce the work done in fdget_pos() by predicting that the file was
found and by explicitly comparing the reference count to one and
ignoring the dead zone"
* tag 'vfs-6.15-rc1.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: reduce work in fdget_pos()
fs: use fput_close() in path_openat()
fs: use fput_close() in filp_close()
fs: use fput_close_sync() in close()
file: add fput and file_ref_put routines optimized for use when closing a fd
fs: predict no error in close()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs orangefs updates from Christian Brauner:
"This contains the work to remove orangefs_writepage() and partially
convert it to folios.
A few regular bugfixes are included as well"
* tag 'vfs-6.15-rc1.orangefs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
orangefs: Convert orangefs_writepages to contain an array of folios
orangefs: Simplify bvec setup in orangefs_writepages_work()
orangefs: Unify error & success paths in orangefs_writepages_work()
orangefs: Pass mapping to orangefs_writepages_work()
orangefs: Convert orangefs_writepage_locked() to take a folio
orangefs: Remove orangefs_writepage()
orangefs: make open_for_read and open_for_write boolean
orangefs: Move s_kmod_keyword_mask_map to orangefs-debugfs.c
orangefs: Do not truncate file size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs afs updates from Christian Brauner:
"This contains the work for afs for this cycle:
- Fix an occasional hang that's only really encountered when
rmmod'ing the kafs module
- Remove the "-o autocell" mount option. This is obsolete with the
dynamic root and removing it makes the next patch slightly easier
- Change how the dynamic root mount is constructed. Currently, the
root directory is (de)populated when it is (un)mounted if there are
cells already configured and, further, pairs of automount points
have to be created/removed each time a cell is added/deleted
This is changed so that readdir on the root dir lists all the known
cell automount pairs plus the @cell symlinks and the inodes and
dentries are constructed by lookup on demand. This simplifies the
cell management code
- A few improvements to the afs_volume and afs_server tracepoints
- Pass trace info into the afs_lookup_cell() function to allow the
trace log to indicate the purpose of the lookup
- Remove the 'net' parameter from afs_unuse_cell() as it's
superfluous
- In rxrpc, allow a kernel app (such as kafs) to store a word of
information on rxrpc_peer records
- Use the information stored on the rxrpc_peer record to point to the
afs_server record. This allows the server address lookup to be done
away with
- Simplify the afs_server ref/activity accounting to make each one
self-contained and not garbage collected from the cell management
work item
- Simplify the afs_cell ref/activity accounting to make each one of
these also self-contained and not driven by a central management
work item
The current code was intended to make it such that a single timer
for the namespace and one work item per cell could do all the work
required to maintain these records. This, however, made for some
sequencing problems when cleaning up these records. Further, the
attempt to pass refs along with timers and work items made getting
it right rather tricky when the timer or work item already had a
ref attached and now a ref had to be got rid of"
* tag 'vfs-6.15-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Simplify cell record handling
afs: Fix afs_server ref accounting
afs: Use the per-peer app data provided by rxrpc
rxrpc: Allow the app to store private data on peer structs
afs: Drop the net parameter from afs_unuse_cell()
afs: Make afs_lookup_cell() take a trace note
afs: Improve server refcount/active count tracing
afs: Improve afs_volume tracing to display a debug ID
afs: Change dynroot to create contents on demand
afs: Remove the "autocell" mount option
|
|
Common YAML schema for devices that exports internal peripherals through
PCI BARs. The BARs are exposed as simple-buses through which the
peripherals can be accessed.
This is not intended to be used as a standalone binding, but should be
included by device specific bindings.
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix typo]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/096ab7addb39e498e28ac2526c07157cc9327c42.1742418429.git.andrea.porta@suse.com
|
|
Remove intel_pcie_cpu_addr(), the .cpu_addr_fixup() method, because the dwc
core driver already handles address translation based on the devicetree
description.
[bhelgaas: this does require a minor dts change, but maintainer
Lei Chuan Hua <lchuanhua@maxlinear.com> confirms that the driver is only
used internally to Maxlinear and internal users will update dts:
https://lore.kernel.org/r/BY3PR19MB507667CE7531D863E1E5F8AEBDD82@BY3PR19MB5076.namprd19.prod.outlook.com]
Link: https://lore.kernel.org/r/20250305-intel-v1-1-40db3a685490@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Remove imx_pcie_cpu_addr_fixup, the .cpu_addr_fixup() method, because the
dwc core driver already handles address translation based on the devicetree
description.
Link: https://lore.kernel.org/r/20250315201548.858189-14-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
|
|
We know the parent_bus_offset, either computed from a DT reg property (the
offset is the CPU physical addr - the 'config'/'addr_space' address on the
parent bus) or from a .cpu_addr_fixup() (which may have used a host bridge
window offset).
Apply that parent_bus_offset instead of calling .cpu_addr_fixup() when
programming the ATU.
This assumes all intermediate addresses are at the same offset from the CPU
physical addresses.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20250315201548.858189-13-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Most systems' PCIe outbound map windows have non-zero physical addresses,
but the possibility of encountering zero increased after following commit
("PCI: dwc: Use parent_bus_offset").
'ep->outbound_addr[n]', representing 'parent_bus_address', might be 0 on
some hardware, which trims high address bits through bus fabric before
sending to the PCIe controller.
Replace the iteration logic with 'for_each_set_bit()' to ensure only
allocated map windows are iterated when determining the ATU index from a
given address.
Link: https://lore.kernel.org/r/20250315201548.858189-12-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Endpoint
┌───────────────────────────────────────────────┐
│ pcie-ep@5f010000 │
│ ┌────────────────┐│
│ │ Endpoint ││
│ │ PCIe ││
│ │ Controller ││
│ bus@5f000000 │ ┌────────►
│ ┌──────────┐ │ │ ││dynamically
│ │ │ Outbound Transfer │ ││allocated
│┌─────┐ │ Bus ┼─────►│ ATU ───────┘ ││PCI Addr
││ │ │ Fabric │Bus │ ││
││ CPU ├───►│ │Addr │ ││
││ │CPU │ │0x8000_0000 ││
│└─────┘Addr└──────────┘ │ ││
│ 0x7000_0000 └────────────────┘│
└───────────────────────────────────────────────┘
bus@5f000000 {
compatible = "simple-bus";
ranges = <0x80000000 0x0 0x70000000 0x10000000>;
pcie-ep@5f010000 {
reg = <0x80000000 0x10000000>;
reg-names ="addr_space";
...
};
...
};
In the diagram above, CPU writes data to outbound window address
0x7000_0000, and the bus fabric maps it to 0x8000_0000. The ATU uses
bus address 0x8000_0000 as input address and maps to some PCI address
dynamically allocated by a PCI device driver on the host side.
The pcie-ep@5f010000 'reg[addr_space]' is the parent bus address, which is
the input of PCIe controller, including the ATU.
Set parent_bus_offset, the offset from the CPU address to the PCIe
controller input address using dw_pcie_init_parent_bus_offset(). The
parent_bus_offset is not used yet, so no functional change intended.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20250315201548.858189-11-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Consolidate devicetree resource handling in dw_pcie_ep_get_resources().
No functional change intended.
Link: https://lore.kernel.org/r/20250315201548.858189-10-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
|