Age | Commit message (Collapse) | Author |
|
It can be easy to miss that the notifier mechanism invokes the callbacks
in an atomic context, so add some comments to that effect on the two
handlers we register here.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-4-bgray@linux.ibm.com
|
|
This is called in an atomic context, so is not allowed to sleep if a
user page needs to be faulted in and has nowhere it can be deferred to.
The pagefault_disabled() function is documented as preventing user
access methods from sleeping.
In practice the page will be mapped in nearly always because we are
reading the instruction that just triggered the watchpoint trap.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-3-bgray@linux.ibm.com
|
|
thread_change_pc() uses CPU local data, so must be protected from
swapping CPUs while it is reading the breakpoint struct.
The error is more noticeable after 1e60f3564bad ("powerpc/watchpoints:
Track perf single step directly on the breakpoint"), which added an
unconditional __this_cpu_read() call in thread_change_pc(). However the
existing __this_cpu_read() that runs if a breakpoint does need to be
re-inserted has the same issue.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-2-bgray@linux.ibm.com
|
|
Valid domain value is in range 1 to HV_PERF_DOMAIN_MAX. Current code has
check for domain value greater than or equal to HV_PERF_DOMAIN_MAX. But
the check for domain value 0 is missing.
Fix this issue by adding check for domain value 0.
Before:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Error:
The sys_perf_event_open() syscall returned with 5 (Input/output error) for
event (hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/).
/bin/dmesg | grep -i perf may provide additional information.
Result from dmesg:
[ 37.819387] hv-24x7: hcall failed: [0 0x60040000 0x100 0] => ret
0xfffffffffffffffc (-4) detail=0x2000000 failing ix=0
After:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Warning:
hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ event is not supported by the kernel.
failed to read counter hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/
Fixes: ebd4a5a3ebd9 ("powerpc/perf/hv-24x7: Minor improvements")
Reported-by: Krishan Gopal Sarawast <krishang@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230825055601.360083-1-kjain@linux.ibm.com
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors"
* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
sched/fair: Fix SMT4 group_smt_balance handling
sched/fair: Optimize should_we_balance() for large SMT systems
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Ingo Molnar:
"Fix a cold functions related false-positive objtool warning that
triggers on Clang"
* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix _THIS_IP_ detection for cold functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull WARN fix from Ingo Molnar:
"Fix a missing preempt-enable in the WARN() slowpath"
* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
panic: Reenable preemption in WARN slowpath
|
|
If, for any reason, `tx_stats_num + rx_stats_num` wraps around, the
protection that struct_size() adds against potential integer overflows
is defeated. Fix this by hardening call to struct_size() with size_add().
Fixes: 691f4077d560 ("gve: Replace zero-length array with flexible-array member")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971f947 ("vfs: de-crapify "cp_new_stat()" function").
Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place. So commit 932aba1e1690 ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.
But the helpers remained around, unused.
Get rid of them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull smb client fixes from Steve French:
"Three small SMB3 client fixes, one to improve a null check and two
minor cleanups"
* tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix some minor typos and repeated words
smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
smb3: move server check earlier when setting channel sequence number
|
|
Pull smb server fixes from Steve French:
"Two ksmbd server fixes"
* tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd:
ksmbd: fix passing freed memory 'aux_payload_buf'
ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Regression and bug fixes for ext4"
* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix rec_len verify error
ext4: do not let fstrim block system suspend
ext4: move setting of trimmed bit into ext4_try_to_trim_range()
jbd2: Fix memory leak in journal_init_common()
jbd2: Remove page size assumptions
buffer: Make bh_offset() work for compound pages
|
|
queue
Tony Nguyen says:
====================
This series contains updates to iavf and i40e drivers.
Radoslaw prevents admin queue operations being added when the driver is
being removed for iavf.
Petr Oros immediately starts reconfiguration on changes to VLANs on
iavf.
Ivan Vecera moves reset of VF to occur after port VLAN values are set
on i40e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Nothing prevents iscsi_sw_tcp_conn_bind() to receive file descriptor
pointing to non TCP socket (af_unix for example).
Return -EINVAL if this is attempted, instead of crashing the kernel.
Fixes: 7ba247138907 ("[SCSI] open-iscsi/linux-iscsi-5 Initiator: Initiator code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: open-iscsi@googlegroups.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stefano Garzarella says:
====================
vsock/test: add recv_buf()/send_buf() utility functions and some improvements
We recently found that some tests were failing [1].
The problem was that we were not waiting for all the bytes correctly,
so we had a partial read. I had initially suggested using MSG_WAITALL,
but this could have timeout problems.
Since we already had send_byte() and recv_byte() that handled the timeout,
but also the expected return value, I moved that code to two new functions
that we can now use to send/receive generic buffers.
The last commit is just an improvement to a test I found difficult to
understand while using the new functions.
@Arseniy a review and some testing are really appreciated :-)
[1] https://lore.kernel.org/netdev/63xflnwiohdfo6m3vnrrxgv2ulplencpwug5qqacugqh7xxpu3@tsczkuqgwurb/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The test was a bit complicated to read.
Added variables to keep track of the bytes read and to be read
in each step. Also some comments.
The test is unchanged.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have a very common pattern used in vsock_test that we can
now replace with the new send_buf().
This allows us to reuse the code we already had to check the
actual return value and wait for all the bytes to be sent with
an appropriate timeout.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the code of send_byte() out in a new utility function that
can be used to send a generic buffer.
This new function can be used when we need to send a custom
buffer and not just a single 'A' byte.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have a very common pattern used in vsock_test that we can
now replace with the new recv_buf().
This allows us to reuse the code we already had to check the
actual return value and wait for all bytes to be received with
an appropriate timeout.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the code of recv_byte() out in a new utility function that
can be used to receive a generic buffer.
This new function can be used when we need to receive a custom
buffer and not just a single 'A' byte.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we assume the skb is associated with a device before calling
__ip_options_compile, which is not always the case if it is re-routed by
ipvs.
When skb->dev is NULL, dev_net(skb->dev) will become null-dereference.
This patch adds a check for the edge case and switch to use the net_device
from the rtable when skb->dev is NULL.
Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Cc: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 79 files changed, 5275 insertions(+), 600 deletions(-).
The main changes are:
1) Basic BTF validation in libbpf, from Andrii Nakryiko.
2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi.
3) next_thread cleanups, from Oleg Nesterov.
4) Add mcpu=v4 support to arm32, from Puranjay Mohan.
5) Add support for __percpu pointers in bpf progs, from Yonghong Song.
6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang.
7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
Thanks a lot!
Also thanks to reporters, reviewers and testers of commits in this pull-request:
Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman",
Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle),
Song Liu, Stanislav Fomichev, Yonghong Song
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King says:
====================
net: phy: avoid race when erroring stopping PHY
This series addresses a problem reported by Jijie Shao where the PHY
state machine can race with phy_stop() leading to an incorrect state.
The issue centres around phy_state_machine() dropping the phydev->lock
mutex briefly, which allows phy_stop() to get in half-way through the
state machine, and when the state machine resumes, it overwrites
phydev->state with a value incompatible with a stopped PHY. This causes
a subsequent phy_start() to issue a warning.
We address this firstly by using versions of functions that do not take
tne lock, moving them into the locked region. The only function that
this can't be done with is phy_suspend() which needs to call into the
driver without taking the lock.
For phy_suspend(), we split the state machine into two parts - the
initial part which runs under the phydev->lock, and the second part
which runs without the lock.
We finish off by using the split state machine in phy_stop() which
removes another unnecessary unlock-lock sequence from phylib.
Changes from RFC:
- Added Jijie Shao's tested-by
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert phy_stop() to use the new locked-section and unlocked-section
parts of the PHY state machine.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split out the locked and unlocked sections of phy_state_machine() into
two separate functions which can be called inside the phydev lock and
outside the phydev lock as appropriate, thus allowing us to combine
the locked regions in the caller of phy_state_machine() with the
locked region inside phy_state_machine().
This avoids unnecessarily dropping the phydev lock which may allow
races to occur.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move phy_state_machine() before phy_stop() to avoid subsequent patches
introducing forward references.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the call to phy_suspend() to the end of phy_state_machine() after
we release the lock so that we can combine the locked areas.
phy_suspend() can not be called while holding phydev->lock as it has
caused deadlocks in the past.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the call to start auto-negotiation inside the lock in the PHYLIB
state machine, calling the locked variant _phy_start_aneg(). This
avoids unnecessarily releasing and re-acquiring the lock.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the locking out of phy_error_precise() and to its only call site,
merging with the locked region that has already been taken.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phy_stop() calls phy_process_state_change() while holding the phydev
lock, so also arrange for phy_state_machine() to do the same, so that
this function is called with consistent locking.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds partial Access Control List (ACL) support for the
ksz9477 family of switches. ACLs enable filtering of incoming layer 2
MAC, layer 3 IP, and layer 4 TCP/UDP packets on each port. They provide
additional capabilities for filtering routed network protocols and can
take precedence over other forwarding functions.
ACLs can filter ingress traffic based on header fields such as
source/destination MAC address, EtherType, IPv4 address, IPv4 protocol,
UDP/TCP ports, and TCP flags. The ACL is an ordered list of up to 16
access control rules programmed into the ACL Table. Each entry specifies
a set of matching conditions and action rules for controlling packet
forwarding and priority.
The ACL also implements a count function, generating an interrupt
instead of a forwarding action. It can be used as a watchdog timer or an
event counter. The ACL consists of three parts: matching rules, action
rules, and processing entries. Multiple match conditions can be either
AND'ed or OR'ed together.
This patch introduces support for a subset of the available ACL
functionality, specifically layer 2 matching and prioritization of
matched packets. For example:
tc qdisc add dev lan2 clsact
tc filter add dev lan2 ingress protocol 0x88f7 flower action skbedit prio 7
tc qdisc add dev lan1 clsact
tc filter add dev lan1 ingress protocol 0x88f7 flower action skbedit prio 7
The hardware offloading implementation was benchmarked against a
configuration without hardware offloading. This latter setup relied on a
software-based Linux bridge. No noticeable differences were observed
between the two configurations. Here is an example of software-based
test:
ip l s dev enu1u1 up
ip l s dev enu1u2 up
ip l s dev enu1u4 up
ethtool -A enu1u1 autoneg off rx off tx off
ethtool -A enu1u2 autoneg off rx off tx off
ethtool -A enu1u4 autoneg off rx off tx off
ip l a name br0 type bridge
ip l s dev br0 up
ip l s enu1u1 master br0
ip l s enu1u2 master br0
ip l s enu1u4 master br0
tc qdisc add dev enu1u1 root handle 1: ets strict 4 priomap 3 3 2 2 1 1 0 0
tc qdisc add dev enu1u4 root handle 1: ets strict 4 priomap 3 3 2 2 1 1 0 0
tc qdisc add dev enu1u2 root handle 1: ets strict 4 priomap 3 3 2 2 1 1 0 0
tc qdisc add dev enu1u1 clsact
tc filter add dev enu1u1 ingress protocol ipv4 flower action skbedit prio 7
tc qdisc add dev enu1u4 clsact
tc filter add dev enu1u4 ingress protocol ipv4 flower action skbedit prio 0
On a system attached to the port enu1u2 I run two iperf3 server
instances:
iperf3 -s -p 5210 &
iperf3 -s -p 5211 &
On systems attached to enu1u4 and enu1u1 I run:
iperf3 -u -c 172.17.0.1 -p 5210 -b100M -l1472 -t100
and
iperf3 -u -c 172.17.0.1 -p 5211 -b100M -l1472 -t100
As a result, IP traffic on port enu1u1 will be prioritized and take
precedence over IP traffic on port enu1u4
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Right now, the *_port_setup code is in dsa_switch_ops::port_enable(),
which is not the best place for it. This patch moves it to a more
suitable place, dsa_switch_ops::port_setup(), to match the function's
purpose and name.
This patch is a preparation for coming ACL support patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jiri Pirko says:
====================
expose devlink instances relationships
From: Jiri Pirko <jiri@nvidia.com>
Currently, the user can instantiate new SF using "devlink port add"
command. That creates an E-switch representor devlink port.
When user activates this SF, there is an auxiliary device created and
probed for it which leads to SF devlink instance creation.
There is 1:1 relationship between E-switch representor devlink port and
the SF auxiliary device devlink instance.
Also, for example in mlx5, one devlink instance is created for
PCI device and one is created for an auxiliary device that represents
the uplink port. The relation between these is invisible to the user.
Patches #1-#3 and #5 are small preparations.
Patch #4 adds netnsid attribute for nested devlink if that in a
different namespace.
Patch #5 is the main one in this set, introduces the relationship
tracking infrastructure later on used to track SFs, linecards and
devlink instance relationships with nested devlink instances.
Expose the relation to the user by introducing new netlink attribute
DEVLINK_PORT_FN_ATTR_DEVLINK which contains the devlink instance related
to devlink port function. This is done by patch #8.
Patch #9 implements this in mlx5 driver.
Patch #10 converts the linecard nested devlink handling to the newly
introduced rel infrastructure.
Patch #11 benefits from the rel infra and introduces possiblitily to
have relation between devlink instances.
Patch #12 implements this in mlx5 driver.
Examples:
$ devlink dev
pci/0000:08:00.0: nested_devlink auxiliary/mlx5_core.eth.0
pci/0000:08:00.1: nested_devlink auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.0
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 106
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
$ devlink port function set pci/0000:08:00.0/32768 state active
$ devlink port show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
function:
hw_addr 00:00:00:00:00:00 state active opstate attached roce enable nested_devlink auxiliary/mlx5_core.sf.2
$ devlink port show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
function:
hw_addr 00:00:00:00:00:00 state active opstate attached roce enable nested_devlink auxiliary/mlx5_core.sf.2 nested_devlink_netns ns1
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Benefit from the previous commit introducing exposure of devlink
instances relationship and set the nested instance for en auxiliary
device.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In mlx5, there is a devlink instance created for PCI device. Also, one
separate devlink instance is created for auxiliary device that
represents the netdev of uplink port. This relation is currently
invisible to the devlink user.
Benefit from the rel infrastructure and allow for nested devlink
instance to set the relationship for the nested-in devlink instance.
Note that there may be many nested instances, therefore use xarray to
hold the list of rel_indexes for individual nested instances.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Benefit from the newly introduced rel infrastructure, treat the linecard
nested devlink instances in the same way as port function instances.
Convert the code to use the rel infrastructure.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Benefit from the existence of internal mlx5 notifier and extend it by
event MLX5_DRIVER_EVENT_SF_PEER_DEVLINK. Use this event from SF
auxiliary device probe/remove functions to pass the registered SF
devlink instance to the SF representor.
Process the new event in SF representor code and call
devl_port_fn_devlink_set() to do the assignments. Implement this in work
to avoid possible deadlock when probe/remove function of SF may be
called with devlink instance lock held during devlink reload.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a new helper devl_port_fn_devlink_set() to be used by driver
assigning a devlink instance to the peer devlink port function.
Expose this to user over new netlink attribute nested under port
function nest to expose devlink handle related to the port function.
This is particularly helpful for user to understand the relationship
between devlink instances created for SFs and the port functions
they belong to.
Note that caller of devlink_port_notify() needs to hold devlink
instance lock, put the assertion to devl_port_fn_devlink_set() to make
this requirement explicit. Also note the limitations that only allow to
make this assignment for registered objects.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is a bit tricky to maintain relationship between devlink objects and
nested devlink instances due to following aspects:
1) Locking. It is necessary to lock the devlink instance that contains
the object first, only after that to lock the nested instance.
2) Lifetimes. Objects (e.g devlink port) may be removed before
the nested devlink instance.
3) Notifications. If nested instance changes (e.g. gets
registered/unregistered) the nested-in object needs to send
appropriate notifications.
Resolve this by introducing an xarray that holds 1:1 relationships
between devlink object and related nested devlink instance.
Use that xarray index to get the object/nested devlink instance on
the other side.
Provide necessary helpers:
devlink_rel_nested_in_add/clear() to add and clear the relationship.
devlink_rel_nested_in_notify() to call the nested-in object to send
notifications during nested instance register/unregister/netns
change.
devlink_rel_devlink_handle_put() to be used by nested-in object fill
function to fill the nested handle.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the next patch is going to call this helper with need to fill another
type of nested attribute, pass it over function arg.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the next patch is going to call this helper out of the linecard.c,
move to netlink.c.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If netns of devlink instance and nested devlink instance differs,
put netnsid attr to indicate that.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Historically, the shared devlink_mutex prevented devlink instances from
being registered/unregistered during another devlink instance reload
operation. However, devlink_muxex is gone for some time now, this
limitation is no longer needed. Lift it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The eswitch disable call does removal of all representors. Do that
before clearing the SF device table and maintain the same flow as during
SF devlink port removal, where the representor is removed before
the actual SF is removed.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of exposing linecard struct, expose a simple helper to get the
linecard index, which is all is needed outside linecard.c. Move the
linecard struct to linecard.c and keep it private similar to the rest of
the devlink objects.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When operating in fixed phy mode and if there is repeated open/close
phy test cases, everytime the fixed phy is registered as a new phy
which leads to overrun after 32 iterations. It is solved by adding
fixed_phy_unregister() in the phy_close path.
In phy_close path, netdev->phydev cannot be used directly in
fixed_phy_unregister() due to two reasons,
- netdev->phydev is set to NULL in phy_disconnect()
- fixed_phy_unregister() can be called only after phy_disconnect()
So saving the netdev->phydev in local variable 'phydev' and
passing it to phy_disconnect().
Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vadim Fedorenko says:
====================
Create common DPLL configuration API
Implement common API for DPLL configuration and status reporting.
The API utilises netlink interface as transport for commands and event
notifications. This API aims to extend current pin configuration
provided by PTP subsystem and make it flexible and easy to cover
complex configurations.
Netlink interface is based on ynl spec, it allows use of in-kernel
tools/net/ynl/cli.py application to control the interface with properly
formated command and json attribute strings. Here are few command
examples of how it works with `ice` driver on supported NIC:
- dump dpll devices:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--dump device-get
[{'clock-id': 4658613174691613800,
'id': 0,
'lock-status': 'locked-ho-acq',
'mode': 'automatic',
'mode-supported': ['automatic'],
'module-name': 'ice',
'type': 'eec'},
{'clock-id': 4658613174691613800,
'id': 1,
'lock-status': 'locked-ho-acq',
'mode': 'automatic',
'mode-supported': ['automatic'],
'module-name': 'ice',
'type': 'pps'}]
- get single pin info:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-get --json '{"id":2}'
{'board-label': 'C827_0-RCLKA',
'clock-id': 4658613174691613800,
'capabilities': 6,
'frequency': 1953125,
'id': 2,
'module-name': 'ice',
'parent-device': [{'direction': 'input',
'parent-id': 0,
'prio': 9,
'state': 'disconnected'},
{'direction': 'input',
'parent-id': 1,
'prio': 9,
'state': 'disconnected'}],
'type': 'mux'}
- set pin's state on dpll:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-set --json '{"id":2, "parent-device":{"parent-id":1, "state":2}}'
- set pin's prio on dpll:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-set --json '{"id":2, "parent-device":{"parent-id":1, "prio":4}}'
- set pin's state on parent pin:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-set --json '{"id":13, "parent-pin":{"parent-id":2, "state":1}}'
Changelog:
v7 -> v8:
- rebase on top of net-next
- no functional changes in patchset
v6 -> v7:
- use unique id in references array to prevent possible crashes
v5 -> v6:
- change dpll-caps to pin capabilities and adjust enum accordingly
- remove dpll.h from netdevice.h
v4 -> v5:
- separate namespace for pin attributes
- small fixes, more details in the patches
v3 -> v4:
- rebase on top of net-next
- fix flag usage in ice
v2 -> v3:
- more style and warning fixes
- details in per-patch logs
v1 -> v2:
- remove FREERUN/DETACHED mode
- reorder functions in commits not to depend on files introduced in
future commits
- style and warning fixes
v9 RFC -> v1:
- Merge header patch into the patches where the actual functions are
implemented
- Address comments from previous reviews
- Per patch change log contains more details
RFC versions:
v8 -> v9:
[00/10] Create common DPLL configuration API
- update examples to reflect new pin-parent nest split
[01/10] dpll: documentation on DPLL subsystem interface
- fix docs build warnings
- separate netlink command/attribute list
- replace enum description with uapi header
- add brief explanation what is a DPLL
- fix EOPNOTSUPP typo
- fix typo .state_get -> .state_on_dpll_get
[02/10] dpll: spec: Add Netlink spec in YAML
- regenerate policy max values
- add missing enum descriptions
- split pin-parent nest:
- pin-parent-device - for configuration of pin-device tuple
- pin-parent-pin - for configuration od pin-pin tuple
- fix typos:
- s/working-modes/working modes/
- s/differentiate/differentiates/
- s/valid input, auto selected by dpll/input pin auto selected by dpll/
- remove FREERUN and HOLDOVER modes
[03/10] dpll: core: Add DPLL framework base functions
- fix description in spdx header.
- remove refcount check if refcount was already set
- do not validate dpll ptr in dpll_device_put(..)
- fix return -ENOMEM on failed memory alloc
- do not validate pin ptr in dpll_pin_put(..)
- return -EINVAL in case of module/clock_id mismatch
- do not {} around one-line xa_for_each() macro
- move dpll_<x>_registration structs to dpll_core.c
- rephrase doc comment on device and pin id struct members
- remove ref in case of memory allocation fail
- check for required ops on pin/device registration
- mark pin with DPLL_REGISTERED once pin is registered with dpll
[04/10] dpll: netlink: Add DPLL framework base functions
- fix pin-id-get/device-id-get behavior
- reshuffle order of functions
- avoid forward declarations
- functions for adding pin/device handle next to each other
- pass ops callback return values to the user
- remove dpll_cmd_pin_fill_details(..) function, merge the code into
__dpll_cmd_pin_dump_one(..)
- rename __dpll_cmd_pin_dump_one() to dpll_cmd_pin_get_one()
- use WARN_ON macro when dpll ref is missing
- remove redundant pin's dpll list not empty check
- remove double spaces inside if statement
- add extack message when set command is not possible
- do not return error when callback is not required
- WARN_ON missing ops moved to dpll_core.c
- use DPLL_REGISTERED if pin was registered with dpll
- fix pin-id-get return and add extack errors
- fix device-id-get return and add extack errors
- drop pointless init of variables
- add macro for iterating over marked pins/devices
- move dpll_set_from_nlattr() for consistent order
- use GENL_REQ_ATTR_CHECK() for checking attibute presence
- fill extack if pin/device was not found
- drop pointless init of variables
- WARN_ON if dpll not registered on send event
- rename goto labels to indicate error path
- fix docs
- drop pointless init of variables
- verify pin in notify with a mark
- prevent ops->mode_set call if missing callback
- move static dpll_msg_add_pin_handle() from pin<->netdev patch
- split pin-parent nest:
- pin-parent-device - for configuration of pin-device tuple
- pin-parent-pin - for configuration od pin-pin tuple
[06/10] netdev: expose DPLL pin handle for netdevice
- net_device->dpll_pin is only valid if IS_ENABLED(CONFIG_DPLL) fix the
code in net/core/rtnetlink.c to respect that.
- move dpll_msg_add_pin_handle to "dpll: netlink" patch + export the
function with this patch
[07/10] ice: add admin commands to access cgu configuration
- rename MAX_NETLIST_SIZE -> ICE_MAX_NETLIST_SIZE
- simplify function: s64 convert_s48_to_s64(s64 signed_48)
- do not assign 0 to field that is already 0
[08/10] ice: implement dpll interface to control cgu
- drop pointless 0 assignement
- ice_dpll_init(..) returns void instead of int
- fix context description of the functions
- fix ice_dpll_init(..) traces
- fix use package_label instead pf board_label for rclk pin
- be consistent on cgu presence naming
- remove indent in ice_dpll_deinit(..)
- remove unused struct field lock_err_num
- fix kworker resched behavior
- remove debug log from ice_dpll_deinit_worker(..)
- reorder ice internal functions
- release resources directly on error path
- remove redundant NULL checks when releasing resources
- do not assign NULL to pointers after releasing resources
- simplify variable assignement
- fix 'int ret;' declarations across the ice_dpll.c
- remove leftover ice_dpll_find(..)
- get pf pointer from dpll_priv without type cast
- improve error reporting
- fix documentation
- fix ice_dpll_update_state(..) flow
- fix return in case out of range prio set
v7 -> v8:
[0/10] Create common DPLL configuration API
- reorder the patches in patch series
- split patch "[RFC PATCH v7 2/8] dpll: Add DPLL framework base functions"
into 3 smaller patches for easier review:
- [03/10] dpll: core: Add DPLL framework base functions
- [04/10] dpll: netlink: Add DPLL framework base functions
- [05/10] dpll: api header: Add DPLL framework base
- add cli.py usage examples in commit message
[01/10] dpll: documentation on DPLL subsystem interface
- fix DPLL_MODE_MANUAL documentation
- remove DPLL_MODE_NCO
- remove DPLL_LOCK_STATUS_CALIBRATING
- add grepability Use full names of commands, attributes and values of
dpll subsystem in the documentation
- align documentation with changes introduced in v8
- fix typos
- fix phrases to better show the intentions
- move dpll.rst to Documentation/driver-api/
[02/10] dpll: spec: Add Netlink spec in YAML
- remove unspec attribute values
- add 10 KHZ and 77,5 KHZ frequency defines
- fix documentation
- remove assigned values from subset attributes
- reorder dpll attributes
- fix `device` nested attribute usage, device get is not used on pin-get
- temperature with 3 digit float precision
- remove enum from subset definitions
- move pin-direction to pin-dpll tuple/subset
- remove DPLL_MODE_NCO
- remove DPLL_LOCK_STATUS_CALIBRATING
- fix naming scheme od notification interface functions
- separate notifications for pins
- rename attribute enum name: dplla -> dpll_a
- rename pin-idx to pin-id
- remove attributes: pin-parent-idx, device
- replace bus-name and dev-name attributes with module-name
- replace pin-label with 3 new attributes: pin-board-label,
pin-panel-label, pin-package-label
- add device-id-get and pin-id-get commands
- remove rclk-dev-name atribute
- rename DPLL_PIN_DIRECTION_SOURCE -> DPLL_PIN_DIRECTION_INPUT
[03/10] dpll: core: Add DPLL framework base functions
[04/10] dpll: netlink: Add DPLL framework base functions
[05/10] dpll: api header: Add DPLL framework base
- remove unspec attributes after removing from dpll netlink spec
- move pin-direction to pin-dpll tuple
- pass parent_priv on state_on_pin_<get/set>
- align with new notification definitions from netlink spec
- use separated notifications for dpll pins and devices
- format notification messages as corresponding get netlink commands
- rename pin-idx to pin-id
- remove attributes pin-parent-idx, device
- use DPLL_A_PIN_PARENT to hold information on parent pin or dpll device
- refactor lookup for pins and dplls for dpll subsystem
- replace bus-name, dev-name with module-name
- replace pin-label with 3 new attributes: pin-board-label,
pin-panel-label, pin-package-label
- add device-id-get and pin-id-get commands
- rename dpll_xa_lock to dpll_lock
- improve doxygen in dpll_core.c
- remove unused parent and dev fields from dpll_device struct
- use u32 for pin_idx in dpll_pin_alloc
- use driver provided pin properties struct
- verify pin/dpll owner on registering pin
- remove const arg modifier for helper _priv functions
- remove function declaration _get_by_name()
- update SPDX headers
- parse netlink set attributes with nlattr array
- remove rclk-dev-name attribute
- remove device pointer from dpll_pin_register/dpll_device_register
- remove redundant doxygen from dpll header
- use module_name() to get name of module
- add missing/remove outdated kdocs
- fix call frequency_set only if available
- fix call direction_set only for pin-dpll tuple
[06/10] netdev: expose DPLL pin handle for netdevice
- rebased on top of v8 changes
- use dpll_msg_add_pin_handle() in dpll_pin_find_from_nlattr()
and dpll_msg_add_pin_parents()
- fixed handle to use DPLL_A_PIN_ID and removed temporary comments
- added documentation record for dpll_pin pointer
- fixed compilation of net/core/dev.c when CONFIG_DPLL is not enabled
- adjusted patch description a bit
[07/10] ice: add admin commands to access cgu configuration
- Remove unspec attributes after removing from dpll netlink spec.
[08/10] ice: implement dpll interface to control cgu
- remove unspec attributes
- do not store pin flags received in set commands
- use pin state field to provide pin state to the caller
- remove include of uapi header
- remove redundant check against null arguments
- propagate lock function return value to the caller
- use switch case instead of if statements
- fix dev_dbg to dev_err for error cases
- fix dpll/pin lookup on dpll subsytem callbacks
- fix extack of dpll subsystem callbacks
- remove double negation and variable cast
- simplify ice_dpll_pin_state_set function
- pass parent_priv on state_on_pin_<get/set>
- remove parent hw_idx lookup
- fix use const qualifier for dpll/dpll_pin ops
- fix IS_ERR macros usage in ice_dpll
- add notify previous source state change
- fix mutex locking on releasing pins
- use '|=' instead of '+=' when modifing capabilities field
- rename ice_dpll_register_pins function
- clock_id function to return clock ID on the stack instead of using
an output variable
- DPLL_LOCK_STATUS_CALIBRATING was removed, return:
DPLL_LOCK_STATUS_LOCKED - if dpll was locked
DPLL_LOCK_STATUS_LOCKED_HO_ACQ - if dpll was locked and holdover is
acquired
- propagate and use dpll_priv to obtain pf pointer in corresponding
functions.
- remove null check for pf pointer
- adapt to `dpll: core: fix notification scheme`
- expose pf related pin to corresponding netdevice
- fix dpll init error path
- fix dpll pins naming scheme `source` -> `input`
- replace pin-label with pin-board-label
- dpll remove parent and dev fields from dpll_device
- remove device pointer from dpll_pin_register/dpll_device_register
- rename DPLL_PIN_DIRECTION_SOURCE -> DPLL_PIN_DIRECTION_INPUT
[09/10] ptp_ocp: implement DPLL ops
- replace pin-label with pin-board-label
- dpll remove parent and dev fields from dpll_device
- remove device pointer from dpll_pin_register/dpll_device_register
- rename DPLL_PIN_DIRECTION_SOURCE -> DPLL_PIN_DIRECTION_INPUT
[10/10] mlx5: Implement SyncE support using DPLL infrastructure
- rebased on top of v8 changes:
- changed notification scheme
- no need to fill pin label
- implemented locked_ho_acq status
- rename DPLL_PIN_DIRECTION_SOURCE -> DPLL_PIN_DIRECTION_INPUT
- remove device pointer from dpll_pin_register/dpll_device_register
- fixed MSEES register writes
- adjusted pin state and lock state values reported
- fixed a white space issue
v6 -> v7:
* YAML spec:
- remove nested 'pin' attribute
- clean up definitions on top of the latest changes
* pin object:
- pin xarray uses id provided by the driver
- remove usage of PIN_IDX_INVALID in set function
- source_pin_get() returns object instead of idx
- fixes in frequency support API
* device and pin operations are const now
* small fixes in naming in Makefile and in the functions
* single mutex for the subsystem to avoid possible ABBA locks
* no special *_priv() helpers anymore, private data is passed as void*
* no netlink filters by name anymore, only index is supported
* update ptp_ocp and ice drivers to follow new API version
* add mlx5e driver as a new customer of the subsystem
v5 -> v6:
* rework pin part to better fit shared pins use cases
* add YAML spec to easy generate user-space apps
* simple implementation in ptp_ocp is back again
v4 -> v5:
* fix code issues found during last reviews:
- replace cookie with clock id
- follow one naming schema in dpll subsys
- move function comments to dpll_core.c, fix exports
- remove single-use helper functions
- merge device register with alloc
- lock and unlock mutex on dpll device release
- move dpll_type to uapi header
- rename DPLLA_DUMP_FILTER to DPLLA_FILTER
- rename dpll_pin_state to dpll_pin_mode
- rename DPLL_MODE_FORCED to DPLL_MODE_MANUAL
- remove DPLL_CHANGE_PIN_TYPE enum value
* rewrite framework once again (Arkadiusz)
- add clock class:
Provide userspace with clock class value of DPLL with dpll device
dump netlink request. Clock class is assigned by driver allocating
a dpll device. Clock class values are defined as specified in:
ITU-T G.8273.2/Y.1368.2 recommendation.
- dpll device naming schema use new pattern:
"dpll_%s_%d_%d", where:
- %s - dev_name(parent) of parent device,
- %d (1) - enum value of dpll type,
- %d (2) - device index provided by parent device.
- new muxed/shared pin registration:
Let the kernel module to register a shared or muxed pin without
finding it or its parent. Instead use a parent/shared pin
description to find correct pin internally in dpll_core, simplifing
a dpll API
* Implement complex DPLL design in ice driver (Arkadiusz)
* Remove ptp_ocp driver from the series for now
v3 -> v4:
* redesign framework to make pins dynamically allocated (Arkadiusz)
* implement shared pins (Arkadiusz)
v2 -> v3:
* implement source select mode (Arkadiusz)
* add documentation
* implementation improvements (Jakub)
v1 -> v2:
* implement returning supported input/output types
* ptp_ocp: follow suggestions from Jonathan
* add linux-clk mailing list
v0 -> v1:
* fix code style and errors
* add linux-arm mailing list
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement SyncE support using newly introduced DPLL support.
Make sure that each PFs/VFs/SFs probed with appropriate capability
will spawn a dpll auxiliary device and register appropriate dpll device
and pin instances.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
|