Age | Commit message (Collapse) | Author |
|
In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.
To avoid to calculate this length twice, extend the struct can_skb_priv to hold
the length of the CAN frame and extend __can_get_echo_skb() to return that
value.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-12-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
of frame in data link layer
This patch adds the function can_skb_get_frame_len() which returns the length
of a CAN frame on the data link layer, including Start-of-frame, Identifier,
various other bits, the actual data, the CRC, the End-of-frame, the Inter frame
spacing.
Co-developed-by: Arunachalam Santhanam <arunachalam.santhanam@in.bosch.com>
Signed-off-by: Arunachalam Santhanam <arunachalam.santhanam@in.bosch.com>
Co-developed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Co-developed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20210111141930.693847-11-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The data field in CAN-FD frames have specifig frame length (0, 1, 2, 3, 4, 5,
6, 7, 8, 12, 16, 20, 24, 32, 48, 64). This function "rounds" up a given length
to the next valid CAN-FD frame length.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch moves the netlink related code of the CAN device infrastructure into
a separate file.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch moves the skb related code of the CAN device infrastructure into a
separate file.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch moves all CAN frame length related code of the CAN device
infrastructure into a separate file.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch moves the bittiming related code of the CAN device infrastructure
into a separate file.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Allow DSA drivers to export stats64
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
List of things fixed:
- Two of the socket options were idented with spaces instead of tabs.
- Trailing whitespace in some lines.
- Improper spacing around parenthesis caught by checkpatch.pl.
- Mix of space and tabs in tcp_word_hdr union.
Signed-off-by: Danilo Carvalho <doak@google.com>
Link: https://lore.kernel.org/r/20210108222104.2079472-1-doak@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that all users of struct switchdev_trans have been modified to do
without it, we can remove this structure and the two helpers to determine
the phase.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It should be the driver's business to logically separate its VLAN
offloading into a preparation and a commit phase, and some drivers don't
need / can't do this.
So remove the transactional shim from DSA and let drivers propagate
errors directly from the .port_vlan_add callback.
It would appear that the code has worse error handling now than it had
before. DSA is the only in-kernel user of switchdev that offloads one
switchdev object to more than one port: for every VLAN object offloaded
to a user port, that VLAN is also offloaded to the CPU port. So the
"prepare for user port -> check for errors -> prepare for CPU port ->
check for errors -> commit for user port -> commit for CPU port"
sequence appears to make more sense than the one we are using now:
"offload to user port -> check for errors -> offload to CPU port ->
check for errors", but it is really a compromise. In the new way, we can
catch errors from the commit phase that we previously had to ignore.
But we have our hands tied and cannot do any rollback now: if we add a
VLAN on the CPU port and it fails, we can't do the rollback by simply
deleting it from the user port, because the switchdev API is not so nice
with us: it could have simply been there already, even with the same
flags. So we don't even attempt to rollback anything on addition error,
just leave whatever VLANs managed to get offloaded right where they are.
This should not be a problem at all in practice.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For many drivers, the .port_mdb_prepare callback was not a good opportunity
to avoid any error condition, and they would suppress errors found during
the actual commit phase.
Where a logical separation between the prepare and the commit phase
existed, the function that used to implement the .port_mdb_prepare
callback still exists, but now it is called directly from .port_mdb_add,
which was modified to return an int code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Wallei <linus.walleij@linaro.org> # RTL8366
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.
Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.
It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.
This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.
In part, this patch contains a revert of my previous commit 2e554a7a5d8a
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").
For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
done mechanically, by pasting the implementation twice, then only
keeping the code that would get executed during prepare phase on top,
then only keeping the code that gets executed during the commit phase
on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
multicast router could be converted right away. But the ageing time
could not, so a shim was introduced and this was left for a further
commit.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.
Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.
It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.
This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.
Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.
Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.
Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The call path of a switchdev VLAN addition to the bridge looks something
like this today:
nbp_vlan_init
| __br_vlan_set_default_pvid
| | |
| | br_afspec |
| | | |
| | v |
| | br_process_vlan_info |
| | | |
| | v |
| | br_vlan_info |
| | / \ /
| | / \ /
| | / \ /
| | / \ /
v v v v v
nbp_vlan_add br_vlan_add ------+
| ^ ^ | |
| / | | |
| / / / |
\ br_vlan_get_master/ / v
\ ^ / / br_vlan_add_existing
\ | / / |
\ | / / /
\ | / / /
\ | / / /
\ | / / /
v | | v /
__vlan_add /
/ | /
/ | /
v | /
__vlan_vid_add | /
\ | /
v v v
br_switchdev_port_vlan_add
The ranges UAPI was introduced to the bridge in commit bdced7ef7838
("bridge: support for multiple vlans and vlan ranges in setlink and
dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
have always been passed one by one, through struct bridge_vlan_info
tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
Then Scott Feldman introduced the switchdev_port_bridge_setlink function
in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink").
That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
full use of the range. But switchdev_port_bridge_setlink was called like
this:
br_setlink
-> br_afspec
-> switchdev_port_bridge_setlink
Basically, the switchdev and the bridge code were not tightly integrated.
Then commit 41c498b9359e ("bridge: restore br_setlink back to original")
came, and switchdev drivers were required to implement
.ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op
first in __vlan_vid_add/del") finally made switchdev penetrate the
br_vlan_info() barrier and start to develop the call path we have today.
But remember, br_vlan_info() still receives VLANs one by one.
Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
29ab586c3d83 ("net: switchdev: Remove bridge bypass support from
switchdev") so that drivers would not implement .ndo_bridge_setlink any
longer. The switchdev_port_bridge_setlink also got deleted.
This refactoring removed the parallel bridge_setlink implementation from
switchdev, and left the only switchdev VLAN objects to be the ones
offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add
(the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about
bridge VLANs")).
That is to say, today the switchdev VLAN object ranges are not used in
the kernel. Refactoring the above call path is a bit complicated, when
the bridge VLAN call path is already a bit complicated.
Let's go off and finish the job of commit 29ab586c3d83 by deleting the
bogus iteration through the VLAN ranges from the drivers. Some aspects
of this feature never made too much sense in the first place. For
example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
flag supposed to mean, when a port can obviously have a single pvid?
This particular configuration _is_ denied as of commit 6623c60dc28e
("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
perspective, the driver still has to play pretend, and only offload the
vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
modify the flags of another, completely unrelated, switchdev VLAN
object! (a VLAN that is PVID will invalidate the PVID flag from whatever
other VLAN had previously been offloaded with switchdev and had that
flag. Yet switchdev never notifies about that change, drivers are
supposed to guess).
Nonetheless, having a VLAN range in the API makes error handling look
scarier than it really is - unwinding on errors and all of that.
When in reality, no one really calls this API with more than one VLAN.
It is all unnecessary complexity.
And despite appearing pretentious (two-phase transactional model and
all), the switchdev API is really sloppy because the VLAN addition and
removal operations are not paired with one another (you can add a VLAN
100 times and delete it just once). The bridge notifies through
switchdev of a VLAN addition not only when the flags of an existing VLAN
change, but also when nothing changes. There are switchdev drivers out
there who don't like adding a VLAN that has already been added, and
those checks don't really belong at driver level. But the fact that the
API contains ranges is yet another factor that prevents this from being
addressed in the future.
Of the existing switchdev pieces of hardware, it appears that only
Mellanox Spectrum supports offloading more than one VLAN at a time,
through mlxsw_sp_port_vlan_set. I have kept that code internal to the
driver, because there is some more bookkeeping that makes use of it, but
I deleted it from the switchdev API. But since the switchdev support for
ranges has already been de facto deleted by a Mellanox employee and
nobody noticed for 4 years, I'm going to assume it's not a biggie.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch added a new command MPTCP_PM_CMD_SET_FLAGS in PM netlink:
In mptcp_nl_cmd_set_flags, parse the input address, get the backup value
according to whether the address's FLAG_BACKUP flag is set from the
user-space. Then check whether this address had been added in the local
address list. If it had been, then call mptcp_nl_addr_backup to deal with
this address.
In mptcp_nl_addr_backup, traverse all the existing msk sockets to find
the relevant sockets, and call mptcp_pm_nl_mp_prio_send_ack to send out
a MP_PRIO ACK packet.
Finally in mptcp_nl_cmd_set_flags, set or clear the address's FLAG_BACKUP
flag.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
GRO_DROP can only be returned from napi_gro_frags()
if the skb has not been allocated by a prior napi_get_frags()
Since drivers must use napi_get_frags() and test its result
before populating the skb with metadata, we can safely remove
GRO_DROP since it offers no practical use.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define stub functions for the exposed MDT functions in case
QCOM_MDT_LOADER is not configured. This allows users of these
functions to link correctly for COMPILE_TEST builds without
QCOM_SCM enabled.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Stub functions are defined for SSR notifier registration in case
QCOM_RPROC_COMMON is not configured. As a result, code that uses
these functions can link successfully even if the common remoteproc
code is not built.
Code that registers an SSR notifier function likely needs the
types defined in "qcom_rproc.h", but those are only exposed if
QCOM_RPROC_COMMON is enabled.
Rearrange the conditional definition so the qcom_ssr_notify_data
structure and qcom_ssr_notify_type enumerated type are defined
whether or not QCOM_RPROC_COMMON is enabled.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Trivial conflict in CAN on file rename.
Conflicts:
drivers/net/can/m_can/tcan4x5x-core.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski:
"Slightly lighter pull request to get back into the Thursday cadence.
Current release - always broken:
- can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions
- dsa: hellcreek: fix led_classdev build errors
Previous releases - regressions:
- ipv6: fib: flush exceptions when purging route to avoid netdev
reference leak
- ip_tunnels: fix pmtu check in nopmtudisc mode
- ip: always refragment ip defragmented packets to avoid MTU issues
when forwarding through tunnels, correct "packet too big" message
is prohibitively tricky to generate
- s390/qeth: fix locking for discipline setup / removal and during
recovery to prevent both deadlocks and races
- mlx5: Use port_num 1 instead of 0 when delete a RoCE address
Previous releases - always broken:
- cdc_ncm: correct overhead calculation in delayed_ndp_size to
prevent out of bound accesses with Huawei 909s-120 LTE module
- fix stmmac dwmac-sun8i suspend/resume:
- PHY being left powered off
- MAC syscon configuration being reset
- reference to the reset controller being improperly dropped
- qrtr: fix null-ptr-deref in qrtr_ns_remove
- can: tcan4x5x: fix bittiming const, use common bittiming from m_can
driver
- mlx5e: CT: Use per flow counter when CT flow accounting is enabled
- mlx5e: Fix SWP offsets when vlan inserted by driver
Misc:
- bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
resolution
And the usual many fixes to various error paths"
* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
s390/qeth: fix locking for discipline setup / removal
s390/qeth: fix deadlock during recovery
selftests: fib_nexthops: Fix wrong mausezahn invocation
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
nexthop: Unlink nexthop group entry in error path
nexthop: Fix off-by-one error in error path
octeontx2-af: fix memory leak of lmac and lmac->name
chtls: Fix chtls resources release sequence
chtls: Added a check to avoid NULL pointer dereference
chtls: Replace skb_dequeue with skb_peek
chtls: Avoid unnecessary freeing of oreq pointer
chtls: Fix panic when route to peer not configured
chtls: Remove invalid set_tcb call
chtls: Fix hardware tid leak
net: ip: always refragment ip defragmented packets
net: fix pmtu check in nopmtudisc mode
selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
docs: octeontx2: tune rst markup
...
|
|
This reverts commit 757055ae8dedf5333af17b3b5b4b70ba9bc9da4e.
The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.
It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().
Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.
The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit 3cffa06aeef7ece30f6
("printk/console: Allow to disable console output by using console=""
or console=null"). It provided a clean solution for a workaround
that was widely used and worked only by chance.
This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.
The proper solution has to fulfill many conditions:
+ Register ttynull only when explicitly required or as
the ultimate fallback.
+ ttynull should get associated with /dev/console but it must
not become preferred console when used as a fallback.
Especially, it must still be possible to replace it
by a better console later.
Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.
Do the revert at the least risky solution for now.
[1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/
[2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/
[3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Unlike the rest of the skb_zcopy_ functions, these routines
operate on a 'struct ubuf', not a skb. Remove the 'skb_'
prefix from the naming to make things clearer.
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace direct assignments with skb_zcopy_init() for zerocopy
cases where a new skb is initialized, without changing the
reference counts.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, when an ubuf is attached to a new skb, the shared
flags word is initialized to a fixed value. Instead of doing
this, set the default flags in the ubuf, and have new skbs
inherit from this default.
This is needed when setting up different zerocopy types.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for expanded zerocopy (TX and RX), move
the zerocopy related bits out of tx_flags into their own
flag word.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At Willem's suggestion, rename the sock_zerocopy_* functions
so that they match the MSG_ZEROCOPY flag, which makes it clear
they are specific to this zerocopy implementation.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sock_zerocopy_put_abort function contains logic which is
specific to the current zerocopy implementation. Add a wrapper
which checks the callback and dispatches apppropriately.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear(). This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename the get routines for consistency.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace sock_zerocopy_put with the generic skb_zcopy_put()
function. Pass 'true' as the success argument, as this
is identical to no change.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before this change, the caller of sock_zerocopy_callback would
need to save the zerocopy status, decrement and check the refcount,
and then call the callback function - the callback was only invoked
when the refcount reached zero.
Now, the caller just passes the status into the callback function,
which saves the status and handles its own refcounts.
This makes the behavior of the sock_zerocopy_callback identical
to the tpacket and vhost callbacks.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skb_zcopy_abort() has no in-tree consumers, remove it.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This effectively reverts commit 60724d4bae14 ("net: dsa: Add support for
DSA specific notifiers"). The reason is that since commit 2f1e8ea726e9
("net: dsa: link interfaces with the DSA master to get rid of lockdep
warnings"), it appears that there is a generic way to achieve the same
purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
converted to use the generic notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
they are enslaved to e.g. a bridge by calling netif_is_bridge_master().
Export this helper from DSA to get the equivalent functionality of
determining whether the upper interface of a CHANGEUPPER notifier is a
DSA switch interface or not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is a bit strange to see something as specific as Broadcom SYSTEMPORT
bits in the main DSA include file. Move these away into a separate
header, and have the tagger and the SYSTEMPORT driver include them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
neighbors
Some DSA switches (and not only) cannot learn source MAC addresses from
packets injected from the CPU. They only perform hardware address
learning from inbound traffic.
This can be problematic when we have a bridge spanning some DSA switch
ports and some non-DSA ports (which we'll call "foreign interfaces" from
DSA's perspective).
There are 2 classes of problems created by the lack of learning on
CPU-injected traffic:
- excessive flooding, due to the fact that DSA treats those addresses as
unknown
- the risk of stale routes, which can lead to temporary packet loss
To illustrate the second class, consider the following situation, which
is common in production equipment (wireless access points, where there
is a WLAN interface and an Ethernet switch, and these form a single
bridging domain).
AP 1:
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
| ^ ^
| | |
| | |
| Client A Client B
|
|
|
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
AP 2
- br0 of AP 1 will know that Clients A and B are reachable via wlan0
- the hardware fdb of a DSA switch driver today is not kept in sync with
the software entries on other bridge ports, so it will not know that
clients A and B are reachable via the CPU port UNLESS the hardware
switch itself performs SA learning from traffic injected from the CPU.
Nonetheless, a substantial number of switches don't.
- the hardware fdb of the DSA switch on AP 2 may autonomously learn that
Client A and B are reachable through swp0. Therefore, the software br0
of AP 2 also may or may not learn this. In the example we're
illustrating, some Ethernet traffic has been going on, and br0 from AP
2 has indeed learnt that it can reach Client B through swp0.
One of the wireless clients, say Client B, disconnects from AP 1 and
roams to AP 2. The topology now looks like this:
AP 1:
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
| ^
| |
| Client A
|
|
| Client B
| |
| v
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
AP 2
- br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
- br0 of AP 1 will (possibly) know that Client B has left wlan0. There
are cases where it might never find out though. Either way, DSA today
does not process that notification in any way.
- the hardware FDB of the DSA switch on AP 1 may learn autonomously that
Client B can be reached via swp0, if it receives any packet with
Client 1's source MAC address over Ethernet.
- the hardware FDB of the DSA switch on AP 2 still thinks that Client B
can be reached via swp0. It does not know that it has roamed to wlan0,
because it doesn't perform SA learning from the CPU port.
Now Client A contacts Client B.
AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
segment.
AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
Hairpinning is disabled => drop.
This problem comes from the fact that these switches have a 'blind spot'
for addresses coming from software bridging. The generic solution is not
to assume that hardware learning can be enabled somehow, but to listen
to more bridge learning events. It turns out that the bridge driver does
learn in software from all inbound frames, in __br_handle_local_finish.
A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
addresses serviced by the bridge on 'foreign' interfaces. The software
bridge also does the right thing on migration, by notifying that the old
entry is deleted, so that does not need to be special-cased in DSA. When
it is deleted, we just need to delete our static FDB entry towards the
CPU too, and wait.
The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
events received on its own interfaces, such as static FDB entries.
Luckily we can change that, and DSA can listen to all switchdev FDB
add/del events in the system and figure out if those events were emitted
by a bridge that spans at least one of DSA's own ports. In case that is
true, DSA will also offload that address towards its own CPU port, in
the eventuality that there might be bridge clients attached to the DSA
switch who want to talk to the station connected to the foreign
interface.
In terms of implementation, we need to keep the fdb_info->added_by_user
check for the case where the switchdev event was targeted directly at a
DSA switch port. But we don't need to look at that flag for snooped
events. So the check is currently too late, we need to move it earlier.
This also simplifies the code a bit, since we avoid uselessly allocating
and freeing switchdev_work.
We could probably do some improvements in the future. For example,
multi-bridge support is rudimentary at the moment. If there are two
bridges spanning a DSA switch's ports, and both of them need to service
the same MAC address, then what will happen is that the migration of one
of those stations will trigger the deletion of the FDB entry from the
CPU port while it is still used by other bridge. That could be improved
with reference counting but is left for another time.
This behavior needs to be enabled at driver level by setting
ds->assisted_learning_on_cpu_port = true. This is because we don't want
to inflict a potential performance penalty (accesses through
MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
because address learning on the CPU port works there.
Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
BCM72116 features a 28nm integrated EPHY, add an entry to match this PHY
OUI.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210106170944.1253046-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the NETIF_F_RX_UDP_TUNNEL_PORT feature check into
udp_tunnel_nic_*_port() helpers, since they're always
done right before the call.
Add similar checks before calling the notifier.
udp_tunnel_nic invokes the notifier without checking
features which could result in some wasted cycles.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All UDP tunnel port management is now routed via udp_tunnel_nic
infra directly. Remove the old callbacks.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expose firmware indication that it supports setting eswitch uplink state
to follow (follow the physical link). Condition setting the eswitch
uplink admin-state with this capability bit. Older FW may not support
the uplink state setting.
Fixes: 7d0314b11cdd ("net/mlx5e: Modify uplink state on interface up/down")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
From the existing definitions it's unclear which stat to
use to report filtering based on L2 dst addr in old
broadcast-medium Ethernet.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, wireless and bpf
trees.
Current release - regressions:
- mt76: fix NULL pointer dereference in mt76u_status_worker and
mt76s_process_tx_queue
- net: ipa: fix interconnect enable bug
Current release - always broken:
- netfilter: fixes possible oops in mtype_resize in ipset
- ath11k: fix number of coding issues found by static analysis tools
and spurious error messages
Previous releases - regressions:
- e1000e: re-enable s0ix power saving flows for systems with the
Intel i219-LM Ethernet controllers to fix power use regression
- virtio_net: fix recursive call to cpus_read_lock() to avoid a
deadlock
- ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()
- sysfs: take the rtnl lock around XPS configuration
- xsk: fix memory leak for failed bind and rollback reservation at
NETDEV_TX_BUSY
- r8169: work around power-saving bug on some chip versions
Previous releases - always broken:
- dcb: validate netlink message in DCB handler
- tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
to prevent unnecessary retries
- vhost_net: fix ubuf refcount when sendmsg fails
- bpf: save correct stopping point in file seq iteration
- ncsi: use real net-device for response handler
- neighbor: fix div by zero caused by a data race (TOCTOU)
- bareudp: fix use of incorrect min_headroom size and a false
positive lockdep splat from the TX lock
- mvpp2:
- clear force link UP during port init procedure in case
bootloader had set it
- add TCAM entry to drop flow control pause frames
- fix PPPoE with ipv6 packet parsing
- fix GoP Networking Complex Control config of port 3
- fix pkt coalescing IRQ-threshold configuration
- xsk: fix race in SKB mode transmit with shared cq
- ionic: account for vlan tag len in rx buffer len
- stmmac: ignore the second clock input, current clock framework does
not handle exclusive clock use well, other drivers may reconfigure
the second clock
Misc:
- ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
existing scheme"
* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
r8169: work around power-saving bug on some chip versions
net: usb: qmi_wwan: add Quectel EM160R-GL
selftests: mlxsw: Set headroom size of correct port
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
ibmvnic: fix: NULL pointer dereference.
docs: networking: packet_mmap: fix old config reference
docs: networking: packet_mmap: fix formatting for C macros
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
bareudp: Fix use of incorrect min_headroom size
bareudp: set NETIF_F_LLTX flag
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
atlantic: remove architecture depends
erspan: fix version 1 check in gre_parse_header()
net: hns: fix return value check in __lb_other_process()
net: sched: prevent invalid Scell_log shift count
net: neighbor: fix a crash caused by mod zero
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Two fixes.
The first is the fix for the strnlen() array limit check and the
second fixes the calculation of the number of dirent records used to
represent any particular filename length"
* tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix directory entry size calculation
afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Missing sanitization of rateest userspace string, bug has been
triggered by syzbot, patch from Florian Westphal.
2) Report EOPNOTSUPP on missing set features in nft_dynset, otherwise
error reporting to userspace via EINVAL is misleading since this is
reserved for malformed netlink requests.
3) New binaries with old kernels might silently accept several set
element expressions. New binaries set on the NFT_SET_EXPR and
NFT_DYNSET_F_EXPR flags to request for several expressions per
element, hence old kernels which do not support for this bail out
with EOPNOTSUPP.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nftables: add set expression flags
netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
netfilter: xt_RATEEST: reject non-null terminated string from userspace
====================
Link: https://lore.kernel.org/r/20210103192920.18639-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This is a fix for a regression in the v5.10 merge window, but it was
reported quite late in the v5.10 process, plus generating and testing
the fix took some time.
The regression is due to commit 36dadef23fcc ("kprobes: Init kprobes
in early_initcall") which on powerpc can use RCU Tasks before
initialization, resulting in boot failures.
The fix is straightforward, simply moving initialization of RCU Tasks
before the early_initcall()s. The fix has been exposed to -next and
kbuild test robot testing, and has been tested by the PowerPC guys"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu-tasks: Move RCU-tasks initialization to before early_initcall()
|
|
Pull ENABLE_MUST_CHECK removal from Miguel Ojeda:
"Remove CONFIG_ENABLE_MUST_CHECK (Masahiro Yamada)"
Note that this removes the config option by making the must-check
unconditional, not by removing must check itself.
* tag 'compiler-attributes-for-linus-v5.11' of git://github.com/ojeda/linux:
Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK
|
|
The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.
The calculation we need to use is:
1 + (((strlen(name) + 1) + 15) >> 5)
where we are adding one to the strlen() result to account for the NUL
termination.
Fix this by the following means:
(1) Create an inline function to do the calculation for a given name
length.
(2) Use the function to calculate the number of records used for a dirent
in afs_dir_iterate_block().
Use this to move the over-end check out of the loop since it only
needs to be done once.
Further use this to only go through the loop for the 2nd+ records
composing an entry. The only test there now is for if the record is
allocated - and we already checked the first block at the top of the
outer loop.
(3) Add a max name length check in afs_dir_iterate_block().
(4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
from (1) to calculate the number of blocks rather than doing it
incorrectly themselves.
Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).
The big core two fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code") which finally sorts out the resume problems
we've occasionally been having.
To make the resume fix, there are seven necessary precursors which
effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.
All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...
|
|
Pull ceph fixes from Ilya Dryomov:
"A fix for an edge case in MClientRequest encoding and a couple of
trivial fixups for the new msgr2 support"
* tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client:
libceph: add __maybe_unused to DEFINE_MSGR2_FEATURE
libceph: align session_key and con_secret to 16 bytes
libceph: fix auth_signature buffer allocation in secure mode
ceph: reencode gid_list when reconnecting
|