summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-17rethook: x86: Add rethook x86 implementationMasami Hiramatsu
Add rethook for x86 implementation. Most of the code has been copied from kretprobes on x86. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735286243.1084943.7477055110527046644.stgit@devnote2
2022-03-17rethook: Add a generic return hookMasami Hiramatsu
Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe. Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
2022-03-17fprobe: Add ftrace based probe APIsMasami Hiramatsu
The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe*() with probed functions. There are 3 registration interfaces, - register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names. The registered fprobes can be unregistered with unregister_fprobe(). e.g. struct fprobe fp = { .entry_handler = user_handler }; const char *targets[] = { "func1", "func2", "func3"}; ... ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets)); ... unregister_fprobe(&fp); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2
2022-03-17ftrace: Add ftrace_set_filter_ips functionJiri Olsa
Adding ftrace_set_filter_ips function to be able to set filter on multiple ip addresses at once. With the kprobe multi attach interface we have cases where we need to initialize ftrace_ops object with thousands of functions, so having single function diving into ftrace_hash_move_and_update_ops with ftrace_lock is faster. The functions ips are passed as unsigned long array with count. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2
2022-03-17Merge tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme into ↵Jens Axboe
for-5.18/drivers Pull NVMe updates from Christoph: "Second round of nvme updates for Linux 5.18 - add lockdep annotations for in-kernel sockets (Chris Leech) - use vmalloc for ANA log buffer (Hannes Reinecke) - kerneldoc fixes (Chaitanya Kulkarni) - cleanups (Guoqing Jiang, Chaitanya Kulkarni, me) - warn about shared namespaces without multipathing (me)" * tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme: nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH nvme: remove nvme_alloc_request and nvme_alloc_request_qid nvme: cleanup how disk->disk_name is assigned nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate nvmet: use snprintf() with PAGE_SIZE in configfs nvmet: don't fold lines nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport nvme-tcp: lockdep: annotate in-kernel sockets nvme-tcp: don't fold the line nvme-tcp: don't initialize ret variable nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio nvme-multipath: use vmalloc for ANA log buffer
2022-03-17block: limit request dispatch loop durationShin'ichiro Kawasaki
When IO requests are made continuously and the target block device handles requests faster than request arrival, the request dispatch loop keeps on repeating to dispatch the arriving requests very long time, more than a minute. Since the loop runs as a workqueue worker task, the very long loop duration triggers workqueue watchdog timeout and BUG [1]. To avoid the very long loop duration, break the loop periodically. When opportunity to dispatch requests still exists, check need_resched(). If need_resched() returns true, the dispatch loop already consumed its time slice, then reschedule the dispatch work and break the loop. With heavy IO load, need_resched() does not return true for 20~30 seconds. To cover such case, check time spent in the dispatch loop with jiffies. If more than 1 second is spent, reschedule the dispatch work and break the loop. [1] [ 609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s! [ 609.701820] Showing busy workqueues and worker pools: [ 609.707915] workqueue events: flags=0x0 [ 609.712615] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 609.712626] pending: drm_fb_helper_damage_work [drm_kms_helper] [ 609.712687] workqueue events_freezable: flags=0x4 [ 609.732943] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 609.732952] pending: pci_pme_list_scan [ 609.732968] workqueue events_power_efficient: flags=0x80 [ 609.751947] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 609.751955] pending: neigh_managed_work [ 609.752018] workqueue kblockd: flags=0x18 [ 609.769480] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4 [ 609.769488] in-flight: 1020:blk_mq_run_work_fn [ 609.769498] pending: blk_mq_timeout_work, blk_mq_run_work_fn [ 609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67 [ 639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s! [ 639.909513] Showing busy workqueues and worker pools: [ 639.915404] workqueue events: flags=0x0 [ 639.920197] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 639.920215] pending: drm_fb_helper_damage_work [drm_kms_helper] [ 639.920365] workqueue kblockd: flags=0x18 [ 639.939932] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4 [ 639.939942] in-flight: 1020:blk_mq_run_work_fn [ 639.939955] pending: blk_mq_timeout_work, blk_mq_run_work_fn [ 639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67 Fixes: 6e6fcbc27e778 ("blk-mq: support batching dispatch in case of io") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.10+ Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/ Link: https://lore.kernel.org/r/20220318022641.133484-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-17Merge branch 'mirroring-for-ocelot-switches'Jakub Kicinski
Vladimir Oltean says: ==================== Mirroring for Ocelot switches This series adds support for tc-matchall (port-based) and tc-flower (flow-based) offloading of the tc-mirred action. Support has been added for both the ocelot switchdev driver and felix DSA driver. ==================== Link: https://lore.kernel.org/r/20220316204144.2679277-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: felix: add port mirroring supportVladimir Oltean
Gain support for port mirroring using tc-matchall by forwarding the calls to the ocelot switch library. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: pass extack to dsa_switch_ops :: port_mirror_add()Vladimir Oltean
Drivers might have error messages to propagate to user space, most common being that they support a single mirror port. Propagate the netlink extack so that they can inform user space in a verbal way of their limitations. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2Vladimir Oltean
Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an offload for tc-flower) is done by setting the MIRROR_ENA bit from the action vector of the filter. The packet is mirrored to the port mask configured in the ANA:ANA:MIRRORPORTS register (the same port mask as the destinations for port-based mirroring). Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress protocol ip \ flower skip_sw ip_proto icmp \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: establish functions for handling VCAP aux resourcesVladimir Oltean
Some VCAP filters utilize resources which are global to the switch, like for example VCAP IS2 policers take an index into a global policer pool. In commit c9a7fe1238e5 ("net: mscc: ocelot: add action of police on vcap_is2"), Xiaoliang expressed this by hooking into the low-level ocelot_vcap_filter_add_to_block() and ocelot_vcap_block_remove_filter() functions, and allocating/freeing the policers from there. Evaluating the code, there probably isn't a better place, but we'll need to do something similar for the mirror ports, and the code will start to look even more hacked up than it is right now. Create two ocelot_vcap_filter_{add,del}_aux_resources() functions to contain the madness, and pollute less the body of other functions such as ocelot_vcap_filter_add_to_block(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: add port mirroring support using tc-matchallVladimir Oltean
Ocelot switches perform port-based ingress mirroring if ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if the port is in ANA:ANA:EMIRRORPORTS. Both ingress-mirrored and egress-mirrored frames are copied to the port mask from ANA:ANA:MIRRORPORTS. So the choice of limiting to a single mirror port via ocelot_mirror_get() and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't map very well to the user space model. If the user wants to mirror the ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there are no tc-matchall rules to describe those actions. Now, we could offload a matchall rule with multiple mirred actions, one per desired mirror port, and force the user to stick to the multi-action rule format for subsequent matchall filters. But both DSA and ocelot have the flow_offload_has_one_action() check for the matchall offload, plus the fact that it will get cumbersome to cross-check matchall mirrors with flower mirrors (which will be added in the next patch). As a result, we limit the configuration to a single mirror port, with the possibility of lifting the restriction in the future. Frames injected from the CPU don't get egress-mirrored, since they are sent with the BYPASS bit in the injection frame header, and this bypasses the analyzer module (effectively also the mirroring logic). I don't know what to do/say about this. Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress \ matchall skip_sw \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: refactor policer work out of ocelot_setup_tc_cls_matchallVladimir Oltean
In preparation for adding port mirroring support to the ocelot driver, the dispatching function ocelot_setup_tc_cls_matchall() must be free of action-specific code. Move port policer creation and deletion to separate functions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17ptp: ocp: Make debugfs variables the correct bitwidthJonathan Lemon
An earlier patch mistakenly changed these variables from u32 to u16, leading to unintended truncation. Restore the original logic. Fixes: a509a7c61e3b ("ptp: ocp: Add support for selectable SMA directions.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220316165347.599154-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: microchip: ksz8795: handle eee specif erratumOleksij Rempel
According to erratum described in DS80000687C[1]: "Module 2: Link drops with some EEE link partners.", we need to "Disable the EEE next page exchange in EEE Global Register 2" 1 - https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ87xx-Errata-DS80000687C.pdf Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220316125529.1489045-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge branch 'net-bridge-multiple-spanning-trees'Jakub Kicinski
Tobias Waldekranz says: ==================== net: bridge: Multiple Spanning Trees The bridge has had per-VLAN STP support for a while now, since: https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/ The current implementation has some problems: - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN is managed independently. This is awkward from an MSTP (802.1Q-2018, Clause 13.5) point of view, where the model is that multiple VLANs are grouped into MST instances. Because of the way that the standard is written, presumably, this is also reflected in hardware implementations. It is not uncommon for a switch to support the full 4k range of VIDs, but that the pool of MST instances is much smaller. Some examples: Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs Marvell Prestera: 4k VLANs, but only 128 MSTIs Microchip SparX-5i: 4k VLANs, but only 128 MSTIs - By default, the feature is enabled, and there is no way to disable it. This makes it hard to add offloading in a backwards compatible way, since any underlying switchdevs have no way to refuse the function if the hardware does not support it - The port-global STP state has precedence over per-VLAN states. In MSTP, as far as I understand it, all VLANs will use the common spanning tree (CST) by default - through traffic engineering you can then optimize your network to group subsets of VLANs to use different trees (MSTI). To my understanding, the way this is typically managed in silicon is roughly: Incoming packet: .----.----.--------------.----.------------- | DA | SA | 802.1Q VID=X | ET | Payload ... '----'----'--------------'----'------------- | '->|\ .----------------------------. | +--> | VID | Members | ... | MSTI | PVID -->|/ |-----|---------|-----|------| | 1 | 0001001 | ... | 0 | | 2 | 0001010 | ... | 10 | | 3 | 0001100 | ... | 10 | '----------------------------' | .-----------------------------' | .------------------------. '->| MSTI | Fwding | Lrning | |------|--------|--------| | 0 | 111110 | 111110 | | 10 | 110111 | 110111 | '------------------------' What this is trying to show is that the STP state (whether MSTP is used, or ye olde STP) is always accessed via the VLAN table. If STP is running, all MSTI pointers in that table will reference the same index in the STP stable - if MSTP is running, some VLANs may point to other trees (like in this example). The fact that in the Linux bridge, the global state (think: index 0 in most hardware implementations) is supposed to override the per-VLAN state, is very awkward to offload. In effect, this means that when the global state changes to blocking, drivers will have to iterate over all MSTIs in use, and alter them all to match. This also means that you have to cache whether the hardware state is currently tracking the global state or the per-VLAN state. In the first case, you also have to cache the per-VLAN state so that you can restore it if the global state transitions back to forwarding. This series adds a new mst_enable bridge setting (as suggested by Nik) that can only be changed when no VLANs are configured on the bridge. Enabling this mode has the following effect: - The port-global STP state is used to represent the CST (Common Spanning Tree) (1/15) - Ingress STP filtering is deferred until the frame's VLAN has been resolved (1/15) - The preexisting per-VLAN states can no longer be controlled directly (1/15). They are instead placed under the MST module's control, which is managed using a new netlink interface (described in 3/15) - VLANs can br mapped to MSTIs in an arbitrary M:N fashion, using a new global VLAN option (2/15) Switchdev notifications are added so that a driver can track: - MST enabled state - VID to MSTI mappings - MST port states An offloading implementation is this provided for mv88e6xxx. ==================== Link: https://lore.kernel.org/r/20220316150857.2442916-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: mv88e6xxx: MST OffloadingTobias Waldekranz
Allocate a SID in the STU for each MSTID in use by a bridge and handle the mapping of MSTIDs to VLANs using the SID field of each VTU entry. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: mv88e6xxx: Export STU as devlink regionTobias Waldekranz
Export the raw STU data in a devlink region so that it can be inspected from userspace and compared to the current bridge configuration. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: mv88e6xxx: Disentangle STU from VTUTobias Waldekranz
In early LinkStreet silicon (e.g. 6095/6185), the per-VLAN STP states were kept in the VTU - there was no concept of a SID. Later, the information was split into two tables, where the VTU only tracked memberships and deferred the STP state tracking to the STU via a pointer (SID). This meant that a group of VLANs could share the same STU entry. Most likely, this was done to align with MSTP (802.1Q-2018, Clause 13), which is built on this principle. While the VTU is still 4k lines on most devices, the STU is capped at 64 entries. This means that the current stategy, updating STU info whenever a VTU entry is updated, can not easily support MSTP because: - The maximum number of VIDs would also be capped at 64, as we would have to allocate one SID for every VTU entry - even if many VLANs would effectively share the same MST. - MSTP updates would be unnecessarily slow as you would have to iterate over all VLANs that share the same MST. In order to support MSTP offloading in the future, manage the STU as a separate entity from the VTU. Only add support for newer hardware with separate VTU and STU. VTU-only devices can also be supported, but essentially this requires a software implementation of an STU (fanning out state changed to all VLANs tied to the same MST). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Handle MST state changesTobias Waldekranz
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Pass VLAN MSTI migration notifications to driverTobias Waldekranz
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Validate hardware support for MSTTobias Waldekranz
When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging. When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported. At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to query a port's MST stateTobias Waldekranz
This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to check if MST is enabledTobias Waldekranz
This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to map an MSTI to a VID setTobias Waldekranz
br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of MST state changesTobias Waldekranz
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrationsTobias Waldekranz
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of MST mode changesTobias Waldekranz
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Support setting and reporting MST port statesTobias Waldekranz
Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Allow changing a VLAN's MSTITobias Waldekranz
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Multiple Spanning Tree (MST) modeTobias Waldekranz
Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17nfsd: use correct format charactersBill Wendling
When compiling with -Wformat, clang emits the following warnings: fs/nfsd/flexfilelayout.c:120:27: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] "%s.%hhu.%hhu", addr, port >> 8, port & 0xff); ~~~~ ^~~~~~~~~ %d fs/nfsd/flexfilelayout.c:120:38: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] "%s.%hhu.%hhu", addr, port >> 8, port & 0xff); ~~~~ ^~~~~~~~~~~ %d The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Haynes <loghyr@hammerspace.com>
2022-03-17r8169: improve driver unload and system shutdown behavior on DASH-enabled ↵Heiner Kallweit
systems There's a number of systems supporting DASH remote management. Driver unload and system shutdown can result in the PHY suspending, thus making DASH unusable. Improve this by handling DASH being enabled very similar to WoL being enabled. Tested-by: Yanko Kaneti <yaneti@declera.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/1de3b176-c09c-1654-6f00-9785f7a4f954@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-16 This series contains updates to gtp and ice driver. Wojciech fixes smatch reported inconsistent indenting for gtp and ice. Yang Yingliang fixes a couple of return value checks for GNSS to IS_PTR instead of null. Jacob adds support for trace events on tx timestamps. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: add trace events for tx timestamps ice: fix return value check in ice_gnss.c ice: Fix inconsistent indenting in ice_switch gtp: Fix inconsistent indenting ==================== Link: https://lore.kernel.org/r/20220316204024.3201500-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17ethernet: sun: Fix spelling mistake "mis-matched" -> "mismatched"Colin Ian King
There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220316234620.55885-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: ethernet: ti: Fix spelling mistake and clean up messageColin Ian King
There is a spelling mistake in a dev_err message and the MAX_SKB_FRAGS value does not need to be printed between parentheses. Fix this. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220316233455.54541-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17vlan: use correct format charactersBill Wendling
When compiling with -Wformat, clang emits the following warning: net/8021q/vlanproc.c:284:22: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] mp->priority, ((mp->vlan_qos >> 13) & 0x7)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213125.2353370-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net/fsl: xgmac_mdio: use correct format charactersBill Wendling
When compiling with -Wformat, clang emits the following warning: drivers/net/ethernet/freescale/xgmac_mdio.c:243:22: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] phy_id, dev_addr, regnum); ^~~~~~ ./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg' dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ ./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk' _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213114.2352352-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17bnx2x: use correct format charactersBill Wendling
When compiling with -Wformat, clang emits the following warnings: drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:40: warning: format specifies type 'unsigned short' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hx.%hx", num >> 16, num); ~~~ ^~~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:51: warning: format specifies type 'unsigned short' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hx.%hx", num >> 16, num); ~~~ ^~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:47: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:58: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:68: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~ %x The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213104.2351651-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17enetc: use correct format charactersBill Wendling
When compiling with -Wformat, clang emits the following warning: drivers/net/ethernet/freescale/enetc/enetc_mdio.c:151:22: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] phy_id, dev_addr, regnum); ^~~~~~ ./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg' dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ ./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk' _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220316213109.2352015-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17io_uring: manage provided buffers strictly orderedJens Axboe
Workloads using provided buffers benefit from using and returning buffers in the right order, and so does TLBs for that matter. Manage the internal buffer list in a straight list, rather than use the head buffer as the insertion node. Use a hashed list for the buffer group IDs instead of xarray, the overhead is much lower this way. xarray provides internal locking and other trickery that is handy for some uses cases, but io_uring already locks internally for the buffer manipulation and needs none of that. This is good for about a 2% reduction in overhead, combination of the improved management and the fact that the workload has an easier time bundling back provided buffers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-17selftests/bpf: Fix tunnel remote IP commentsKaixi Fan
In namespace at_ns0, the IP address of tnl dev is 10.1.1.100 which is the overlay IP, and the ip address of veth0 is 172.16.1.100 which is the vtep IP. When doing 'ping 10.1.1.100' from root namespace, the remote_ip should be 172.16.1.100. Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.") Signed-off-by: Kaixi Fan <fankaixi.li@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220313164116.5889-1-fankaixi.li@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-03-17parisc: Improve CPU socket and core bootup info textHelge Deller
Improve CPU bootup info text from: CPU1: thread -1, cpu 0, socket 1 to CPU1: cpu core 0 of socket 1 Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-17parisc: Enable ARCH_HAS_DEBUG_VM_PGTABLEHelge Deller
Allow to enable page table boot-up checks. Suggested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge tag 'net-5.17-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, ipsec, and wireless. A few last minute revert / disable and fix patches came down from our sub-trees. We're not waiting for any fixes at this point. Current release - regressions: - Revert "netfilter: nat: force port remap to prevent shadowing well-known ports", restore working conntrack on asymmetric paths - Revert "ath10k: drop beacon and probe response which leak from other channel", restore working AP and mesh mode on QCA9984 - eth: intel: fix hang during reboot/shutdown Current release - new code bugs: - netfilter: nf_tables: disable register tracking, it needs more work to cover all corner cases Previous releases - regressions: - ipv6: fix skb_over_panic in __ip6_append_data when (admin-only) extension headers get specified - esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return value more selectively - bnx2x: fix driver load failure when FW not present in initrd Previous releases - always broken: - vsock: stop destroying unrelated sockets in nested virtualization - packet: fix slab-out-of-bounds access in packet_recvmsg() Misc: - add Paolo Abeni to networking maintainers!" * tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits) iavf: Fix hang during reboot/shutdown net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload net: bcmgenet: skip invalid partial checksums bnx2x: fix built-in kernel driver load failure net: phy: mscc: Add MODULE_FIRMWARE macros net: dsa: Add missing of_node_put() in dsa_port_parse_of net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() Revert "ath10k: drop beacon and probe response which leak from other channel" hv_netvsc: Add check for kvmalloc_array iavf: Fix double free in iavf_reset_task ice: destroy flow director filter mutex after releasing VSIs ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats() Add Paolo Abeni to networking maintainers atm: eni: Add check for dma_map_single net/packet: fix slab-out-of-bounds access in packet_recvmsg() net: mdio: mscc-miim: fix duplicate debugfs entry net: phy: marvell: Fix invalid comparison in the resume and suspend functions esp6: fix check on ipv6_skip_exthdr's return value net: dsa: microchip: add spi_device_id tables netfilter: nf_tables: disable register tracking ...
2022-03-17Merge tag 'acpi-5.17-rc9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert recent commit that caused multiple systems to misbehave due to firmware issues" * tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: scan: Do not add device IDs from _CID if _HID is not valid"
2022-03-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Four patches. Subsystems affected by this patch series: mm/swap, kconfig, ocfs2, and selftests" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: selftests: vm: fix clang build error multiple output files ocfs2: fix crash when initialize filecheck kobj fails configs/debug: restore DEBUG_INFO=y for overriding mm: swap: get rid of livelock in swapin readahead
2022-03-17veth: Allow jumbo frames in xdp modeLorenzo Bianconi
Allow increasing the MTU over page boundaries on veth devices if the attached xdp program declares to support xdp fragments. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/d5dc039c3d4123426e7023a488c449181a7bc57f.1646989407.git.lorenzo@kernel.org
2022-03-17veth: Rework veth_xdp_rcv_skb in order to accept non-linear skbLorenzo Bianconi
Introduce veth_convert_skb_to_xdp_buff routine in order to convert a non-linear skb into a xdp buffer. If the received skb is cloned or shared, veth_convert_skb_to_xdp_buff will copy it in a new skb composed by order-0 pages for the linear and the fragmented area. Moreover veth_convert_skb_to_xdp_buff guarantees we have enough headroom for xdp. This is a preliminary patch to allow attaching xdp programs with frags support on veth devices. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/8d228b106bc1903571afd1d77e797bffe9a5ea7c.1646989407.git.lorenzo@kernel.org