Age | Commit message (Collapse) | Author |
|
System76 oryp5
On the System76 Oryx Pro (oryp5), there is a headset microphone input
attached to 0x19 that does not have a jack detect. In order to get it
working, the pin configuration needs to be set correctly, and the
ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC fixup needs to be applied. This is
similar to the MIC_NO_PRESENCE fixups for some Dell laptops, except we
have a separate microphone jack that is already configured correctly.
Since the ALC1220 does not have a fixup similar to
ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC, I have exposed the fixup from the
ALC269 in a way that it can be accessed from the
alc1220_fixup_system76_oryp5 function. In addition, the
alc1220_fixup_clevo_p950 needs to be applied to gain speaker output.
Signed-off-by: Jeremy Soller <jeremy@system76.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Ido Schimmel says:
====================
mlxsw: hwmon and thermal extensions
Vadim says:
This patchset contains various improvements to hwmon and thermal code in
mlxsw. The most significant improvement is the ability to read modules'
temperature attributes (input, fault, critical and emergency thresholds)
as well as fans' fault indication. These new attributes will improve the
ability to monitor the system.
Patches #1-#4 add the necessary device registers and APIs to read
modules' temperature attributes and fans' fault indication.
Patches #5-#8 perform small improvements in hwmon and thermal code such
as using a more indicative name for cooling devices.
Patch #9 exposes fans' fault indication via hwmon.
Patch #10 exposes modules' temperature attributes via hwmon.
Patch #11 adds an hwmon label to modules' temperature sensor. This helps
to parse the output of utilities such as "sensors".
Patch #12 allows to bind an external cooling device ("mlxreg-fan") to
mlxsw thermal zone. This will allow the mlxsw thermal zone to change the
cooling level of cooling devices not programmed via switch registers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow thermal zone binding to an external cooling device from the
cooling devices white list.
It provides support for Mellanox next generation systems on which
cooling device logic is not controlled through the switch registers.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add label attribute to hwmon object for exposing QSFP module's
temperature sensor name. Modules are labeled as "front panel xxx". The
label is used by utilities such as "sensors":
front panel 001: +0.0C (crit = +0.0C, emerg = +0.0C)
..
front panel 020: +31.0C (crit = +70.0C, emerg = +80.0C)
..
front panel 056: +41.0C (crit = +70.0C, emerg = +80.0C)
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new attributes to hwmon object for exposing QSFP module temperature
input, fault indication, critical and emergency thresholds. Temperature
input and fault indication are read from Management Temperature Bulk
Register. Temperature thresholds are read from Management Cable Info
Access Register.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new fan hwmon attribute for exposing fan faults (fault indication is
read from Fan Out of Range Event Register).
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rename cooling device from "Fan" to "mlxsw_fan". Name "Fan" is too
common name, and such name is misleading, while it's interpreted by
user. For example name "Fan" could be used by ACPI.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace thermal hardcoded temperature trip values with defines.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Modify thermal zone trip points setting for better alignment with system
thermal requirement.
Add hysteresis thresholds for thermal trips in order to avoid throttling
around thermal trip point. If hysteresis temperature is not considered,
PWM can have side effect of flip up/down on thermal trip point boundary.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
capability
Add low frequency bus capability in order to allow core functionality
separation based on bus type. Driver could run over PCIe, which is
considered as high frequency bus or I2C, which is considered as low
frequency bus. In the last case time setting, for example, for thermal
polling interval, should be increased.
Use different thermal monitoring based on bus type. For I2C bus time is
set to 20 seconds, while for PCIe 1 second polling interval is used.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new API to read QSFP module's temperature thresholds - warning and
critical.
New internal API reads the temperature thresholds from the modules,
which are equipped with the thermal sensor. These thresholds will be
exposed via hwmon subsystem.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add FORE (Fan Out of Range Event Register), which is used for fan fault
reading.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add MTBR (Management Temperature Bulk Register), which is used for port
temperature reading in a bulk mode.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move QSFP EEPROM definitions to common location from the spectrum driver
in order to make them available for other mlxsw modules. They are common
for all kind of chips and have relation to SFF specifications 8024,
8436, 8472, 8636, rather than to chip type.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- fix memory leak in in batadv_dat_put_dhcp, by Martin Weinelt
- fix typo, by Sven Eckelmann
- netlink restructuring patch series (part 2), by Sven Eckelmann
(19 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to set the error message on this path otherwise some of the
callers, such as test_hints_case(), print from an uninitialized pointer.
We had a similar bug earlier and set "errmsg" to NULL in the caller,
test_delta_action_item(). That code is no longer required so I have
removed it.
Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a typo here. We intended to check "objagg2" but we instead
test "objagg" which is not an error pointer.
Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to set the error code on this path otherwise we return
ERR_PTR(0) which would result in a NULL dereference in the caller.
Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
than or equal to DSA_MAX_PORTS. The ds->ports[] array is used inside
the dsa_is_user_port() and dsa_is_cpu_port() functions. The ds->ports[]
array is allocated in dsa_switch_alloc() and it has ds->num_ports
elements so this leads to a static checker warning about a potential out
of bounds read.
Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CR4_QSFP 10G Speed technology should be 10000baseKR_Full
And also report available FEC modes.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jesper Dangaard Brouer says:
====================
Fix page_pool API and dma address storage
As pointed out by David Miller in [1] the current page_pool implementation
stores dma_addr_t in page->private. This won't work on 32-bit platforms with
64-bit DMA addresses since the page->private is an unsigned long and the
dma_addr_t a u64.
Since no driver is yet using the DMA mapping capabilities of the API let's
fix this by storing the information in 'struct page' and use that to store
and retrieve DMA addresses from network drivers.
As long as the addresses returned from dma_map_page() are aligned the first
bit, used by the compound pages code should not be set.
Ilias tested the first two patches on Espressobin driver mvneta, for which
we have patches for using the DMA API of page_pool.
[1]: https://lore.kernel.org/netdev/20181207.230655.1261252486319967024.davem@davemloft.net/
====================
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As pointed out by Alexander Duyck, the DMA mapping done in page_pool needs
to use the DMA attribute DMA_ATTR_SKIP_CPU_SYNC.
As the principle behind page_pool keeping the pages mapped is that the
driver takes over the DMA-sync steps.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As pointed out by David Miller the current page_pool implementation
stores dma_addr_t in page->private.
This won't work on 32-bit platforms with 64-bit DMA addresses since the
page->private is an unsigned long and the dma_addr_t a u64.
A previous patch is adding dma_addr_t on struct page to accommodate this.
This patch adapts the page_pool related functions to use the newly added
struct for storing and retrieving DMA addresses from network drivers.
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The page_pool API is using page->private to store DMA addresses.
As pointed out by David Miller we can't use that on 32-bit architectures
with 64-bit DMA
This patch adds a new dma_addr_t struct to allow storing DMA addresses
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the introduction of flow_stats_update(), drivers now update the stats
fields of the passed tc_cls_flower_offload struct, rather than call
tcf_exts_stats_update() directly to update the stats of offloaded TC
flower rules. However, if multiple qdiscs are registered to a TC shared
block and a flower rule is applied, then, when getting stats for the rule,
multiple callbacks may be made.
Take this into consideration by modifying flow_stats_update to gather the
stats from all callbacks. Currently, the values in tc_cls_flower_offload
only account for the last stats callback in the list.
Fixes: 3b1903ef97c0 ("flow_offload: add statistics retrieval infrastructure and use it")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently introduced tc_setup_flow_action() can fail when parsing tcf_exts
on some unsupported action commands. However, this should not affect the
case when user did not explicitly request hw offload by setting skip_sw
flag. Modify tc_setup_flow_action() callers to only propagate the error if
skip_sw flag is set for filter that is being offloaded, and set extack
error message in that case.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Fixes: 3a7b68617de7 ("cls_api: add translator to flow_action representation")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.
They would increase their share of the memory, instead
of decreasing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are some statements that are indented incorrectly. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are some statements in an if-block that are not correctly
indented. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
profiles
dev_consume_skb_irq() should be called in eth_txdone_irq() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in at91ether_interrupt() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in intr_handler() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in moxart_tx_finished() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in mace_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drop profiles
dev_consume_skb_irq() should be called in emac_mac_tx_process() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
profiles
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
possible to miss an edge. When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.
I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts). Some interrupts are
directly cleared by reading the Global 1 status register. However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.
The code only services interrupts whose status bit is set at the time of reading its status
register. If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.
This is not a problem with polling or level interrupts since the handler will be called
again to process the event. However, it's a big problem when using level interrupts.
The fix presented here is to add a loop around the code servicing switch interrupts. If
any pending interrupts remain after the current set has been handled, we loop and process
the new set. If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.
Tested on espressobin board.
Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phylib enables interrupts before phy_start() has been called, and if
we receive an interrupt in a non-started state, the interrupt handler
returns IRQ_NONE. This causes problems with at least one Marvell chip
as reported by Andrew.
Fix this by handling interrupts the same as in phy_mac_interrupt(),
basically always running the phylib state machine. It knows when it
has to do something and when not.
This change allows to handle interrupts gracefully even if they
occur in a non-started state.
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Peter Oskolkov says:
====================
This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
V2 changes: added flowi-based route lookup, IPv6 encapping, and
encapping on ingress.
V3 changes: incorporated David Ahern's suggestions:
- added l3mdev check/oif (patch 2)
- sync bpf.h from include/uapi into tools/include/uapi
- selftest tweaks
V4 changes: moved route lookup/dst change from bpf_push_ip_encap
to when BPF_LWT_REROUTE is handled, as suggested by David Ahern.
V5 changes: added a check in lwt_xmit that skb->protocol stays the
same if the skb is to be passed back to the stack (ret == BPF_OK).
Again, suggested by David Ahern.
V6 changes: abandoned.
V7 changes: added handling of GSO packets (patch 3 in the patchset added),
as suggested by BPF maintainers.
V8 changes:
- fixed build errors when LWT or IPV6 are not enabled;
- whitelisted TCP GSO instead of blacklisting SCTP and UDP GSO, as
suggested by Willem de Bruijn;
- added validation that pushed length cover needed headers when GRE/UDP
encap is detected, as suggested by Willem de Bruijn;
- a couple of minor/stylistic tweaks/fixed typos.
V9 changes:
- fixed a kbuild test robot compiler warning;
- added ipv6_route_input to ipv6_stub (patch 4 in the patchset
added), and IPv6 routing functions are now invoked via ipv6_stub,
as suggested by David Ahern.
V10 changes:
- removed unnecessary IS_ENABLED and pr_warn_once from patch 5.
V11 changes: fixed a potential dst leak in patch 5, as suggested by
David Ahern.
====================
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a bpf self-test to cover BPF_LWT_ENCAP_IP mode
in bpf_lwt_push_encap.
Covered:
- encapping in LWT_IN and LWT_XMIT
- IPv4 and IPv6
A follow-up patch will add GSO and VRF-enabled tests.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch copies changes in bpf.h done by a previous patch
in this patchset from the kernel uapi include dir into tools
uapi include dir.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch builds on top of the previous patch in the patchset,
which added BPF_LWT_ENCAP_IP mode to bpf_lwt_push_encap. As the
encapping can result in the skb needing to go via a different
interface/route/dst, bpf programs can indicate this by returning
BPF_LWT_REROUTE, which triggers a new route lookup for the skb.
v8 changes: fix kbuild errors when LWTUNNEL_BPF is builtin, but
IPV6 is a module: as LWTUNNEL_BPF can only be either Y or N,
call IPV6 routing functions only if they are built-in.
v9 changes:
- fixed a kbuild test robot compiler warning;
- call IPV6 routing functions via ipv6_stub.
v10 changes: removed unnecessary IS_ENABLED and pr_warn_once.
v11 changes: fixed a potential dst leak.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Proxy ip6_route_input via ipv6_stub, for later use by lwt bpf ip encap
(see the next patch in the patchset).
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds handling of GSO packets in bpf_lwt_push_ip_encap()
(called from bpf_lwt_push_encap):
* IPIP, GRE, and UDP encapsulation types are deduced by looking
into iphdr->protocol or ipv6hdr->next_header;
* SCTP GSO packets are not supported (as bpf_skb_proto_4_to_6
and similar do);
* UDP_L4 GSO packets are also not supported (although they are
not blocked in bpf_skb_proto_4_to_6 and similar), as
skb_decrease_gso_size() will break it;
* SKB_GSO_DODGY bit is set.
Note: it may be possible to support SCTP and UDP_L4 gso packets;
but as these cases seem to be not well handled by other
tunneling/encapping code paths, the solution should
be generic enough to apply to all tunneling/encapping code.
v8 changes:
- make sure that if GRE or UDP encap is detected, there is
enough of pushed bytes to cover both IP[v6] + GRE|UDP headers;
- do not reject double-encapped packets;
- whitelist TCP GSO packets rather than block SCTP GSO and
UDP GSO.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper.
It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and
BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
v7 changes:
- added a call skb_clear_hash();
- removed calls to skb_set_transport_header();
- refuse to encap GSO-enabled packets.
v8 changes:
- fix build errors when LWT is not enabled.
Note: the next patch in the patchset with deal with GSO-enabled packets,
which are currently rejected at encapping attempt.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds all needed plumbing in preparation to allowing
bpf programs to do IP encapping via bpf_lwt_push_encap. Actual
implementation is added in the next patch in the patchset.
Of note:
- bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
prog types in addition to BPF_PROG_TYPE_LWT_IN;
- if the skb being encapped has GSO set, encapsulation is limited
to IPIP/IP+GRE/IP+GUE (both IPv4 and IPv6);
- as route lookups are different for ingress vs egress, the single
external bpf_lwt_push_encap BPF helper is routed internally to
either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs,
depending on prog type.
v8 changes: fixed a typo.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
stream->outcnt will not be set to 'outcnt'.
With the bigger value on stream->outcnt, when closing the assoc and
freeing its streams, the ext of those surplus streams will be freed
again since those stream exts were not set to NULL after freeing in
sctp_stream_outq_migrate(). Then the invalid-free issue reported by
syzbot would be triggered.
We fix it by simply setting them to NULL after freeing.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|