summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-02-14ALSA: hda/realtek - Headset microphone and internal speaker support for ↵Jeremy Soller
System76 oryp5 On the System76 Oryx Pro (oryp5), there is a headset microphone input attached to 0x19 that does not have a jack detect. In order to get it working, the pin configuration needs to be set correctly, and the ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC fixup needs to be applied. This is similar to the MIC_NO_PRESENCE fixups for some Dell laptops, except we have a separate microphone jack that is already configured correctly. Since the ALC1220 does not have a fixup similar to ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC, I have exposed the fixup from the ALC269 in a way that it can be accessed from the alc1220_fixup_system76_oryp5 function. In addition, the alc1220_fixup_clevo_p950 needs to be applied to gain speaker output. Signed-off-by: Jeremy Soller <jeremy@system76.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-02-13Merge branch 'mlxsw-hwmon-and-thermal-extensions'David S. Miller
Ido Schimmel says: ==================== mlxsw: hwmon and thermal extensions Vadim says: This patchset contains various improvements to hwmon and thermal code in mlxsw. The most significant improvement is the ability to read modules' temperature attributes (input, fault, critical and emergency thresholds) as well as fans' fault indication. These new attributes will improve the ability to monitor the system. Patches #1-#4 add the necessary device registers and APIs to read modules' temperature attributes and fans' fault indication. Patches #5-#8 perform small improvements in hwmon and thermal code such as using a more indicative name for cooling devices. Patch #9 exposes fans' fault indication via hwmon. Patch #10 exposes modules' temperature attributes via hwmon. Patch #11 adds an hwmon label to modules' temperature sensor. This helps to parse the output of utilities such as "sensors". Patch #12 allows to bind an external cooling device ("mlxreg-fan") to mlxsw thermal zone. This will allow the mlxsw thermal zone to change the cooling level of cooling devices not programmed via switch registers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Allow thermal zone binding to an external cooling deviceVadim Pasternak
Allow thermal zone binding to an external cooling device from the cooling devices white list. It provides support for Mellanox next generation systems on which cooling device logic is not controlled through the switch registers. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Add QSFP module temperature label attribute to hwmonVadim Pasternak
Add label attribute to hwmon object for exposing QSFP module's temperature sensor name. Modules are labeled as "front panel xxx". The label is used by utilities such as "sensors": front panel 001: +0.0C (crit = +0.0C, emerg = +0.0C) .. front panel 020: +31.0C (crit = +70.0C, emerg = +80.0C) .. front panel 056: +41.0C (crit = +70.0C, emerg = +80.0C) Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Extend hwmon interface with QSFP module temperature attributesVadim Pasternak
Add new attributes to hwmon object for exposing QSFP module temperature input, fault indication, critical and emergency thresholds. Temperature input and fault indication are read from Management Temperature Bulk Register. Temperature thresholds are read from Management Cable Info Access Register. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Extend hwmon interface with fan fault attributeVadim Pasternak
Add new fan hwmon attribute for exposing fan faults (fault indication is read from Fan Out of Range Event Register). Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Rename cooling deviceVadim Pasternak
Rename cooling device from "Fan" to "mlxsw_fan". Name "Fan" is too common name, and such name is misleading, while it's interpreted by user. For example name "Fan" could be used by ACPI. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Replace thermal temperature trips with definesVadim Pasternak
Replace thermal hardcoded temperature trip values with defines. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Modify thermal zone definitionVadim Pasternak
Modify thermal zone trip points setting for better alignment with system thermal requirement. Add hysteresis thresholds for thermal trips in order to avoid throttling around thermal trip point. If hysteresis temperature is not considered, PWM can have side effect of flip up/down on thermal trip point boundary. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Set different thermal polling time based on bus frequency ↵Vadim Pasternak
capability Add low frequency bus capability in order to allow core functionality separation based on bus type. Driver could run over PCIe, which is considered as high frequency bus or I2C, which is considered as low frequency bus. In the last case time setting, for example, for thermal polling interval, should be increased. Use different thermal monitoring based on bus type. For I2C bus time is set to 20 seconds, while for PCIe 1 second polling interval is used. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: core: Add API for QSFP module temperature thresholds readingVadim Pasternak
Add new API to read QSFP module's temperature thresholds - warning and critical. New internal API reads the temperature thresholds from the modules, which are equipped with the thermal sensor. These thresholds will be exposed via hwmon subsystem. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: reg: Add Fan Out of Range Event RegisterVadim Pasternak
Add FORE (Fan Out of Range Event Register), which is used for fan fault reading. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: reg: Add Management Temperature Bulk RegisterVadim Pasternak
Add MTBR (Management Temperature Bulk Register), which is used for port temperature reading in a bulk mode. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mlxsw: spectrum: Move QSFP EEPROM definitions to common locationVadim Pasternak
Move QSFP EEPROM definitions to common location from the spectrum driver in order to make them available for other mlxsw modules. They are common for all kind of chips and have relation to SFF specifications 8024, 8436, 8472, 8636, rather than to chip type. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13Merge tag 'batadv-next-for-davem-20190213' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - fix memory leak in in batadv_dat_put_dhcp, by Martin Weinelt - fix typo, by Sven Eckelmann - netlink restructuring patch series (part 2), by Sven Eckelmann (19 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13test_objagg: Uninitialized variable in error handlingDan Carpenter
We need to set the error message on this path otherwise some of the callers, such as test_hints_case(), print from an uninitialized pointer. We had a similar bug earlier and set "errmsg" to NULL in the caller, test_delta_action_item(). That code is no longer required so I have removed it. Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13test_objagg: Test the correct variableDan Carpenter
There is a typo here. We intended to check "objagg2" but we instead test "objagg" which is not an error pointer. Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13lib: objagg: Fix an error code in objagg_hints_get()Dan Carpenter
We need to set the error code on this path otherwise we return ERR_PTR(0) which would result in a NULL dereference in the caller. Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()Dan Carpenter
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less than or equal to DSA_MAX_PORTS. The ds->ports[] array is used inside the dsa_is_user_port() and dsa_is_cpu_port() functions. The ds->ports[] array is allocated in dsa_switch_alloc() and it has ds->num_ports elements so this leads to a static checker warning about a potential out of bounds read. Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13cxgb4vf: Few more link management changes.Vishal Kulkarni
CR4_QSFP 10G Speed technology should be 10000baseKR_Full And also report available FEC modes. Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13Merge branch 'pagepool-api-and-dma-address-storage'David S. Miller
Jesper Dangaard Brouer says: ==================== Fix page_pool API and dma address storage As pointed out by David Miller in [1] the current page_pool implementation stores dma_addr_t in page->private. This won't work on 32-bit platforms with 64-bit DMA addresses since the page->private is an unsigned long and the dma_addr_t a u64. Since no driver is yet using the DMA mapping capabilities of the API let's fix this by storing the information in 'struct page' and use that to store and retrieve DMA addresses from network drivers. As long as the addresses returned from dma_map_page() are aligned the first bit, used by the compound pages code should not be set. Ilias tested the first two patches on Espressobin driver mvneta, for which we have patches for using the DMA API of page_pool. [1]: https://lore.kernel.org/netdev/20181207.230655.1261252486319967024.davem@davemloft.net/ ==================== Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13page_pool: use DMA_ATTR_SKIP_CPU_SYNC for DMA mappingsJesper Dangaard Brouer
As pointed out by Alexander Duyck, the DMA mapping done in page_pool needs to use the DMA attribute DMA_ATTR_SKIP_CPU_SYNC. As the principle behind page_pool keeping the pages mapped is that the driver takes over the DMA-sync steps. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: page_pool: don't use page->private to store dma_addr_tIlias Apalodimas
As pointed out by David Miller the current page_pool implementation stores dma_addr_t in page->private. This won't work on 32-bit platforms with 64-bit DMA addresses since the page->private is an unsigned long and the dma_addr_t a u64. A previous patch is adding dma_addr_t on struct page to accommodate this. This patch adapts the page_pool related functions to use the newly added struct for storing and retrieving DMA addresses from network drivers. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13mm: add dma_addr_t to struct pageJesper Dangaard Brouer
The page_pool API is using page->private to store DMA addresses. As pointed out by David Miller we can't use that on 32-bit architectures with 64-bit DMA This patch adds a new dma_addr_t struct to allow storing DMA addresses Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: sched: remove duplicated include from cls_api.cYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13flow_offload: fix block statsJohn Hurley
With the introduction of flow_stats_update(), drivers now update the stats fields of the passed tc_cls_flower_offload struct, rather than call tcf_exts_stats_update() directly to update the stats of offloaded TC flower rules. However, if multiple qdiscs are registered to a TC shared block and a flower rule is applied, then, when getting stats for the rule, multiple callbacks may be made. Take this into consideration by modifying flow_stats_update to gather the stats from all callbacks. Currently, the values in tc_cls_flower_offload only account for the last stats callback in the list. Fixes: 3b1903ef97c0 ("flow_offload: add statistics retrieval infrastructure and use it") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: sched: flower: only return error from hw offload if skip_swVlad Buslov
Recently introduced tc_setup_flow_action() can fail when parsing tcf_exts on some unsupported action commands. However, this should not affect the case when user did not explicitly request hw offload by setting skip_sw flag. Modify tc_setup_flow_action() callers to only propagate the error if skip_sw flag is set for filter that is being offloaded, and set extack error message in that case. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Fixes: 3a7b68617de7 ("cls_api: add translator to flow_action representation") Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: fix possible overflow in __sk_mem_raise_allocated()Eric Dumazet
With many active TCP sockets, fat TCP sockets could fool __sk_mem_raise_allocated() thanks to an overflow. They would increase their share of the memory, instead of decreasing it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13qlge: fix some indentation issuesColin Ian King
There are some statements that are indented incorrectly. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13qed: fix indentation issue with statements in an if-blockColin Ian King
There are some statements in an if-block that are not correctly indented. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: ixp4xx_eth: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop ↵Yang Wei
profiles dev_consume_skb_irq() should be called in eth_txdone_irq() when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: macb: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profilesYang Wei
dev_consume_skb_irq() should be called in at91ether_interrupt() when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: sis: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profilesYang Wei
dev_consume_skb_irq() should be called when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: fealnx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profilesYang Wei
dev_consume_skb_irq() should be called in intr_handler() when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: moxa: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profilesYang Wei
dev_consume_skb_irq() should be called in moxart_tx_finished() when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profilesYang Wei
dev_consume_skb_irq() should be called in mace_interrupt() when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: atheros: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profilesYang Wei
dev_consume_skb_irq() should be called when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: qualcomm: emac: replace dev_kfree_skb_irq by dev_consume_skb_irq for ↵Yang Wei
drop profiles dev_consume_skb_irq() should be called in emac_mac_tx_process() when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: neterion: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop ↵Yang Wei
profiles dev_consume_skb_irq() should be called when skb xmit done. It makes drop profiles(dropwatch, perf) more friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exitJohn David Anglin
The GPIO interrupt controller on the espressobin board only supports edge interrupts. If one enables the use of hardware interrupts in the device tree for the 88E6341, it is possible to miss an edge. When this happens, the INTn pin on the Marvell switch is stuck low and no further interrupts occur. I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is a race in handling device interrupts (e.g. PHY link interrupts). Some interrupts are directly cleared by reading the Global 1 status register. However, the device interrupt flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced. This is done by reading the relevant SERDES and PHY status register. The code only services interrupts whose status bit is set at the time of reading its status register. If an interrupt event occurs after its status is read and before all interrupts are serviced, then this event will not be serviced and the INTn output pin will remain low. This is not a problem with polling or level interrupts since the handler will be called again to process the event. However, it's a big problem when using level interrupts. The fix presented here is to add a loop around the code servicing switch interrupts. If any pending interrupts remain after the current set has been handled, we loop and process the new set. If there are no pending interrupts after servicing, we are sure that INTn has gone high and we will get an edge when a new event occurs. Tested on espressobin board. Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.") Signed-off-by: John David Anglin <dave.anglin@bell.net> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: phy: fix interrupt handling in non-started statesHeiner Kallweit
phylib enables interrupts before phy_start() has been called, and if we receive an interrupt in a non-started state, the interrupt handler returns IRQ_NONE. This causes problems with at least one Marvell chip as reported by Andrew. Fix this by handling interrupts the same as in phy_mac_interrupt(), basically always running the phylib state machine. It knows when it has to do something and when not. This change allows to handle interrupts gracefully even if they occur in a non-started state. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13Merge branch 'lwt_encap_ip'Alexei Starovoitov
Peter Oskolkov says: ==================== This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers to packets (e.g. IP/GRE, GUE, IPIP). This is useful when thousands of different short-lived flows should be encapped, each with different and dynamically determined destination. Although lwtunnels can be used in some of these scenarios, the ability to dynamically generate encap headers adds more flexibility, e.g. when routing depends on the state of the host (reflected in global bpf maps). V2 changes: added flowi-based route lookup, IPv6 encapping, and encapping on ingress. V3 changes: incorporated David Ahern's suggestions: - added l3mdev check/oif (patch 2) - sync bpf.h from include/uapi into tools/include/uapi - selftest tweaks V4 changes: moved route lookup/dst change from bpf_push_ip_encap to when BPF_LWT_REROUTE is handled, as suggested by David Ahern. V5 changes: added a check in lwt_xmit that skb->protocol stays the same if the skb is to be passed back to the stack (ret == BPF_OK). Again, suggested by David Ahern. V6 changes: abandoned. V7 changes: added handling of GSO packets (patch 3 in the patchset added), as suggested by BPF maintainers. V8 changes: - fixed build errors when LWT or IPV6 are not enabled; - whitelisted TCP GSO instead of blacklisting SCTP and UDP GSO, as suggested by Willem de Bruijn; - added validation that pushed length cover needed headers when GRE/UDP encap is detected, as suggested by Willem de Bruijn; - a couple of minor/stylistic tweaks/fixed typos. V9 changes: - fixed a kbuild test robot compiler warning; - added ipv6_route_input to ipv6_stub (patch 4 in the patchset added), and IPv6 routing functions are now invoked via ipv6_stub, as suggested by David Ahern. V10 changes: - removed unnecessary IS_ENABLED and pr_warn_once from patch 5. V11 changes: fixed a potential dst leak in patch 5, as suggested by David Ahern. ==================== Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13selftests: bpf: add test_lwt_ip_encap selftestPeter Oskolkov
This patch adds a bpf self-test to cover BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap. Covered: - encapping in LWT_IN and LWT_XMIT - IPv4 and IPv6 A follow-up patch will add GSO and VRF-enabled tests. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13bpf: sync <kdir>/include/.../bpf.h with tools/include/.../bpf.hPeter Oskolkov
This patch copies changes in bpf.h done by a previous patch in this patchset from the kernel uapi include dir into tools uapi include dir. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.cPeter Oskolkov
This patch builds on top of the previous patch in the patchset, which added BPF_LWT_ENCAP_IP mode to bpf_lwt_push_encap. As the encapping can result in the skb needing to go via a different interface/route/dst, bpf programs can indicate this by returning BPF_LWT_REROUTE, which triggers a new route lookup for the skb. v8 changes: fix kbuild errors when LWTUNNEL_BPF is builtin, but IPV6 is a module: as LWTUNNEL_BPF can only be either Y or N, call IPV6 routing functions only if they are built-in. v9 changes: - fixed a kbuild test robot compiler warning; - call IPV6 routing functions via ipv6_stub. v10 changes: removed unnecessary IS_ENABLED and pr_warn_once. v11 changes: fixed a potential dst leak. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13ipv6_stub: add ipv6_route_input stub/proxy.Peter Oskolkov
Proxy ip6_route_input via ipv6_stub, for later use by lwt bpf ip encap (see the next patch in the patchset). Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13bpf: handle GSO in bpf_lwt_push_encapPeter Oskolkov
This patch adds handling of GSO packets in bpf_lwt_push_ip_encap() (called from bpf_lwt_push_encap): * IPIP, GRE, and UDP encapsulation types are deduced by looking into iphdr->protocol or ipv6hdr->next_header; * SCTP GSO packets are not supported (as bpf_skb_proto_4_to_6 and similar do); * UDP_L4 GSO packets are also not supported (although they are not blocked in bpf_skb_proto_4_to_6 and similar), as skb_decrease_gso_size() will break it; * SKB_GSO_DODGY bit is set. Note: it may be possible to support SCTP and UDP_L4 gso packets; but as these cases seem to be not well handled by other tunneling/encapping code paths, the solution should be generic enough to apply to all tunneling/encapping code. v8 changes: - make sure that if GRE or UDP encap is detected, there is enough of pushed bytes to cover both IP[v6] + GRE|UDP headers; - do not reject double-encapped packets; - whitelist TCP GSO packets rather than block SCTP GSO and UDP GSO. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encapPeter Oskolkov
Implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers to packets (e.g. IP/GRE, GUE, IPIP). This is useful when thousands of different short-lived flows should be encapped, each with different and dynamically determined destination. Although lwtunnels can be used in some of these scenarios, the ability to dynamically generate encap headers adds more flexibility, e.g. when routing depends on the state of the host (reflected in global bpf maps). v7 changes: - added a call skb_clear_hash(); - removed calls to skb_set_transport_header(); - refuse to encap GSO-enabled packets. v8 changes: - fix build errors when LWT is not enabled. Note: the next patch in the patchset with deal with GSO-enabled packets, which are currently rejected at encapping attempt. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encapPeter Oskolkov
This patch adds all needed plumbing in preparation to allowing bpf programs to do IP encapping via bpf_lwt_push_encap. Actual implementation is added in the next patch in the patchset. Of note: - bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT prog types in addition to BPF_PROG_TYPE_LWT_IN; - if the skb being encapped has GSO set, encapsulation is limited to IPIP/IP+GRE/IP+GUE (both IPv4 and IPv6); - as route lookups are different for ingress vs egress, the single external bpf_lwt_push_encap BPF helper is routed internally to either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs, depending on prog type. v8 changes: fixed a typo. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrateXin Long
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM, stream->outcnt will not be set to 'outcnt'. With the bigger value on stream->outcnt, when closing the assoc and freeing its streams, the ext of those surplus streams will be freed again since those stream exts were not set to NULL after freeing in sctp_stream_outq_migrate(). Then the invalid-free issue reported by syzbot would be triggered. We fix it by simply setting them to NULL after freeing. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>