summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-08net: sched: cls_u32: disallow linking to root hnodeAl Viro
Operation makes no sense. Nothing will actually break if we do so (depth limit in u32_classify() will prevent infinite loops), but according to maintainers it's best prohibited outright. NOTE: doing so guarantees that u32_destroy() will trigger the call of u32_destroy_hnode(); we might want to make that unconditional. Test: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \ link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff should fail with Error: cls_u32: Not linking to root node Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: mark root hnode explicitlyAl Viro
... and produce consistent error on attempt to delete such. Existing check in u32_delete() is inconsistent - after tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 \ divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 \ divisor 1 both tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32 and tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32 will fail (at least with refcounting fixes), but the former will complain about an attempt to remove a busy table, while the latter will recognize it as root and yield "Not allowed to delete root node" instead. The problem with the existing check is that several tcf_proto instances might share the same tp->data and handle-to-hnode lookup will be the same for all of them. So comparing an hnode to be deleted with tp->root won't catch the case when one tp is used to try deleting the root of another. Solution is trivial - mark the root hnodes explicitly upon allocation and check for that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08Merge branch ↵David S. Miller
'net-phy-mscc-add-support-for-VSC8584-and-VSC8574-Microsemi-quad-port-PHYs' Quentin Schulz says: ==================== net: phy: mscc: add support for VSC8584 and VSC8574 Microsemi quad-port PHYs RESEND: rebased on top of latest net-next and on top of latest version of "net: phy: mscc: various improvements to Microsemi PHY driver" patch series. Both PHYs are 4-port PHY that are 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X and triple-speed copper SFP capable, can communicate with the MAC via SGMII, QSGMII or 1000BASE-X, supports downshifting and can set the blinking pattern of each of its 4 LEDs, supports SyncE as well as HP Auto-MDIX detection. VSC8574 supports WOL and VSC8584 supports hardware offloading of MACsec. This patch series add support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC, downshifting, HP Auto-MDIX detection and blinking pattern for their 4 LEDs. They have also an internal Intel 8051 microcontroller whose firmware needs to be patched when the PHY is reset. If the 8051's firmware has the expected CRC, its patching can be skipped. The microcontroller can be accessed from any port of the PHY, though the CRC function can only be done through the PHY that is the base PHY of the package (internal address 0) due to a limitation of the firmware. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. We also introduce a new development board called PCB120 which exists in variants for VSC8584 and VSC8574 (and that's the only difference to the best of my knowledge). I suggest patches 1 to 3 go through net tree and patches 4 and 5 go through MIPS tree. Patches going through net tree and those going through MIPS tree do not depend on one another. This patch series depends on this patch series: (https://lore.kernel.org/lkml/20181008100728.24959-1-quentin.schulz@bootlin.com/) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: add support for VSC8574 PHYQuentin Schulz
The VSC8574 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X and triple-speed copper SFP capable, can communicate with the MAC via SGMII, QSGMII or 1000BASE-X, supports WOL, downshifting and can set the blinking pattern of each of its 4 LEDs, supports SyncE as well as HP Auto-MDIX detection. This adds support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC, WOL, downshifting, HP Auto-MDIX detection and blinking pattern for its 4 LEDs. The VSC8574 has also an internal Intel 8051 microcontroller whose firmware needs to be patched when the PHY is reset. If the 8051's firmware has the expected CRC, its patching can be skipped. The microcontroller can be accessed from any port of the PHY, though the CRC function can only be done through the PHY that is the base PHY of the package (internal address 0) due to a limitation of the firmware. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: add support for VSC8584 PHYQuentin Schulz
The VSC8584 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X and triple-speed copper SFP capable, can communicate with the MAC via SGMII, QSGMII or 1000BASE-X, supports downshifting and can set the blinking pattern of each of its 4 LEDs, supports hardware offloading of MACsec and supports SyncE as well as HP Auto-MDIX detection. This adds support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC, downshifting, HP Auto-MDIX detection and blinking pattern for its 4 LEDs. The VSC8584 has also an internal Intel 8051 microcontroller whose firmware needs to be patched when the PHY is reset. If the 8051's firmware has the expected CRC, its patching can be skipped. The microcontroller can be accessed from any port of the PHY, though the CRC function can only be done through the PHY that is the base PHY of the package (internal address 0) due to a limitation of the firmware. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. The revA of the VSC8584 PHY (which is not and will not be publicly released) should NOT patch the firmware of the microcontroller or it'll make things worse, the easiest way is just to not support it. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08dt-bindings: net: vsc8531: add two additional LED modes for VSC8584Quentin Schulz
The VSC8584 (and most likely other PHYs in the same generation) has two additional LED modes that can be picked, so let's add them. Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08Merge branch 'net-phy-mscc-various-improvements-to-Microsemi-PHY-driver'David S. Miller
Quentin Schulz says: ==================== net: phy: mscc: various improvements to Microsemi PHY driver The Microsemi PHYs have multiple banks of registers (called pages). Registers can only be accessed from one page, if we need a register from another page, we need to switch the page and the registers of all other pages are not accessible anymore. Basically, to read register 5 from page 0, 1, 2, etc., you do the same phy_read(phydev, 5); but you need to set the desired page beforehand. In order to guarantee that two concurrent functions do not change the page, we need to do some locking per page. This can be achieved with the use of phy_select_page and phy_restore_page functions but phy_write/read calls in-between those two functions shall be replaced by their lock-free alternative __phy_write/read. The Microsemi PHYs have several counters so let's make them available as PHY statistics. The VSC 8530/31/40/41 also need to update their EEE init sequence in order to avoid packet losses and improve performance. This patch series also makes some minor cosmetic changes to the driver. v3: - add reviewed-by, - use phy_read/write/modify_paged whenever possible instead of the combo phy_select_page, __phy_read/write/modify, phy_restore_page when only one __phy_read/write/modify was executed, v2: - add patch to migrate MSCC driver to use phy_restore/select_page, - migrate all patches from v1 to use those two functions, - put the multiple lines of constants writes in an array and iterate over it to write the values, - add reviewed-bys, ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: remove unneeded temporary variableQuentin Schulz
Here, the rc variable is either used only for the condition right after the assignment or right before being used as the return value of the function it's being used in. So let's remove this unneeded temporary variable whenever possible. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: shorten `x != 0` condition to `x`Quentin Schulz
`if (x != 0)` is basically a more verbose version of `if (x)` so let's use the latter so it's consistent throughout the whole driver. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: remove unneeded parenthesisQuentin Schulz
The == operator precedes the || operator, so we can remove the parenthesis around (a == b) || (c == d). The condition is rather explicit and short so removing the parenthesis definitely does not make it harder to read. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: Add EEE init sequenceRaju Lakkaraju
Microsemi PHYs (VSC 8530/31/40/41) need to update the Energy Efficient Ethernet initialization sequence. In order to avoid certain link state errors that could result in link drops and packet loss, the physical coding sublayer (PCS) must be updated with settings related to EEE in order to improve performance. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: add ethtool statistics countersRaju Lakkaraju
There are a few counters available in the PHY: receive errors, false carriers, link disconnects, media CRC errors and valids counters. So let's expose those in the PHY driver. Use the priv structure as the next PHY to be supported has a few additional counters. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: phy: mscc: migrate to phy_select/restore_page functionsQuentin Schulz
The Microsemi PHYs have multiple banks of registers (called pages). Registers can only be accessed from one page, if we need a register from another page, we need to switch the page and the registers of all other pages are not accessible anymore. Basically, to read register 5 from page 0, 1, 2, etc., you do the same phy_read(phydev, 5); but you need to set the desired page beforehand. In order to guarantee that two concurrent functions do not change the page, we need to do some locking per page. This can be achieved with the use of phy_select_page and phy_restore_page functions but phy_write/read calls in-between those two functions shall be replaced by their lock-free alternative __phy_write/read. Let's migrate this driver to those functions. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: dpaa2: fix and improve dpaa2-ptp driverYangbo Lu
This patch is to fix and improve dpaa2-ptp driver in some places. - Fixed the return for some functions. - Replaced kzalloc with devm_kzalloc. - Removed dev_set_drvdata(dev, NULL). - Made ptp_dpaa2_caps const. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: dpaa2: remove unused code for dprtcYangbo Lu
This patch is to removed unused code for dprtc. This code will be re-added along with more features of dpaa2-ptp added. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: dpaa2: rename rtc as ptp in dpaa2-ptp driverYangbo Lu
In dpaa2-ptp driver, it's odd to use rtc in names of some functions and structures except these dprtc APIs. This patch is to use ptp instead of rtc in names. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: dpaa2: fix dependency of config FSL_DPAA2_ETHYangbo Lu
The NETDEVICES dependency and ETHERNET dependency hadn't been required since dpaa2-eth was moved out of staging. Also allowed COMPILE_TEST for dpaa2-eth. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08MAINTAINERS: update files maintained under DPAA2 PTP/ETHERNETYangbo Lu
The files maintained under DPAA2 PTP/ETHERNET needs to be updated since dpaa2 ptp driver had been moved into drivers/net/ethernet/freescale/dpaa2/. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: dpaa2: move DPAA2 PTP driver out of staging/Yangbo Lu
This patch is to move DPAA2 PTP driver out of staging/ since the dpaa2-eth had been moved out. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08tools headers uapi: Sync kvm.h copyArnaldo Carvalho de Melo
To pick up the changes introduced in: 6fbbde9a1969 ("KVM: x86: Control guest reads of MSR_PLATFORM_INFO") That is not yet used in tools such as 'perf trace'. The type of the change in this file, a simple integer parameter to the KVM_CHECK_EXTENSION ioctl should be easier to implement tho, adding to the libbeauty TODO list. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Drew Schmitt <dasch@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-67h1bio5bihi1q6dy7hgwwx8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-08tools arch uapi: Sync the x86 kvm.h copyArnaldo Carvalho de Melo
To get the changes in: d1766202779e ("x86/kvm/lapic: always disable MMIO interface in x2APIC mode") That at this time will not generate changes in tools such as 'perf trace', that still needs more work in tools/perf/examples/bpf/augmented_syscalls.c to need such id -> string tables. This silences the following perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h' diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-yadntj2ok6zpzjwi656onuh0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-08rxrpc: Carry call state out of locked section in rxrpc_rotate_tx_window()David Howells
Carry the call state out of the locked section in rxrpc_rotate_tx_window() rather than sampling it afterwards. This is only used to select tracepoint data, but could have changed by the time we do the tracepoint. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()David Howells
We should only call the function to end a call's Tx phase if we rotated the marked-last packet out of the transmission buffer. Make rxrpc_rotate_tx_window() return an indication of whether it just rotated the packet marked as the last out of the transmit buffer, carrying the information out of the locked section in that function. We can then check the return value instead of examining RXRPC_CALL_TX_LAST. Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08rxrpc: Don't need to take the RCU read lock in the packet receiverDavid Howells
We don't need to take the RCU read lock in the rxrpc packet receive function because it's held further up the stack in the IP input routine around the UDP receive routines. Fix this by dropping the RCU read lock calls from rxrpc_input_packet(). This simplifies the code. Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08rxrpc: Use the UDP encap_rcv hookDavid Howells
Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception in which a packet is placed onto the UDP receive queue and then immediately removed again by rxrpc. Going via the queue in this manner seems like it should be unnecessary. This does, however, require the invention of a value to place in encap_type as that's one of the conditions to switch packets out to the encap_rcv hook. Possibly the value doesn't actually matter for anything other than sockopts on the UDP socket, which aren't accessible outside of rxrpc anyway. This seems to cut a bit of time out of the time elapsed between each sk_buff being timestamped and turning up in rxrpc (the final number in the following trace excerpts). I measured this by making the rxrpc_rx_packet trace point print the time elapsed between the skb being timestamped and the current time (in ns), e.g.: ... 424.278721: rxrpc_rx_packet: ... ACK 25026 So doing a 512MiB DIO read from my test server, with an unmodified kernel: N min max sum mean stddev 27605 2626 7581 7.83992e+07 2840.04 181.029 and with the patch applied: N min max sum mean stddev 27547 1895 12165 6.77461e+07 2459.29 255.02 Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcGreg Kroah-Hartman
David writes: "Sparc fixes: 1) Minor fallthru comment tweaks from Gustavo A. R. Silva. 2) VLA removal from Kees Cook. 3) Make sparc vdso Makefile match x86, from Masahiro Yamada. 4) Fix clock divider programming in mach64 driver, from Mikulas Patocka." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: fix fall-through annotation sparc32: fix fall-through annotation sparc: vdso: clean-up vdso Makefile oradax: remove redundant null check before kfree sparc64: viohs: Remove VLA usage sbus: Use of_get_child_by_name helper sparc: Convert to using %pOFn instead of device_node.name mach64: detect the dot clock divider correctly on sparc
2018-10-08nvme: remove ns sibling before clearing pathKeith Busch
The code had been clearing a namespace being deleted as the current path while that namespace was still in the path siblings list. It is possible a new IO could set that namespace back to the current path since it appeared to be an eligable path to select, which may result in a use-after-free error. This patch ensures a namespace being removed is not eligable to be reset as a current path prior to clearing it as the current path. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-08bpf: do not blindly change rlimit in reuseport net selftestEric Dumazet
If the current process has unlimited RLIMIT_MEMLOCK, we should should leave it as is. Fixes: 941ff6f11c02 ("bpf: fix rlimit in reuseport net selftest") Signed-off-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08Merge branch 'bpf-to-bpf-calls-nfp'Daniel Borkmann
Quentin Monnet says: ==================== This patch series adds support for hardware offload of programs containing BPF-to-BPF function calls. First, a new callback is added to the kernel verifier, to collect information after the main part of the verification has been performed. Then support for BPF-to-BPF calls is incrementally added to the nfp driver, before offloading programs containing such calls is eventually allowed by lifting the restriction in the kernel verifier, in the last patch. Please refer to individual patches for details. Many thanks to Jiong and Jakub for their precious help and contribution on the main patches for the JIT-compiler, and everything related to stack accesses. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08bpf: allow offload of programs with BPF-to-BPF function callsQuentin Monnet
Now that there is at least one driver supporting BPF-to-BPF function calls, lift the restriction, in the verifier, on hardware offload of eBPF programs containing such calls. But prevent jit_subprogs(), still in the verifier, from being run for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: support pointers to other stack frames for BPF-to-BPF callsQuentin Monnet
Mark instructions that use pointers to areas in the stack outside of the current stack frame, and process them accordingly in mem_op_stack(). This way, we also support BPF-to-BPF calls where the caller passes a pointer to data in its own stack frame to the callee (typically, when the caller passes an address to one of its local variables located in the stack, as an argument). Thanks to Jakub and Jiong for figuring out how to deal with this case, I just had to turn their email discussion into this patch. Suggested-by: Jiong Wang <jiong.wang@netronome.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: optimise save/restore for R6~R9 based on register usageQuentin Monnet
When pre-processing the instructions, it is trivial to detect what subprograms are using R6, R7, R8 or R9 as destination registers. If a subprogram uses none of those, then we do not need to jump to the subroutines dedicated to saving and restoring callee-saved registers in its prologue and epilogue. This patch introduces detection of callee-saved registers in subprograms and prevents the JIT from adding calls to those subroutines whenever we can: we save some instructions in the translated program, and some time at runtime on BPF-to-BPF calls and returns. If no subprogram needs to save those registers, we can avoid appending the subroutines at the end of the program. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: fix return address from register-saving subroutine to calleeQuentin Monnet
On performing a BPF-to-BPF call, we first jump to a subroutine that pushes callee-saved registers (R6~R9) to the stack, and from there we goes to the start of the callee next. In order to do so, the caller must pass to the subroutine the address of the NFP instruction to jump to at the end of that subroutine. This cannot be reliably implemented when translated the caller, as we do not always know the start offset of the callee yet. This patch implement the required fixup step for passing the start offset in the callee via the register used by the subroutine to hold its return address. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: update fixup function for BPF-to-BPF calls supportQuentin Monnet
Relocation for targets of BPF-to-BPF calls are required at the end of translation. Update the nfp_fixup_branches() function in that regard. When checking that the last instruction of each bloc is a branch, we must account for the length of the instructions required to pop the return address from the stack. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: account for additional stack usage when checking stack limitQuentin Monnet
Offloaded programs using BPF-to-BPF calls use the stack to store the return address when calling into a subprogram. Callees also need some space to save eBPF registers R6 to R9. And contrarily to kernel verifier, we align stack frames on 64 bytes (and not 32). Account for all this when checking the stack size limit before JIT-ing the program. This means we have to recompute maximum stack usage for the program, we cannot get the value from the kernel. In addition to adapting the checks on stack usage, move them to the finalize() callback, now that we have it and because such checks are part of the verification step rather than translation. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driverQuentin Monnet
This is the main patch for the logics of BPF-to-BPF calls in the nfp driver. The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were used to call helpers and exit from the program, respectively; make them usable for calling into, or returning from, a BPF subprogram as well. For all calls, push the return address as well as the callee-saved registers (R6 to R9) to the stack, and pop them upon returning from the calls. In order to limit the overhead in terms of instruction number, this is done through dedicated subroutines. Jumping to the callee actually consists in jumping to the subroutine, that "returns" to the callee: this will require some fixup for passing the address in a later patch. Similarly, returning consists in jumping to the subroutine, which pops registers and then return directly to the caller (but no fixup is needed here). Return to the caller is performed with the RTN instruction newly added to the JIT. For the few steps where we need to know what subprogram an instruction belongs to, the struct nfp_insn_meta is extended with a new subprog_idx field. Note that checks on the available stack size, to take into account the additional requirements associated to BPF-to-BPF calls (storing R6-R9 and return addresses), are added in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: account for BPF-to-BPF calls when preparing nfp JITQuentin Monnet
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will require additional processing before translation starts, in order to record and mark jump destinations. We also mark the instructions where each subprogram begins. This will be used in a following commit to determine where to add prologues for subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: ignore helper-related checks for BPF calls in nfp verifierQuentin Monnet
The checks related to eBPF helper calls are performed each time the nfp driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks are not relevant for BPF-to-BPF call (same instruction code, different value in source register), so just skip the checks for such calls. While at it, rename the function that runs those checks to make it clear they apply to _helper_ calls only. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: copy eBPF subprograms information from kernel verifierQuentin Monnet
In order to support BPF-to-BPF calls in offloaded programs, the nfp driver must collect information about the distinct subprograms: namely, the number of subprograms composing the complete program and the stack depth of those subprograms. The latter in particular is non-trivial to collect, so we copy those elements from the kernel verifier via the newly added post-verification hook. The struct nfp_prog is extended to store this information. Stack depths are stored in an array of dedicated structs. Subprogram start indexes are not collected. Instead, meta instructions associated to the start of a subprogram will be marked with a flag in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depthQuentin Monnet
In preparation for support for BPF to BPF calls in offloaded programs, rename the "stack_depth" field of the struct nfp_prog as "stack_frame_depth". This is to make it clear that the field refers to the maximum size of the current stack frame (as opposed to the maximum size of the whole stack memory). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08bpf: add verifier callback to get stack usage info for offloaded progsQuentin Monnet
In preparation for BPF-to-BPF calls in offloaded programs, add a new function attribute to the struct bpf_prog_offload_ops so that drivers supporting eBPF offload can hook at the end of program verification, and potentially extract information collected by the verifier. Implement a minimal callback (returning 0) in the drivers providing the structs, namely netdevsim and nfp. This will be useful in the nfp driver, in later commits, to extract the number of subprograms as well as the stack depth for those subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08bpf, doc: Document Jump X addressing modeArthur Fabre
bpf_asm and the other classic BPF tools support jump conditions comparing register A to register X, in addition to comparing register A with constant K. Only the latter was documented in filter.txt, add two new addressing modes that describe the former. Signed-off-by: Arthur Fabre <arthur@arthurfabre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08libbpf: relicense libbpf as LGPL-2.1 OR BSD-2-ClauseAlexei Starovoitov
libbpf is maturing as a library and gaining features that no other bpf libraries support (BPF Type Format, bpf to bpf calls, etc) Many Apache2 licensed projects (like bcc, bpftrace, gobpf, cilium, etc) would like to use libbpf, but cannot do this yet, since Apache Foundation explicitly states that LGPL is incompatible with Apache2. Hence let's relicense libbpf as dual license LGPL-2.1 or BSD-2-Clause, since BSD-2 is compatible with Apache2. Dual LGPL or Apache2 is invalid combination. Fix license mistake in Makefile as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Beckett <david.beckett@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Wang Nan <wangnan0@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08xsk: proper AF_XDP socket teardown orderingBjörn Töpel
The AF_XDP socket struct can exist in three different, implicit states: setup, bound and released. Setup is prior the socket has been bound to a device. Bound is when the socket is active for receive and send. Released is when the process/userspace side of the socket is released, but the sock object is still lingering, e.g. when there is a reference to the socket in an XSKMAP after process termination. The Rx fast-path code uses the "dev" member of struct xdp_sock to check whether a socket is bound or relased, and the Tx code uses the struct xdp_umem "xsk_list" member in conjunction with "dev" to determine the state of a socket. However, the transition from bound to released did not tear the socket down in correct order. On the Rx side "dev" was cleared after synchronize_net() making the synchronization useless. On the Tx side, the internal queues were destroyed prior removing them from the "xsk_list". This commit corrects the cleanup order, and by doing so xdp_del_sk_umem() can be simplified and one synchronize_net() can be removed. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Fixes: ac98d8aab61b ("xsk: wire upp Tx zero-copy functions") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08iwlwifi: mvm: kill INACTIVE queue stateJohannes Berg
We don't really need this state: instead of having an inactive state where we can awaken zombie queues again if needed, just keep them in their normal state unless a new queue is actually needed and there's no other way of getting one. We do this here by making the inactivity check not free queues unless instructed that we now really need to allocate one to a specific station, and in that case it'll just free the queue immediately, without doing any inactivity step inbetween. The only downside is a little bit more processing in this case, but the code complexity is lower. Additionally, this fixes a corner case: due to the way the code worked, we could only ever reuse an inactive queue if it was the reserved queue for a station, as iwl_mvm_find_free_queue() would never consider returning an inactive queue. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-08iwlwifi: mvm: move iwl_mvm_sta_alloc_queue() downJohannes Berg
We want to call iwl_mvm_inactivity_check() from here in the next patch, so need to move the code down to be able to. Fix a minor checkpatch complaint while at it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-08mac80211_hwsim: drop now unused work-queue from hwsimMartin Willi
The work-queue was used for deferred destruction of hwsim radios; this does not work well with namespaces about to exit. The one remaining user has been migrated, so drop the now unused work-queue instance. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-08iwlwifi: mvm: make iwl_mvm_scd_queue_redirect() staticJohannes Berg
This function is only used in the file where it's declared, so just make it static. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-08iwlwifi: mvm: make queue TID change more explicitJohannes Berg
Instead of iterating all the queues after having potentially changed some queue configurations, rechecking if that was done, mark the ones that do need a TID change explicitly in a bitmap and use that to send the change to the firmware. While at it, also rename iwl_mvm_change_queue_owner() to iwl_mvm_change_queue_tid() since that's more obvious - the "kind" of owner isn't immediately clear right now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2018-10-08Merge remote-tracking branch 'net-next/master' into mac80211-nextJohannes Berg
Merge net-next, which pulled in net, so I can merge a few more patches that would otherwise conflict. Signed-off-by: Johannes Berg <johannes.berg@intel.com>