summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2018-04-04IB/uverbs: Add flow_action create and destroy verbsMatan Barak
A verbs application may receive and transmits packets using a data path pipeline. Sometimes, the first stage in the receive pipeline or the last stage in the transmit pipeline involves transforming a packet, either in order to make it easier for later stages to process it or to prepare it for transmission over the wire. Such transformation could be stripping/encapsulating the packet (i.e. vxlan), decrypting/encrypting it (i.e. ipsec), altering headers, doing some complex FPGA changes, etc. Some hardware could do such transformations without software data path intervention at all. The flow steering API supports steering a packet (either to a QP or dropping it) and some simple packet immutable actions (i.e. tagging a packet). Complex actions, that may change the packet, could bloat the flow steering API extensively. Sometimes the same action should be applied to several flows. In this case, it's easier to bind several flows to the same action and modify it than change all matching flows. Introducing a new flow_action object that abstracts any packet transformation (out of a standard and well defined set of actions). This flow_action object could be tied to a flow steering rule via a new specification. Currently, we support esp flow_action, which encrypts or decrypts a packet according to the given parameters. However, we present a flexible schema that could be used to other transformation actions tied to flow rules. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add enum attribute type to ioctl() interfaceMatan Barak
Methods sometimes need to get one attribute out of a group of pre-defined attributes. This is an enum-like behavior. Since this is a common requirement, we add a new ENUM attribute to the generic uverbs ioctl() layer. This attribute is embedded in methods, like any other attributes we currently have. ENUM attributes point to an array of standard UVERBS_ATTR_PTR_IN. The user-space encodes the enum's attribute id in the id field and the internal PTR_IN attr id in the enum_data.elem_id field. This ENUM attribute could be shared by several attributes and it can get UVERBS_ATTR_SPEC_F_MANDATORY flag, stating this attribute must be supported by the kernel, like any other attribute. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04dm: hold DM table for duration of ioctl rather than use blkdev_getMike Snitzer
Commit 519049afead ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") inadvertantly introduced a regression relative to users of device cgroups that issue ioctls (e.g. libvirt). Using blkdev_get() in DM's passthrough ioctl support implicitly introduced a cgroup permissions check that would fail unless care were taken to add all devices in the IO stack to the device cgroup. E.g. rather than just adding the top-level DM multipath device to the cgroup all the underlying devices would need to be allowed. Fix this, to no longer require allowing all underlying devices, by simply holding the live DM table (which includes the table's original blkdev_get() reference on the blockdevice that the ioctl will be issued to) for the duration of the ioctl. Also, bump the DM ioctl version so a user can know that their device cgroup allow workaround is no longer needed. Reported-by: Michal Privoznik <mprivozn@redhat.com> Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 519049afead ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") Cc: stable@vger.kernel.org # 4.16 Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03Merge tag 'media/v4.17-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - new CEC pin injection code for testing purposes - DVB frontend cxd2099 promoted from staging - new platform driver for Sony cxd2880 DVB devices - new sensor drivers: mt9t112, ov2685, ov5695, ov772x, tda1997x, tw9910.c - removal of unused cx18 and ivtv alsa mixers - the reneseas-ceu driver doesn't depend on soc_camera anymore and moved from staging - removed the mantis_vp3028 driver, unused since 2009 - s5p-mfc: add support for version 10 of the MSP - added a decoder for imon protocol - atomisp: lots of cleanups - imx074 and mt9t031: don't depend on soc_camera anymore, being promoted from staging - added helper functions to better support DVB I2C binding - lots of driver improvements and cleanups * tag 'media/v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (438 commits) media: v4l2-ioctl: rename a temp var that stores _IOC_SIZE(cmd) media: fimc-capture: get rid of two warnings media: dvb-usb-v2: fix a missing dependency of I2C_MUX media: uvc: to the right check at uvc_ioctl_enum_framesizes() media: cec-core: fix a bug at cec_error_inj_write() media: tda9840: cleanup a warning media: tm6000: avoid casting just to print pointer address media: em28xx-input: improve error handling code media: zr364xx: avoid casting just to print pointer address media: vivid-radio-rx: add a cast to avoid a warning media: saa7134-alsa: don't use casts to print a buffer address media: solo6x10: get rid of an address space warning media: zoran: don't cast pointers to print them media: ir-kbd-i2c: change the if logic to avoid a warning media: ir-kbd-i2c: improve error handling code media: saa7134-input: improve error handling media: s2255drv: fix a casting warning media: ivtvfb: Cleanup some warnings media: videobuf-dma-sg: Fix a weird cast soc_camera: fix a weird cast on printk ...
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds
Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is keyed based access to virtual memory, and if the key encoded in the virual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari. 2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased. 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff. 4) Support zerocopy on RDS sockets, from Sowmini Varadhan. 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann. 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier. 7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel. 8) Add packet forwarding tests to selftests, from Ido Schimmel. 9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet. 10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern. 11) Add path MTU tests to selftests, from Stefano Brivio. 12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed. 13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree. 14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations. 15) In-kernel receive TLS support, from Dave Watson. 16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen. 17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al. 18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren. 19) Support XDP redirect in i40e driver, from Björn Töpel. 20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov. 21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan. 22) Use rhashtable for inet frag reassembly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...
2018-04-03RDMA/mlx5: Fix definition of mlx5_ib_create_qp_respJason Gunthorpe
This structure is pushed down the ex and the non-ex path, so it needs to be aligned to 8 bytes to go through ex without implicit padding. Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so take the approach of just copying the minimum length. New user space will consistently provide 8 bytes in both cases. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03Merge tag 'for-linus-4.17' of git://github.com/cminyard/linux-ipmiLinus Torvalds
Pull IPMI updates from Corey Minyard: "Mostly small changes, as usual. This does add an IPMI BMC server-side driver, to allow a Linux system to act as an IPMI controller. That's the biggest change, but it is just a new driver that is fairly narrow in use. The other largish change is removing ACPI SPMI probe support, which should have never really been there in the beginning" * tag 'for-linus-4.17' of git://github.com/cminyard/linux-ipmi: ipmi/parisc: Add IPMI chassis poweroff for certain HP PA-RISC and IA-64 servers ipmi_ssif: Fix kernel panic at msg_done_handler ipmi:pci: Blacklist a Realtek "IPMI" device ipmi: Remove ACPI SPMI probing from the system interface driver ipmi: Remove ACPI SPMI probing from the SSIF (I2C) driver ipmi: missing error code in try_smi_init() ipmi: use ARRAY_SIZE for poweroff_functions array sizing calculation ipmi: Consolidate cleanup code ipmi: Remove some unnecessary initializations ipmi: Fix some error cleanup issues ipmi: Add or fix SPDX-License-Identifier in all files ipmi: Re-use existing macros for built-in properties ipmi:pci: Make the PCI defines consistent with normal Linux ones ipmi: kcs_bmc: coding-style fixes and use new poll type char/ipmi: add documentation for sysfs interface ipmi: kcs_bmc: mark expected switch fall-through in kcs_bmc_handle_data ipmi: add an Aspeed KCS IPMI BMC driver ipmi: add a KCS IPMI BMC driver
2018-04-03dm: allow targets to return output from messages they are sentMike Snitzer
Could be useful for a target to return stats or other information. If a target does DMEMIT() anything to @result from its .message method then it must return 1 to the caller. Signed-off-By: Mike Snitzer <snitzer@redhat.com>
2018-04-02Merge tag 'arch-removal' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pul removal of obsolete architecture ports from Arnd Bergmann: "This removes the entire architecture code for blackfin, cris, frv, m32r, metag, mn10300, score, and tile, including the associated device drivers. I have been working with the (former) maintainers for each one to ensure that my interpretation was right and the code is definitely unused in mainline kernels. Many had fond memories of working on the respective ports to start with and getting them included in upstream, but also saw no point in keeping the port alive without any users. In the end, it seems that while the eight architectures are extremely different, they all suffered the same fate: There was one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem, which was more costly than licensing newer off-the-shelf CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems that all the SoC product lines are still around, but have not used the custom CPU architectures for several years at this point. In contrast, CPU instruction sets that remain popular and have actively maintained kernel ports tend to all be used across multiple licensees. [ See the new nds32 port merged in the previous commit for the next generation of "one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem" - Linus ] The removal came out of a discussion that is now documented at https://lwn.net/Articles/748074/. Unlike the original plans, I'm not marking any ports as deprecated but remove them all at once after I made sure that they are all unused. Some architectures (notably tile, mn10300, and blackfin) are still being shipped in products with old kernels, but those products will never be updated to newer kernel releases. After this series, we still have a few architectures without mainline gcc support: - unicore32 and hexagon both have very outdated gcc releases, but the maintainers promised to work on providing something newer. At least in case of hexagon, this will only be llvm, not gcc. - openrisc, risc-v and nds32 are still in the process of finishing their support or getting it added to mainline gcc in the first place. They all have patched gcc-7.3 ports that work to some degree, but complete upstream support won't happen before gcc-8.1. Csky posted their first kernel patch set last week, their situation will be similar [ Palmer Dabbelt points out that RISC-V support is in mainline gcc since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]" This really says it all: 2498 files changed, 95 insertions(+), 467668 deletions(-) * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits) MAINTAINERS: UNICORE32: Change email account staging: iio: remove iio-trig-bfin-timer driver tty: hvc: remove tile driver tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers serial: remove tile uart driver serial: remove m32r_sio driver serial: remove blackfin drivers serial: remove cris/etrax uart drivers usb: Remove Blackfin references in USB support usb: isp1362: remove blackfin arch glue usb: musb: remove blackfin port usb: host: remove tilegx platform glue pwm: remove pwm-bfin driver i2c: remove bfin-twi driver spi: remove blackfin related host drivers watchdog: remove bfin_wdt driver can: remove bfin_can driver mmc: remove bfin_sdh driver input: misc: remove blackfin rotary driver input: keyboard: remove bf54x driver ...
2018-04-02signal: Correct the offset of si_pkey and si_lower in struct siginfo on m68kEric W. Biederman
The change moving addr_lsb into the _sigfault union failed to take into account that _sigfault._addr_bnd._lower being a pointer forced the entire union to have pointer alignment. The fix for _sigfault._addr_bnd._lower having pointer alignment failed to take into account that m68k has a pointer alignment less than the size of a pointer. So simply making the padding members pointers changed the location of later members in the structure. Fix this by directly computing the needed size of the padding members, and making the padding members char arrays of the needed size. AKA if __alignof__(void *) is 1 sizeof(short) otherwise __alignof__(void *). Which should be exactly the same rules the compiler whould have used when computing the padding. I have tested this change by adding BUILD_BUG_ONs to m68k to verify the offset of every member of struct siginfo, and with those testing that the offsets of the fields in struct siginfo is the same before I changed the generic _sigfault member and after the correction to the _sigfault member. I have also verified that the x86 with it's own BUILD_BUG_ONs to verify the offsets of the siginfo members also compiles cleanly. Cc: stable@vger.kernel.org Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Fixes: 859d880cf544 ("signal: Correct the offset of si_pkey in struct siginfo") Fixes: b68a68d3dcc1 ("signal: Move addr_lsb into the _sigfault union for clarity") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-02Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - Modernize the kprobe and uprobe creation/destruction tooling ABIs: The existing text based APIs (kprobe_events and uprobe_events in tracefs), are naive, limited ABIs in that they require user-space to clean up after themselves, which is both difficult and fragile if the tool is buggy or exits unexpectedly. In other words they are not really suited for modern, robust tooling. So introduce a modern, file descriptor based ABI that does not have these limitations: introduce the 'perf_kprobe' and 'perf_uprobe' PMUs and extend the perf_event_open() syscall to create events with a kprobe/uprobe attached to them. These [k,u]probe are associated with this file descriptor, so they are not available in tracefs. (Song Liu) - Intel Cannon Lake CPU support (Harry Pan) - Intel PT cleanups (Alexander Shishkin) - Improve the performance of pinned/flexible event groups by using RB trees (Alexey Budankov) - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification of hardware breakpoints, which new ABI variant massively speeds up existing tooling that uses hardware breakpoints to instrument (and debug) memory usage. (Milind Chabbi, Jiri Olsa) - Various Intel PEBS handling fixes and improvements, and other Intel PMU improvements (Kan Liang) - Various perf core improvements and optimizations (Peter Zijlstra) - ... misc cleanups, fixes and updates. There's over 200 tooling commits, here's an (imperfect) list of highlights: - 'perf annotate' improvements: * Recognize and handle jumps to other functions as calls, which improves the navigation along jumps and back. (Arnaldo Carvalho de Melo) * Add the 'P' hotkey in TUI annotation to dump annotation output into a file, to ease e-mail reporting of annotation details. (Arnaldo Carvalho de Melo) * Add an IPC/cycles column to the TUI (Jin Yao) * Improve s390 assembly annotation (Thomas Richter) * Refactor the output formatting logic to better separate it into interactive and non-interactive features and add the --stdio2 output variant to demonstrate this. (Arnaldo Carvalho de Melo) - 'perf script' improvements: * Add Python 3 support (Jaroslav Škarvada) * Add --show-round-event (Jiri Olsa) - 'perf c2c' improvements: * Add NUMA analysis support (Jiri Olsa) - 'perf trace' improvements: * Improve PowerPC support (Ravi Bangoria) - 'perf inject' improvements: * Integrate ARM CoreSight traces (Robert Walker) - 'perf stat' improvements: * Add the --interval-count option (yuzhoujian) * Add the --timeout option (yuzhoujian) - 'perf sched' improvements (Changbin Du) - Vendor events improvements : * Add IBM s390 vendor events (Thomas Richter) * Add and improve arm64 vendor events (John Garry, Ganapatrao Kulkarni) * Update POWER9 vendor events (Sukadev Bhattiprolu) - Intel PT tooling improvements (Adrian Hunter) - PMU handling improvements (Agustin Vega-Frias) - Record machine topology in perf.data (Jiri Olsa) - Various overwrite related cleanups (Kan Liang) - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet) - ... and lots of other changes, cleanups and fixes, see the shortlog and Git history for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits) perf/x86/intel: Enable C-state residency events for Cannon Lake perf/x86/intel: Add Cannon Lake support for RAPL profiling perf/x86/pt, coresight: Clean up address filter structure perf vendor events s390: Add JSON files for IBM z14 perf vendor events s390: Add JSON files for IBM z13 perf vendor events s390: Add JSON files for IBM zEC12 zBC12 perf vendor events s390: Add JSON files for IBM z196 perf vendor events s390: Add JSON files for IBM z10EC z10BC perf mmap: Be consistent when checking for an unmaped ring buffer perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done() perf build: Fix check-headers.sh opts assignment perf/x86: Update rdpmc_always_available static key to the modern API perf annotate: Use absolute addresses to calculate jump target offsets perf annotate: Defer searching for comma in raw line till it is needed perf annotate: Support jumping from one function to another perf annotate: Add "_local" to jump/offset validation routines perf python: Reference Py_None before returning it perf annotate: Mark jumps to outher functions with the call arrow perf annotate: Pass function descriptor to its instruction parsing routines perf annotate: No need to calculate notes->start twice ...
2018-04-02Merge tag 'asoc-v4.17' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v4.17 This is a *very* big release for ASoC. Not much change in the core but there s the transition of all the individual drivers over to components which is intended to support further core work. The goal is to make it easier to do further core work by removing the need to special case all the different driver classes in the core, many of the devices end up being used in multiple roles in modern systems. We also have quite a lot of new drivers added this month of all kinds, quite a few for simple devices but also some more advanced ones with more substantial code. - The biggest thing is the huge series from Morimoto-san which converted everything over to components. This is a huge change by code volume but was fairly mechanical - Many fixes for some of the Realtek based Baytrail systems covering both the CODECs and the CPUs, contributed by Hans de Goode. - Lots of cleanups for Samsung based Odroid systems from Sylwester Nawrocki. - The Freescale SSI driver also got a lot of cleanups from Nicolin Chen. - The Blackfin drivers have been removed as part of the removal of the architecture. - New drivers for AKM AK4458 and AK5558, several AMD based machines, several Intel based machines, Maxim MAX9759, Motorola CPCAP, Socionext Uniphier SoCs, and TI PCM1789 and TDA7419
2018-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-31 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei. 2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey. 3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John. 4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong. 5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri. 6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin. 7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita. 8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant. 9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua. 10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: avoid possible string overflowJon Maloy
gcc points out that the combined length of the fixed-length inputs to l->name is larger than the destination buffer size: net/tipc/link.c: In function 'tipc_link_create': net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes into a region of size between 26 and 58 [-Werror=format-overflow=] sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str); net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes (assuming 75) into a destination of size 60 sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str); A detailed analysis reveals that the theoretical maximum length of a link name is: max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name = 16 + 1 + 15 + 1 + 16 + 1 + 15 = 65 Since we also need space for a trailing zero we now set MAX_LINK_NAME to 68. Just to be on the safe side we also replace the sprintf() call with snprintf(). Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: tipc: rename address types in user apiJon Maloy
The three address type structs in the user API have names that in reality reflect the specific, non-Linux environment where they were originally created. We now give them more intuitive names, in accordance with how TIPC is described in the current documentation. struct tipc_portid -> struct tipc_socket_addr struct tipc_name -> struct tipc_service_addr struct tipc_name_seq -> struct tipc_service_range To avoid confusion, we also update some commmets and macro names to match the new terminology. For compatibility, we add macros that map all old names to the new ones. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bpf: Post-hooks for sys_bindAndrey Ignatov
"Post-hooks" are hooks that are called right before returning from sys_bind. At this time IP and port are already allocated and no further changes to `struct sock` can happen before returning from sys_bind but BPF program has a chance to inspect the socket and change sys_bind result. Specifically it can e.g. inspect what port was allocated and if it doesn't satisfy some policy, BPF program can force sys_bind to fail and return EPERM to user. Another example of usage is recording the IP:port pair to some map to use it in later calls to sys_connect. E.g. if some TCP server inside cgroup was bound to some IP:port_n, it can be recorded to a map. And later when some TCP client inside same cgroup is trying to connect to 127.0.0.1:port_n, BPF hook for sys_connect can override the destination and connect application to IP:port_n instead of 127.0.0.1:port_n. That helps forcing all applications inside a cgroup to use desired IP and not break those applications if they e.g. use localhost to communicate between each other. == Implementation details == Post-hooks are implemented as two new attach types `BPF_CGROUP_INET4_POST_BIND` and `BPF_CGROUP_INET6_POST_BIND` for existing prog type `BPF_PROG_TYPE_CGROUP_SOCK`. Separate attach types for IPv4 and IPv6 are introduced to avoid access to IPv6 field in `struct sock` from `inet_bind()` and to IPv4 field from `inet6_bind()` since those fields might not make sense in such cases. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31bpf: Hooks for sys_connectAndrey Ignatov
== The problem == See description of the problem in the initial patch of this patch set. == The solution == The patch provides much more reliable in-kernel solution for the 2nd part of the problem: making outgoing connecttion from desired IP. It adds new attach types `BPF_CGROUP_INET4_CONNECT` and `BPF_CGROUP_INET6_CONNECT` for program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both source and destination of a connection at connect(2) time. Local end of connection can be bound to desired IP using newly introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though, and doesn't support binding to port, i.e. leverages `IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this: * looking for a free port is expensive and can affect performance significantly; * there is no use-case for port. As for remote end (`struct sockaddr *` passed by user), both parts of it can be overridden, remote IP and remote port. It's useful if an application inside cgroup wants to connect to another application inside same cgroup or to itself, but knows nothing about IP assigned to the cgroup. Support is added for IPv4 and IPv6, for TCP and UDP. IPv4 and IPv6 have separate attach types for same reason as sys_bind hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound. == Implementation notes == The patch introduces new field in `struct proto`: `pre_connect` that is a pointer to a function with same signature as `connect` but is called before it. The reason is in some cases BPF hooks should be called way before control is passed to `sk->sk_prot->connect`. Specifically `inet_dgram_connect` autobinds socket before calling `sk->sk_prot->connect` and there is no way to call `bpf_bind()` from hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since it'd cause double-bind. On the other hand `proto.pre_connect` provides a flexible way to add BPF hooks for connect only for necessary `proto` and call them at desired time before `connect`. Since `bpf_bind()` is allowed to bind only to IP and autobind in `inet_dgram_connect` binds only port there is no chance of double-bind. bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite of value of `bind_address_no_port` socket field. bpf_bind() sets `with_lock` to `false` when calling to __inet_bind() and __inet6_bind() since all call-sites, where bpf_bind() is called, already hold socket lock. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31bpf: Hooks for sys_bindAndrey Ignatov
== The problem == There is a use-case when all processes inside a cgroup should use one single IP address on a host that has multiple IP configured. Those processes should use the IP for both ingress and egress, for TCP and UDP traffic. So TCP/UDP servers should be bound to that IP to accept incoming connections on it, and TCP/UDP clients should make outgoing connections from that IP. It should not require changing application code since it's often not possible. Currently it's solved by intercepting glibc wrappers around syscalls such as `bind(2)` and `connect(2)`. It's done by a shared library that is preloaded for every process in a cgroup so that whenever TCP/UDP server calls `bind(2)`, the library replaces IP in sockaddr before passing arguments to syscall. When application calls `connect(2)` the library transparently binds the local end of connection to that IP (`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty). Shared library approach is fragile though, e.g.: * some applications clear env vars (incl. `LD_PRELOAD`); * `/etc/ld.so.preload` doesn't help since some applications are linked with option `-z nodefaultlib`; * other applications don't use glibc and there is nothing to intercept. == The solution == The patch provides much more reliable in-kernel solution for the 1st part of the problem: binding TCP/UDP servers on desired IP. It does not depend on application environment and implementation details (whether glibc is used or not). It adds new eBPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` and attach types `BPF_CGROUP_INET4_BIND` and `BPF_CGROUP_INET6_BIND` (similar to already existing `BPF_CGROUP_INET_SOCK_CREATE`). The new program type is intended to be used with sockets (`struct sock`) in a cgroup and provided by user `struct sockaddr`. Pointers to both of them are parts of the context passed to programs of newly added types. The new attach types provides hooks in `bind(2)` system call for both IPv4 and IPv6 so that one can write a program to override IP addresses and ports user program tries to bind to and apply such a program for whole cgroup. == Implementation notes == [1] Separate attach types for `AF_INET` and `AF_INET6` are added intentionally to prevent reading/writing to offsets that don't make sense for corresponding socket family. E.g. if user passes `sockaddr_in` it doesn't make sense to read from / write to `user_ip6[]` context fields. [2] The write access to `struct bpf_sock_addr_kern` is implemented using special field as an additional "register". There are just two registers in `sock_addr_convert_ctx_access`: `src` with value to write and `dst` with pointer to context that can't be changed not to break later instructions. But the fields, allowed to write to, are not available directly and to access them address of corresponding pointer has to be loaded first. To get additional register the 1st not used by `src` and `dst` one is taken, its content is saved to `bpf_sock_addr_kern.tmp_reg`, then the register is used to load address of pointer field, and finally the register's content is restored from the temporary field after writing `src` value. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31bpf: Check attach type at prog load timeAndrey Ignatov
== The problem == There are use-cases when a program of some type can be attached to multiple attach points and those attach points must have different permissions to access context or to call helpers. E.g. context structure may have fields for both IPv4 and IPv6 but it doesn't make sense to read from / write to IPv6 field when attach point is somewhere in IPv4 stack. Same applies to BPF-helpers: it may make sense to call some helper from some attach point, but not from other for same prog type. == The solution == Introduce `expected_attach_type` field in in `struct bpf_attr` for `BPF_PROG_LOAD` command. If scenario described in "The problem" section is the case for some prog type, the field will be checked twice: 1) At load time prog type is checked to see if attach type for it must be known to validate program permissions correctly. Prog will be rejected with EINVAL if it's the case and `expected_attach_type` is not specified or has invalid value. 2) At attach time `attach_type` is compared with `expected_attach_type`, if prog type requires to have one, and, if they differ, attach will be rejected with EINVAL. The `expected_attach_type` is now available as part of `struct bpf_prog` in both `bpf_verifier_ops->is_valid_access()` and `bpf_verifier_ops->get_func_proto()` () and can be used to check context accesses and calls to helpers correspondingly. Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and Daniel Borkmann <daniel@iogearbox.net> here: https://marc.info/?l=linux-netdev&m=152107378717201&w=2 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30blktrace: fix comment in blktrace_api.hSouvik Banerjee
The `__u64 time` field of the blk_io_trace struct refers to the time in nanoseconds, not in microseconds. It is set in __blk_add_trace, which does the following: t->time = ktime_to_ns(ktime_get()); ktime_to_ns returns ktime_t in nanoseconds, not microseconds. Signed-off-by: Souvik Banerjee <souvik1997@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. This batch comes with more input sanitization for xtables to address bug reports from fuzzers, preparation works to the flowtable infrastructure and assorted updates. In no particular order, they are: 1) Make sure userspace provides a valid standard target verdict, from Florian Westphal. 2) Sanitize error target size, also from Florian. 3) Validate that last rule in basechain matches underflow/policy since userspace assumes this when decoding the ruleset blob that comes from the kernel, from Florian. 4) Consolidate hook entry checks through xt_check_table_hooks(), patch from Florian. 5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject very large compat offset arrays, so we have a reasonable upper limit and fuzzers don't exercise the oom-killer. Patches from Florian. 6) Several WARN_ON checks on xtables mutex helper, from Florian. 7) xt_rateest now has a hashtable per net, from Cong Wang. 8) Consolidate counter allocation in xt_counters_alloc(), from Florian. 9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch from Xin Long. 10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from Felix Fietkau. 11) Consolidate code through flow_offload_fill_dir(), also from Felix. 12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward() to remove a dependency with flowtable and ipv6.ko, from Felix. 13) Cache mtu size in flow_offload_tuple object, this is safe for forwarding as f87c10a8aa1e describes, from Felix. 14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too modular infrastructure, from Felix. 15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from Ahmed Abdelsalam. 16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei. 17) Support for counting only to nf_conncount infrastructure, patch from Yi-Hung Wei. 18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes to nft_ct. 19) Use boolean as return value from ipt_ah and from IPVS too, patch from Gustavo A. R. Silva. 20) Remove useless parameters in nfnl_acct_overquota() and nf_conntrack_broadcast_help(), from Taehee Yoo. 21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo. 22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu. 23) Fix typo in xt_limit, from Geert Uytterhoeven. 24) Do no use VLAs in Netfilter code, again from Gustavo. 25) Use ADD_COUNTER from ebtables, from Taehee Yoo. 26) Bitshift support for CONNMARK and MARK targets, from Jack Ma. 27) Use pr_*() and add pr_fmt(), from Arushi Singhal. 28) Add synproxy support to ctnetlink. 29) ICMP type and IGMP matching support for ebtables, patches from Matthias Schiffer. 30) Support for the revision infrastructure to ebtables, from Bernie Harris. 31) String match support for ebtables, also from Bernie. 32) Documentation for the new flowtable infrastructure. 33) Use generic comparison functions in ebt_stp, from Joe Perches. 34) Demodularize filter chains in nftables. 35) Register conntrack hooks in case nftables NAT chain is added. 36) Merge assignments with return in a couple of spots in the Netfilter codebase, also from Arushi. 37) Document that xtables percpu counters are stored in the same memory area, from Ben Hutchings. 38) Revert mark_source_chains() sanity checks that break existing rulesets, from Florian Westphal. 39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30netfilter: ebtables: Add support for specifying match revisionBernie Harris
Currently ebtables assumes that the revision number of all match modules is 0, which is an issue when trying to use existing xtables matches with ebtables. The solution is to modify ebtables to allow extensions to specify a revision number, similar to iptables. This gets passed down to the kernel, which is then able to find the match module correctly. To main binary backwards compatibility, the size of the ebt_entry structures is not changed, only the size of the name field is decreased by 1 byte to make room for the revision field. Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-29Merge tag 'mac80211-next-for-davem-2018-03-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a fair number of patches, but many of them are from the first bullet here: * EAPoL-over-nl80211 from Denis - this will let us fix some long-standing issues with bridging, races with encryption and more * DFS offload support from the qtnfmac folks * regulatory database changes for the new ETSI adaptivity requirements * various other fixes and small enhancements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29RDMA/nldev: Provide netdevice name and indexLeon Romanovsky
Export the net device name and index to easily find connection between IB devices and relevant net devices. We also updated the comment regarding the devices without FW. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: kernel/events/hw_breakpoint.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29Merge tag 'stm-intel_th-for-greg-20180329' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm into char-misc-next Alexander writes: stm class/intel_th: Updates for 4.17 These are: * Mass conversion to GPL-2 SPDX header * Moved "hwtracing" to now its own submenu, to uncrowd the parent menu a bit * Added MAINTAINERS entry for drivers/hwtracing * Somewhat small Trace Hub fixes * Added ACPI glue layer for the Trace Hub * Added more module parameters to dummy_stm for better test coverage
2018-03-29nl80211: Add CONTROL_PORT_OVER_NL80211 attributeDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Implement TX of control port framesDenis Kenzior
This commit implements the TX side of NL80211_CMD_CONTROL_PORT_FRAME. Userspace provides the raw EAPoL frame using NL80211_ATTR_FRAME. Userspace should also provide the destination address and the protocol type to use when sending the frame. This is used to implement TX of Pre-authentication frames. If CONTROL_PORT_ETHERTYPE_NO_ENCRYPT is specified, then the driver will be asked not to encrypt the outgoing frame. A new EXT_FEATURE flag is introduced so that nl80211 code can check whether a given wiphy has capability to pass EAPoL frames over nl80211. Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add CMD_CONTROL_PORT_FRAME APIDenis Kenzior
This commit also adds cfg80211_rx_control_port function. This is used to generate a CMD_CONTROL_PORT_FRAME event out to userspace. The conn_owner_nlportid is used as the unicast destination. This means that userspace must specify NL80211_ATTR_SOCKET_OWNER flag if control port over nl80211 routing is requested in NL80211_CMD_CONNECT, NL80211_CMD_ASSOCIATE, NL80211_CMD_START_AP or IBSS/mesh join. Signed-off-by: Denis Kenzior <denkenz@gmail.com> [johannes: fix return value of cfg80211_rx_control_port()] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add SOCKET_OWNER support to START_APDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add SOCKET_OWNER support to JOIN_MESHDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> [johannes: fix race with wdev lock/unlock by just acquiring once] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add SOCKET_OWNER support to JOIN_IBSSDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> [johannes: fix race with wdev lock/unlock by just acquiring once] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-28RDMA/CMA: Add rdma_port_space to UAPISteve Wise
Since the rdma_port_space enum is being passed between user and kernel for user cm_id setup, we need it in a UAPI header. So add it to rdma_user_cm.h. This also fixes the cm_id restrack changes which pass up the port space value via the RDMA_NLDEV_ATTR_RES_PS attribute. Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28bpf: introduce BPF_RAW_TRACEPOINTAlexei Starovoitov
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access kernel internal arguments of the tracepoints in their raw form. >From bpf program point of view the access to the arguments look like: struct bpf_raw_tracepoint_args { __u64 args[0]; }; int bpf_prog(struct bpf_raw_tracepoint_args *ctx) { // program can read args[N] where N depends on tracepoint // and statically verified at program load+attach time } kprobe+bpf infrastructure allows programs access function arguments. This feature allows programs access raw tracepoint arguments. Similar to proposed 'dynamic ftrace events' there are no abi guarantees to what the tracepoints arguments are and what their meaning is. The program needs to type cast args properly and use bpf_probe_read() helper to access struct fields when argument is a pointer. For every tracepoint __bpf_trace_##call function is prepared. In assembler it looks like: (gdb) disassemble __bpf_trace_xdp_exception Dump of assembler code for function __bpf_trace_xdp_exception: 0xffffffff81132080 <+0>: mov %ecx,%ecx 0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3> where TRACE_EVENT(xdp_exception, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, u32 act), The above assembler snippet is casting 32-bit 'act' field into 'u64' to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is. All of ~500 of __bpf_trace_*() functions are only 5-10 byte long and in total this approach adds 7k bytes to .text. This approach gives the lowest possible overhead while calling trace_xdp_exception() from kernel C code and transitioning into bpf land. Since tracepoint+bpf are used at speeds of 1M+ events per second this is valuable optimization. The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced that returns anon_inode FD of 'bpf-raw-tracepoint' object. The user space looks like: // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type prog_fd = bpf_prog_load(...); // receive anon_inode fd for given bpf_raw_tracepoint with prog attached raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd); Ctrl-C of tracing daemon or cmdline tool that uses this feature will automatically detach bpf program, unload it and unregister tracepoint probe. On the kernel side the __bpf_raw_tp_map section of pointers to tracepoint definition and to __bpf_trace_*() probe function is used to find a tracepoint with "xdp_exception" name and corresponding __bpf_trace_xdp_exception() probe function which are passed to tracepoint_probe_register() to connect probe with tracepoint. Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf tracepoint mechanisms. perf_event_open() can be used in parallel on the same tracepoint. Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted. Each with its own bpf program. The kernel will execute all tracepoint probes and all attached bpf programs. In the future bpf_raw_tracepoints can be extended with query/introspection logic. __bpf_raw_tp_map section logic was contributed by Steven Rostedt Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28stm class: Make dummy's master/channel ranges configurableAlexander Shishkin
To allow for more flexible testing of the stm class, make it possible to specify the ranges of masters and channels that the dummy_stm devices cover. This is done via module parameters. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
2018-03-28stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplateAlexander Shishkin
This adds SPDX GPL-2.0 header to to stm core files and removes the GPLv2 boilerplate text. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
2018-03-28Merge 4.16-rc7 into staging-nextGreg Kroah-Hartman
We want the IIO and staging driver fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28Merge tag 'drm-amdkfd-next-2018-03-27' of ↵Dave Airlie
git://people.freedesktop.org/~gabbayo/linux into drm-next - GPUVM support for dGPUs - KFD events support for dGPUs - Fix live-lock situation when restoring multiple evicted processes - Fix VM page table allocation on large-bar systems - Fix for build failure on frv architecture * tag 'drm-amdkfd-next-2018-03-27' of git://people.freedesktop.org/~gabbayo/linux: drm/amdkfd: Use ordered workqueue to restore processes drm/amdgpu: Fix acquiring VM on large-BAR systems drm/amdkfd: Add module option for testing large-BAR functionality drm/amdkfd: Kmap event page for dGPUs drm/amdkfd: Add ioctls for GPUVM memory management drm/amdkfd: Add TC flush on VMID deallocation for Hawaii drm/amdkfd: Allocate CWSR trap handler memory for dGPUs drm/amdkfd: Add per-process IDR for buffer handles drm/amdkfd: Aperture setup for dGPUs drm/amdkfd: Remove limit on number of GPUs drm/amdkfd: Populate DRM render device minor drm/amdkfd: Create KFD VMs on demand drm/amdgpu: Add kfd2kgd interface to acquire an existing VM drm/amdgpu: Add helper to turn an existing VM into a compute VM drm/amdgpu: Fix initial validation of PD BO for KFD VMs drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm drm/amdkfd: fix uninitialized variable use drm/amdkfd: add missing include of mm.h
2018-03-28Backmerge tag 'v4.16-rc7' into drm-nextDave Airlie
Linux 4.16-rc7 This was requested by Daniel, and things were getting a bit hard to reconcile, most of the conflicts were trivial though.
2018-03-28Merge remote-tracking branches 'asoc/topic/fsl_esai', 'asoc/topic/fsl_ssi', ↵Mark Brown
'asoc/topic/fsl_utils', 'asoc/topic/generic-dmaengine' and 'asoc/topic/gtm601' into asoc-next
2018-03-27IB/uverbs: UAPI pointers should use __aligned_u64 typeMatan Barak
The ioctl() UAPIs are meant to be used by both user-space and kernel ioctl() handlers. Mostly, these UAPI structs tend to consist of simple types, but sometimes user-space pointers may be passed between user-space and kernel. We would like to avoid dereferencing a user-space pointer in the kernel, thus - we always define RDMA_UAPI_PTR as a __aligned_u64 type. Fixes: 1f7ff9d5d36a ('IB/uverbs: Move to new headers and make naming consistent') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA: Change all uapi headers to use __aligned_u64 instead of __u64Jason Gunthorpe
The new auditing standard for the subsystem will be to only use __aligned_64 in uapi headers to try and prevent 32/64 compat bugs from existing in the future. Changing all existing usage will help ensure new developers copy the right idea. The before/after of this patch was tested using pahole on 32 and 64 bit compiles to confirm it has no change in the structure layout, so this patch is a NOP. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/rxe: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
With 32 bit compilation several of the fields become misaligned here. Fixing this is an ABI break for 32 bit rxe and it is in well used portions of the rxe ABI. To handle this we bump the ABI version, as expected. However the user space driver doesn't handle it properly today, so all existing user space continues to work. Updated userspace will start to require the necessary kernel version. We don't expect there to be any 32 bit users of rxe. Most likely cases, such as ARM 32 already generally don't work because rxe does not handle the CPU cache properly on its shared with userspace pages. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/mlx4: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
rss_caps in struct mlx4_uverbs_ex_query_device_resp is misaligned on 32 bit compared to 64 bit, add explicit padding. The rss caps were introduced recently and are very rarely used in user space, mainly for DPDK. We don't expect there to be a real 32 bit user, so this change is done without compat considerations. Fixes: 09d208b258a2 ("IB/mlx4: Add report for RSS capabilities by vendor channel") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/qedr: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
struct qedr_alloc_ucontext_resp is a different length in 32 and 64 bit compiles due to implicit compiler padding. The structs alloc_pd_uresp, create_cq_uresp and create_qp_uresp are not padded by the compiler, but in user space the compiler pads them due to the way the core and driver structs are concatenated. Make this padding explicit and consistent for future sanity. The kernel driver can already handle the user buffer being smaller than required and copies correctly, so no compat or ABI break happens from introducing the explicit padding. Acked-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/ucma: Fix uABI structure layouts for 32/64 compatJason Gunthorpe
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles. The kernel requires it to be the expected length or longer so 32 bit builds running on a 64 bit kernel will not work. Retain full compat by having all kernels accept a struct with or without the trailing reserved field. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA: Remove minor pahole differences between 32/64Jason Gunthorpe
To help automatic detection we want pahole to report the same struct layouts for 32 and 64 bit compiles. These cases are all implicit padding added at the end of embedded structs as part of a union. The added reserved fields have no impact on the ABI. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-26ethtool: Add support for configuring PFC stall prevention in ethtoolInbar Karmy
In the event where the device unexpectedly becomes unresponsive for a long period of time, flow control mechanism may propagate pause frames which will cause congestion spreading to the entire network. To prevent this scenario, when the device is stalled for a period longer than a pre-configured timeout, flow control mechanisms are automatically disabled. This patch adds support for the ETHTOOL_PFC_STALL_PREVENTION as a tunable. This API provides support for configuring flow control storm prevention timeout (msec). Signed-off-by: Inbar Karmy <inbark@mellanox.com> Cc: Michal Kubecek <mkubecek@suse.cz> Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>