summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-07-03ALSA: hda/realtek - Fix typo of pincfg for Dell quirkShih-Yuan Lee (FourDollars)
The PIN number for Dell headset mode of ALC3271 is wrong. Fixes: fcc6c877a01f ("ALSA: hda/realtek - Support Dell headset mode for ALC3271") Signed-off-by: Shih-Yuan Lee (FourDollars) <sylee@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-07-03serial: exar: Add support for IOT2040 deviceJan Kiszka
This implements the setup of RS232 and the switch-over to RS485 or RS422 for the Siemens IOT2040. That uses an EXAR XR17V352 with external logic to switch between the different modes. The external logic is controlled via MPIO pins of the EXAR controller. Only pin 10 can be exported as GPIO on the IOT2040. It is connected to an LED. As the XR17V352 used on the IOT2040 is not equipped with an external EEPROM, it cannot present itself as IOT2040-variant via subvendor/ subdevice IDs. Thus, we have to check via DMI for the target platform. Co-developed with Sascha Weisenberger. Signed-off-by: Sascha Weisenberger <sascha.weisenberger@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-03gpio-exar/8250-exar: Make set of exported GPIOs configurableJan Kiszka
On the SIMATIC, IOT2040 only a single pin is exportable as GPIO, the rest is required to operate the UART. To allow modeling this case, expand the platform device data structure to specify a (consecutive) pin subset for exporting by the gpio-exar driver. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2017-07-03platform: Accept const propertiesJan Kiszka
Aligns us with device_add_properties, the function we call. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2017-07-03serial: exar: Factor out platform hooksJan Kiszka
This prepares the addition of IOT2040 platform support by preparing the needed setup and rs485_config hooks. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-03gpio-exar/8250-exar: Rearrange gpiochip parenthoodJan Kiszka
Set the parent of the exar gpiochip to its platform device, like other gpiochips are doing it. In order to keep the relationship discoverable for ACPI systems, set the platform device companion to the PCI device. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2017-07-03gpio: exar: Fix iomap requestJan Kiszka
The UART driver already maps the resource for us. Trying to do this here only fails and leaves us with a non-working device. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2017-07-03gpio-exar/8250-exar: Do not even instantiate a GPIO device for Commtech cardsJan Kiszka
Commtech adapters need the MPIOs for internal purposes, and the gpio-exar driver already refused to pick them up. But there is actually no point in even creating the underlying platform device. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-03serial: uapi: Add support for bus terminationJan Kiszka
The Siemens IOT2040 comes with a RS485 interface that allows to enable or disable bus termination via software. Add a bit to the flags field of serial_rs485 that applications can set in order to request this feature from the hardware. This seems generic enough to add it for everyone. Existing driver will simply ignore it when set. Signed-off-by: Sascha Weisenberger <sascha.weisenberger@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-03dmaengine: qcom_hidma: correct API violation for submitSinan Kaya
Current code is violating the DMA Engine API by putting the submitted requests directly into the HW queue. This causes queued transactions to be started by another thread as soon as the first one finishes. The DMA Engine document clearly states this. "dmaengine_submit() will not start the DMA operation". Move HW queuing of the requests into the issue_pending() routine to comply with API requirements also create a new queued state for temporarily holding the requests. A descriptor goes through these transitions now. free->prepared->queued->active->completed->free as opposed to free->prepared->active->completed->free Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-07-03dmaengine: zynqmp_dma: Remove max len check in zynqmp_dma_prep_memcpyStefan Roese
Remove check for "len > ZYNQMP_DMA_MAX_TRANS_LEN" as its not needed. If the length is larger, the transfer is split up into multiple parts with the max descriptor length already. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Kedareswara rao Appana <appanad@xilinx.com> Cc: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-07-02Linux 4.12v4.12Linus Torvalds
2017-07-03kbuild: improve comments on KBUILD_SRCCao jin
Original comments is confusing on "OBJ directory", make it clear. Bonus: move comments close to what it wants to comment. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-07-03kbuild: create deterministic initramfs directory listingsBjørn Forsman
kbuild runs "find" on each entry in CONFIG_INITRAMFS_SOURCE that is a directory. The order of the file listing output by "find" matter for build reproducability, hence this patch applies "sort" to get deterministic results. Without this patch, two different machines with identical initramfs directory input may produce differing initramfs cpio archives (different hash) due to the different order of the files within the archive. Signed-off-by: Bjørn Forsman <bjorn.forsman@gmail.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-07-02moduleparam: fix doc: hwparam_irq configures an IRQSylvain 'ythier' Hitier
Signed-off-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-02bpf: fix to bpf_setsockopsLawrence Brakmo
Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1 statement up). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-02SMB3: Enable encryption for SMB3.1.1Steve French
We were missing a capability flag for SMB3.1.1 Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2017-07-02parisc: Report SIGSEGV instead of SIGBUS when running out of stackHelge Deller
When a process runs out of stack the parisc kernel wrongly faults with SIGBUS instead of the expected SIGSEGV signal. This example shows how the kernel faults: do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000] trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000 The vma->vm_end value is the first address which does not belong to the vma, so adjust the check to include vma->vm_end to the range for which to send the SIGSEGV signal. This patch unbreaks building the debian libsigsegv package. Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-02parisc: use compat_sys_keyctl()Eric Biggers
Architectures with a compat syscall table must put compat_sys_keyctl() in it, not sys_keyctl(). The parisc architecture was not doing this; fix it. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-02Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS fixes from Ralf Baechle: "Here's a final round of fixes for 4.12: - Fix misordered instructions in assembly code making kenel startup via UHB unreliable. - Fix special case of MADDF and MADDF emulation. - Fix alignment issue in address calculation in pm-cps on 64 bit. - Fix IRQ tracing & lockdep when rescheduling - Systems with MAARs require post-DMA cache flushes. The reordering fix and the MADDF/MSUBF fix have sat in linux-next for a number of days. The others haven't propagated from my pull tree to linux-next yet but all have survived manual testing and Imagination's automated test system and there are no pending bug reports" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Avoid accidental raw backtrace MIPS: Perform post-DMA cache flushes on systems with MAARs MIPS: Fix IRQ tracing & lockdep when rescheduling MIPS: pm-cps: Drop manual cache-line alignment of ready_count MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately MIPS: head: Reorder instructions missing a delay slot
2017-07-02Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fix from Russell King: "One final fix for 4.12 - Doug found a boot failure case triggered by requesting a non-even MB vmalloc size" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8685/1: ensure memblock-limit is pmd-aligned
2017-07-02acpi/nfit: Issue Start ARS to retrieve existing recordsToshi Kani
ACPI 6.2 defines in section 9.20.7.2 that the OSPM may call a Start ARS with Flags Bit [1] set upon receiving the 0x81 notification. Upon receiving the notification, the OSPM may decide to issue a Start ARS with Flags Bit [1] set to prepare for the retrieval of existing records and issue the Query ARS Status function to retrieve the records. Add support to call a Start ARS from acpi_nfit_uc_error_notify() with ND_ARS_RETURN_PREV_DATA set when HW_ERROR_SCRUB_ON is not set. Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-07-02powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8Thiago Jung Bauermann
On POWER9 SMT8 the 24x7 API returns two result elements for physical core and virtual CPU events and we need to add their counts to get the final result. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Support v2 of the hypervisor APIThiago Jung Bauermann
POWER9 introduces a new version of the hypervisor API to access the 24x7 perf counters. The new version changed some of the structures used for requests and results. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Minor improvementsThiago Jung Bauermann
There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request. There's also an HV_PERF_DOMAIN_MAX constant, so use it in h_24x7_event_init. This makes the comment above the check redundant, so remove it. In add_event_to_24x7_request, a statement is terminated with a comma instead of a semicolon. Fix it. In hv-24x7.h, improve comments in struct hv_24x7_result. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Fix return value of hcallsThiago Jung Bauermann
The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix this in the code. The H_GET_24X7_DATA hcall can return a signed error code, so fix this in the code. Also, don't truncate it to 32 bit to use as return value for make_24x7_request. In case of error h_24x7_event_commit_txn passes that return value to generic code, so it should be a proper errno. The other caller of make_24x7_request is single_24x7_request, whose callers don't actually care which error code is returned so they are not affected by this change. Finally, h_24x7_get_value doesn't use the error code from single_24x7_request, so there's no need to store it. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc-perf/hx-24x7: Don't log failed hcall twiceThiago Jung Bauermann
make_24x7_request already calls log_24x7_hcall if it fails, so callers don't have to do it again. In fact, since the latter is now only called from the former, there's no need for a separate log_24x7_hcall anymore so remove it. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Properly iterate through resultsThiago Jung Bauermann
hv-24x7.h has a comment mentioning that result_buffer->results can't be indexed as a normal array because it may contain results of variable sizes, so fix the loop in h_24x7_event_commit_txn to take the variation into account when iterating through results. Another problem in that loop is that it sets h24x7hw->events[i] to NULL. This assumes that only the i'th result maps to the i'th request, but that is not guaranteed to be true. We need to leave the event in the array so that we don't dereference a NULL pointer in case more than one result maps to one request. We still assume that each result has only one result element, so warn if that assumption is violated. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer checkThiago Jung Bauermann
request_buffer can hold 254 requests, so if it already has that number of entries we can't add a new one. Also, define constant to show where the number comes from. Fixes: e3ee15dc5d19 ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()") Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Fix passing of catalog version numberThiago Jung Bauermann
H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from the first catalog page obtained previously. This is a 64 bit number, but create_events_from_catalog truncates it to 32-bit. This worked on POWER8, but POWER9 actually uses the upper bits so the call fails with H_P3 because the hypervisor doesn't recognize the version. This patch also adds the hcall return code to the error message, which is helpful when debugging the problem. Fixes: 5c5cd7b50259 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events") Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/mm: Enable ZONE_DEVICE on powerpcOliver O'Halloran
Flip the switch. Running around and screaming "IT'S ALIVE" is optional, but recommended. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/mm: Wire up hpte_removebolted for powernvAnton Blanchard
Adds support for removing bolted (i.e kernel linear mapping) mappings on powernv. This is needed to support memory hot unplug operations which are required for the teardown of DAX/PMEM devices. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/mm: Add devmap support for ppc64Oliver O'Halloran
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S. This is used to differentiate device backed memory from transparent huge pages since they are handled in more or less the same manner by the core mm code. Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/vmemmap: Add altmap supportOliver O'Halloran
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An altmap is a driver provided region that is used to provide the backing storage for the struct pages of ZONE_DEVICE memory. In situations where large amount of ZONE_DEVICE memory is being added to the system the altmap reduces pressure on main system memory by allowing the mm/ metadata to be stored on the device itself rather in main memory. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/vmemmap: Reshuffle vmemmap_free()Oliver O'Halloran
Removes an indentation level and shuffles some code around to make the following patch cleaner. No functional changes. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02mm, x86: Add ARCH_HAS_ZONE_DEVICE to KconfigOliver O'Halloran
Currently ZONE_DEVICE depends on X86_64 and this will get unwieldly as new architectures (and platforms) get ZONE_DEVICE support. Move to an arch selected Kconfig option to save us the trouble. Cc: linux-mm@kvack.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/hugetlbfs: Export HPAGE_SHIFTOliver O'Halloran
Export it so it can be referenced inside a module. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02MAINTAINERS: cxl: update maintainershipAndrew Donnellan
As Ian's stepping down from his maintainer role now that he's leaving IBM, Frederic has asked me to add myself to the cxl maintainer list. Updating accordingly. Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02MAINTAINERS: Remove myself as cxl maintainerIan Munsie
I am no longer employed by IBM and will no longer have access to cxl hardware, so remove myself as a cxl maintainer. If anyone needs to contact me in the future, please use my personal email address darkstarsword@gmail.com Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc: use spin loop primitives in some functionsNicholas Piggin
Use the different spin loop primitives in some simple powerpc spin loops, including those which will spin as a common case. This will help to test the spin loop primitives before more conversions are done. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add some includes of <linux/processor.h>] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/64: implement spin loop primitivesNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02locking/refcount: Remove the half-implemented refcount_sub() APIKees Cook
CONFIG_REFCOUNT_FULL=y (correctly) does not provide a refcount_sub(), which should not be part of proper refcount design patterns. Remove the erroneous extern and the later !CONFIG_REFCOUNT_FULL accidental implementation. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 29dee3c03abc ("locking/refcounts: Out-of-line everything") Link: http://lkml.kernel.org/r/20170701180129.GA17405@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-02ALSA: pcm: add a documentation for tracepointsTakashi Sakamoto
In PCM interface/protocol for userspace, parameters of runtime for PCM substream is decided by an interaction between applications and ALSA PCM core. In former commits, some tracepoints were added to probe a part of the interaction. This commit adds a documentation about the interaction and the tracepoints. Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-07-01Merge branch 'bpf-Add-support-for-sock_ops'David S. Miller
Lawrence Brakmo says: ==================== bpf: Add support for sock_ops Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.) and setting connection parameters such as buffer sizes, initial window, SYN/SYN-ACK RTOs, etc. Unlike current BPF program types that expect to be called at a particular place in the network stack code, SOCK_OPS program can be called at different places and use an "op" field to indicate the context. There are currently two types of operations, those whose effect is through their return value and those whose effect is through the new bpf_setsocketop BPF helper function. Example operands of the first type are: BPF_SOCK_OPS_TIMEOUT_INIT BPF_SOCK_OPS_RWND_INIT BPF_SOCK_OPS_NEEDS_ECN Example operands of the secont type are: BPF_SOCK_OPS_TCP_CONNECT_CB BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB Current operands are only called during connection establishment so there should not be any BPF overheads after connection establishment. The main idea is to use connection information form both hosts, such as IP addresses and ports to allow setting of per connection parameters to optimize the connection's peformance. Alghough there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some disticnt advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it does not require application changes and it can be updated easily at any time. It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called only once during the connections's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has "op" field to specify the type of operation requested. This patch set also includes sample BPF programs to demostrate the differnet features. v2: Formatting changes, rebased to latest net-next v3: Fixed build issues, changed socket_ops to sock_ops throught, fixed formatting issues, removed the syscall to load sock_ops program and added functionality to use existing bpf attach and bpf detach system calls, removed reader/writer locks in sock_bpfops.c (used when saving sock_ops global program) and fixed missing module refcount increment. v4: Removed global sock_ops program and instead used existing cgroup bpf infrastructure to support a new BPF_CGROUP_ATTCH type. v5: fixed kbuild warning happening in bpf-cgroup.h removed automatic converstion to host byte order from some sock_ops fields (ipv4 and ipv6 addresses, remote port) Added conversion to host byte order in some of the sample programs Added to sample BPF program comments about using load_sock_ops to load Removed is_req_sock field from bpf_sock_ops_kern and related places, using sk_fullsock() instead. v6: fixes to BPF helper function setsockopt (possible NULL deferencing, etc.) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: update tools/include/uapi/linux/bpf.hLawrence Brakmo
Update tools/include/uapi/linux/bpf.h to include changes related to new bpf sock_ops program type. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Sample bpf program to set sndcwnd clampLawrence Brakmo
Sample BPF program, tcp_clamp_kern.c, to demostrate the use of setting the sndcwnd clamp. This program assumes that if the first 5.5 bytes of the host's IPv6 addresses are the same, then the hosts are in the same datacenter and sets sndcwnd clamp to 100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer sizes to 150KB. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Adds support for setting sndcwnd clampLawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which sets the initial congestion window. It is useful to limit the sndcwnd when the host are close to each other (small RTT). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Sample BPF program to set initial cwndLawrence Brakmo
Sample BPF program that assumes hosts are far away (i.e. large RTTs) and sets initial cwnd and initial receive window to 40 packets, send and receive buffers to 1.5MB. In practice there would be a test to insure the hosts are actually far enough away. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Adds support for setting initial cwndLawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the initial congestion window. This can be used when the hosts are far apart (large RTTs) and it is safe to start with a large inital cwnd. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Sample BPF program to set congestion controlLawrence Brakmo
Sample BPF program that sets congestion control to dctcp when both hosts are within the same datacenter. In this example that is assumed to be when they have the first 5.5 bytes of their IPv6 address are the same. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>