summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2016-12-06acpi, nfit, libnvdimm: fix / harden ars_status output length handlingDan Williams
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)" field of the ARS (Address Range Scrub) Status command, a firmware implementation may in practice return 0, 4, or 8 to indicate that there is no output payload to process. The specification states "Size of Output Buffer in bytes, including this field.". However, 'Output Buffer' is also the name of the entire payload, and earlier in the specification it states "Max Query ARS Status Output Buffer Size: Maximum size of buffer (including the Status and Extended Status fields)". Without this fix if the BIOS happens to return 0 it causes memory corruption as evidenced by this result from the acpi_nfit_ctl() unit test. ars_status00000000: 00020000 00000000 ........ BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff) kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC task: ffff8803332d2ec0 task.stack: ffffc9000174c000 RIP: 0010:[<ffffffff814cfe72>] [<ffffffff814cfe72>] __memcpy+0x12/0x20 RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0 FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0 Stack: ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac 0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000 0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246 Call Trace: [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit] [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test] Cc: <stable@vger.kernel.org> Fixes: 747ffe11b440 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06netfilter: ingress: translate 0 nf_hook_slow retval to -1Florian Westphal
The caller assumes that < 0 means that skb was stolen (or free'd). All other return values continue skb processing. nf_hook_slow returns 3 different return value types: A) a (negative) errno value: the skb was dropped (NF_DROP, e.g. by iptables '-j DROP' rule). B) 0. The skb was stolen by the hook or queued to userspace. C) 1. all hooks returned NF_ACCEPT so the caller should invoke the okfn so packet processing can continue. nft ingress facility currently doesn't have the 'okfn' that the NF_HOOK() macros use; there is no nfqueue support either. So 1 means that nf_hook_ingress() caller should go on processing the skb. In order to allow use of NF_STOLEN from ingress we need to translate this to an errno number, else we'd crash because we continue with already-free'd (or about to be free-d) skb. The errno value isn't checked, its just important that its less than 0, so return -1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pack percpu counter allocationsFlorian Westphal
instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct to counter allocatorFlorian Westphal
Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct instead of packet counterFlorian Westphal
On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: decouple nf_hook_entry and nf_hook_opsAaron Conole
During nfhook traversal we only need a very small subset of nf_hook_ops members. We need: - next element - hook function to call - hook function priv argument Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops. nf_hook_entry struct is now 32 bytes on x86_64. A followup patch will turn the run-time list into an array that only stores hook functions plus their priv arguments, eliminating the ->next element. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: introduce accessor functions for hook entriesAaron Conole
This allows easier future refactoring. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06vmbus: add support for dynamic device id'sStephen Hemminger
This patch adds sysfs interface to dynamically bind new UUID values to existing VMBus device. This is useful for generic UIO driver to act similar to uio_pci_generic. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06hyperv: Fix spelling of HV_UNKOWNHaiyang Zhang
Changed it to HV_UNKNOWN Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06mei: bus: enable non-blocking RXAlexander Usyskin
Enable non-blocking receive for drivers on mei bus, this allows checking for data availability by mei client drivers. This is most effective for fixed address clients, that lacks flow control. This function adds new API function mei_cldev_recv_nonblock(), it retuns -EGAIN if function will block. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06VME: Remove shutdown entry from vme_driverMartyn Welch
The vme_driver structure currently has a "shutdown" entry. This entry is never used, it lacks the correct parameter (it should be providing a pointer to the relevant vme_dev struct to even *look* usable), the VME subsystem currently doesn't provide support for shutdown functions and no in-tree drivers use it (hardly surprising, given it'd never be called). Remove the entry from vme_driver to avoid confusion. Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06locking/ww_mutex: Use relaxed atomicsPeter Zijlstra
The stamp is a sequence number, we don't care about memory ordering. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-06x86/uaccess, sched/preempt: Verify access_ok() contextPeter Zijlstra
I recently encountered wreckage because access_ok() was used where it should not be, add an explicit WARN when access_ok() is used wrongly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-06nvme-fabrics: Add FC transport LLDD api definitionsJames Smart
Host: - LLDD registration with the host transport - registering host ports (local ports) and target ports seen on fabric (remote ports) - Data structures and call points for FC-4 LS's and FCP IO requests Target: - LLDD registration with the target transport - registering nvme subsystem ports (target ports) - Data structures and call points for reception of FC-4 LS's and FCP IO requests, and callbacks to perform data and rsp transfers for the io. Add to MAINTAINERS file Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-12-06nvme-fabrics: Add FC transport FC-NVME definitionsJames Smart
- Formats for Cmd, Data, Rsp IUs - Formats FC-4 LS definitions - Add to MAINTAINERS file Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-12-06nvme-fabrics: Add FC transport error codes to nvme.hJames Smart
Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-12-06parser: add u64 number parserJames Smart
Will be used by the nvme-fabrics FC transport in parsing options Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-12-06PM / OPP: Allow platform specific custom set_opp() callbacksViresh Kumar
The generic set_opp() handler isn't sufficient for platforms with complex DVFS. For example, some TI platforms have multiple regulators for a CPU device. The order in which various supplies need to be programmed is only known to the platform code and its best to leave it to it. This patch implements APIs to register platform specific set_opp() callback. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dave Gerlach <d-gerlach@ti.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-06PM / OPP: Separate out _generic_set_opp()Viresh Kumar
Later patches would add support for custom set_opp() callbacks. This patch separates out the code for _generic_set_opp() handler in order to prepare for that. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dave Gerlach <d-gerlach@ti.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-06PM / OPP: Add infrastructure to manage multiple regulatorsViresh Kumar
This patch adds infrastructure to manage multiple regulators and updates the only user (cpufreq-dt) of dev_pm_opp_set{put}_regulator(). This is preparatory work for adding full support for devices with multiple regulators. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dave Gerlach <d-gerlach@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-06PM / OPP: Manage supply's voltage/current in a separate structureViresh Kumar
This is a preparatory step for multiple regulator per device support. Move the voltage/current variables to a new structure. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dave Gerlach <d-gerlach@ti.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-05bpf: add prog_digest and expose it via fdinfo/netlinkDaniel Borkmann
When loading a BPF program via bpf(2), calculate the digest over the program's instruction stream and store it in struct bpf_prog's digest member. This is done at a point in time before any instructions are rewritten by the verifier. Any unstable map file descriptor number part of the imm field will be zeroed for the hash. fdinfo example output for progs: # cat /proc/1590/fdinfo/5 pos: 0 flags: 02000002 mnt_id: 11 prog_type: 1 prog_jited: 1 prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28 memlock: 4096 When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. Furthermore, this can also be exposed through other means such as netlink in case of a tc cls/act dump (or xdp in future), but also through tracepoints or other facilities to identify the program. Other than that, the digest can also serve as a base name for the work in progress kallsyms support of programs. The digest doesn't depend/select the crypto layer, since we need to keep dependencies to a minimum. iproute2 will get support for this facility. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05ahci-remap.h: add ahci remapping definitionsDan Williams
Signed-off-by: Dan Williams <dan.j.williams@intel.com> [hch: split into a separate header and commit] Signed-off-by: Christoph Hellwig <hch@lst.de> [tj: dropped duplicate definition of AHCI_VSCAP spotted by Sergei] Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-05nvme: move NVMe class code to pci_ids.hChristoph Hellwig
We'll need to check for it in the AHCI drivers (yes, really) soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-05tcp: tsq: move tsq_flags close to sk_wmem_allocEric Dumazet
tsq_flags being in the same cache line than sk_wmem_alloc makes a lot of sense. Both fields are changed from tcp_wfree() and more generally by various TSQ related functions. Prior patch made room in struct sock and added sk_tsq_flags, this patch deletes tsq_flags from struct tcp_sock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: add tsq_flags / tsq_enumEric Dumazet
This is a cleanup, to ease code review of following patches. Old 'enum tsq_flags' is renamed, and a new enumeration is added with the flags used in cmpxchg() operations as opposed to single bit operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05usb: hcd.h: construct hub class request constants from simpler constantsTal Shorer
Currently, each hub class request constant is defined by a line like: #define ClearHubFeature (0x2000 | USB_REQ_CLEAR_FEATURE) The "magic" number for the high byte is one of 0x20, 0xa0, 0x23, 0xa3. The 0x80 bit that changes inditace USB_DIR_IN, and the 0x03 that pops up is the difference between USB_RECIP_DEVICE (0x00) and USB_RECIP_OTHER (0x03). The constant 0x20 bit is USB_TYPE_CLASS. This patch eliminates those magic numbers by defining a macro to help construct these hub class request from simpler constants. Note that USB_RT_HUB is defined as (USB_TYPE_CLASS | USB_RECIP_DEVICE) and that USB_RT_PORT is defined as (USB_TYPE_CLASS | USB_RECIP_OTHER). Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-05fsl/usb: Workarourd for USB erratum-A005697Changming Huang
The EHCI specification states the following in the SUSP bit description: In the Suspend state, the port is sensitive to resume detection. Note that the bit status does not change until the port is suspended and that there may be a delay in suspending a port if there is a transaction currently in progress on the USB. However, in NXP USBDR controller, the PORTSCx[SUSP] bit changes immediately when the application sets it and not when the port is actually suspended. So the application must wait for at least 10 milliseconds after a port indicates that it is suspended, to make sure this port has entered suspended state before initiating this port resume using the Force Port Resume bit. This bit is for NXP controller, not EHCI compatible. Signed-off-by: Changming Huang <jerry.huang@nxp.com> Signed-off-by: Ramneek Mehresh <ramneek.mehresh@nxp.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-05driver core: Silence device links sphinx warningLukas Wunner
Silence this warning emitted by sphinx: include/linux/device.h:938: warning: No description found for parameter 'links' While at it, fix typos in comments of device links code. Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Silvio Fricke <silvio.fricke@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-05mmc: mmc: Introduce mmc_abort_tuning()Adrian Hunter
If a tuning command times out, the card could still be processing it, which will cause problems for recovery. The eMMC specification says that CMD12 can be used to stop CMD21, so add a function that does that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-05regulator: Fix regulator_get_error_flags() signature mismatchDavid Lechner
The function signature of does not match regulator_get_error_flags() when CONFIG_REGULATOR is not defined vs. when it is not defined. This makes both declarations match to prevent compiler errors. Signed-off-by: David Lechner <david@lechnology.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-12-05mmc: mmc: Add Command Queue definitionsAdrian Hunter
Add definitions relating to Command Queuing. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-05mmc: queue: Fix queue thread wake-upAdrian Hunter
The only time the driver sleeps expecting to be woken upon the arrival of a new request, is when the dispatch queue is empty. The only time that it is known whether the dispatch queue is empty is after NULL is returned from blk_fetch_request() while under the queue lock. Recognizing those facts, simplify the synchronization between the queue thread and the request function. A couple of flags tell the request function what to do, and the queue lock and barriers associated with wake-ups ensure synchronization. The result is simpler and allows the removal of the context_info lock. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Harjani Ritesh <riteshh@codeaurora.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-05Backmerge tag 'v4.9-rc8' into drm-nextDave Airlie
Linux 4.9-rc8 Daniel requested this so we could apply some follow on fixes cleanly to -next.
2016-12-04netfilter: conntrack: built-in support for DCCPDavide Caratti
CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection tracking support for DCCP protocol is built-in into nf_conntrack.ko. footprint test: $ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \ net/ipv4/netfilter/nf_conntrack_ipv4.ko \ net/ipv6/netfilter/nf_conntrack_ipv6.ko (builtin)|| dccp | ipv4 | ipv6 | nf_conntrack ---------++--------+--------+--------+-------------- none || 469140 | 828755 | 828676 | 6141434 DCCP || - | 830566 | 829935 | 6533526 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-03ipv6 addrconf: Implemented enhanced DAD (RFC7527)Erik Nordmark
Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Lots more phydev and probe error path leaks in various drivers by Johan Hovold. 2) Fix race in packet_set_ring(), from Philip Pettersson. 3) Use after free in dccp_invalid_packet(), from Eric Dumazet. 4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric Dumazet. 5) When tunneling between ipv4 and ipv6 we can be left with the wrong skb->protocol value as we enter the IPSEC engine and this causes all kinds of problems. Set it before the output path does any dst_output() calls, from Eli Cooper. 6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from Florian Fainelli. 7) Various netfilter nat bug fixes from FLorian Westphal. 8) Fix memory leak in ipvlan_link_new(), from Gao Feng. 9) Locking fixes, particularly wrt. socket lookups, in l2tp from Guillaume Nault. 10) Avoid invoking rhash teardowns in atomic context by moving netlink cb->done() dump completion from a worker thread. Fix from Herbert Xu. 11) Buffer refcount problems in tun and macvtap on errors, from Jason Wang. 12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user selects BBR. Fix from Julian Wollrath. 13) Fix deadlock in transmit path on altera TSE driver, from Lino Sanfilippo. 14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita Yushchenko. 15) tc_tunnel_key needs to be properly exported to userspace via uapi, fix from Roi Dayan. 16) rds_tcp_init_net() doesn't unregister notifier in error path, fix from Sowmini Varadhan. 17) Stale packet header pointer access after pskb_expand_head() in genenve driver, fix from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) net: avoid signed overflows for SO_{SND|RCV}BUFFORCE geneve: avoid use-after-free of skb->data tipc: check minimum bearer MTU net: renesas: ravb: unintialized return value sh_eth: remove unchecked interrupts for RZ/A1 net: bcmgenet: Utilize correct struct device for all DMA operations NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040 cdc_ether: Fix handling connection notification ip6_offload: check segs for NULL in ipv6_gso_segment. RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" ipv6: Set skb->protocol properly for local output ipv4: Set skb->protocol properly for local output packet: fix race condition in packet_set_ring net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks net: ethernet: stmmac: platform: fix outdated function header net: ethernet: stmmac: dwmac-meson8b: fix probe error path net: ethernet: stmmac: dwmac-generic: fix probe error path ...
2016-12-02bpf: Add new cgroup attach type to enable sock modificationsDavid Ahern
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf: Refactor cgroups code in prep for new typeDavid Ahern
Code move and rename only; no functional change intended. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/sched: cls_flower: Add offload support using egress Hardware deviceHadar Hen Zion
In order to support hardware offloading when the device given by the tc rule is different from the Hardware underline device, extract the mirred (egress) device from the tc action when a filter is added, using the new tc_action_ops, get_dev(). Flower caches the information about the mirred device and use it for calling ndo_setup_tc in filter change, update stats and delete. Calling ndo_setup_tc of the mirred (egress) device instead of the ingress device will allow a resolution between the software ingress device and the underline hardware device. The resolution will take place inside the offloading driver using 'egress_device' flag added to tc_to_netdev struct which is provided to the offloading driver. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02tcp: randomize tcp timestamp offsets for each connectionFlorian Westphal
jiffies based timestamps allow for easy inference of number of devices behind NAT translators and also makes tracking of hosts simpler. commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization, in particular writing/reading the on-wire tcp header format takes the offset into account so rest of stack can use normal tcp_time_stamp (jiffies). So only two items are left: - add a tsoffset for request sockets - extend the tcp isn generator to also return another 32bit number in addition to the ISN. Re-use of ISN generator also means timestamps are still monotonically increasing for same connection quadruple, i.e. PAWS will still work. Includes fixes from Eric Dumazet. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02qed: Add support for hardware offloaded iSCSI.Yuval Mintz
This adds the backbone required for the various HW initalizations which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ 4xxxx line of adapters - FW notification, resource initializations, etc. Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to callerDaniel Borkmann
After 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers need to hold rcu_read_lock() already to make sure BPF program doesn't get released in the background. Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading. Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping in XDP supported drivers and to keep the typecheck on the context intact. For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock() out of the helper. When the driver gets atomic replace support, this will move to call-sites eventually. mlx5 needs actual fixing as it has the same issue as described already in 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), that is, we're under RCU bh at this time, BPF programs are released via call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue reset. Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf: BPF for lightweight tunnel infrastructureThomas Graf
Registers new BPF program types which correspond to the LWT hooks: - BPF_PROG_TYPE_LWT_IN => dst_input() - BPF_PROG_TYPE_LWT_OUT => dst_output() - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit() The separate program types are required to differentiate between the capabilities each LWT hook allows: * Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevent modification and possible invalidation of already validated packet headers on receive and the construction of illegal headers while the IP headers are still being assembled. * Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepending an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely. All BPF programs receive an skb with L3 headers attached and may return one of the following error codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid in lwtunnel_xmit() context) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Implement Fragmented Work Queue (WQ)Tariq Toukan
Add new type of struct mlx5_frag_buf which is used to allocate fragmented buffers rather than contiguous, and make the Completion Queues (CQs) use it as they are big (default of 2MB per CQ in Striding RQ). This fixes the failures of type: "mlx5e_open_locked: mlx5e_open_channels failed, -12" due to dma_zalloc_coherent insufficient contiguous coherent memory to satisfy the driver's request when the user tries to setup more or larger rings. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Merge branch 'locking/urgent' into locking/core, to pick up dependent fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-01Merge tag 'pci-v4.9-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "PCI fixes: - Fix Read Completion Boundary setting, which fixes a boot failure on IBM x3850 with Mellanox MT27500 ConnectX-3 - Update some MAINTAINERS entries and email addresses" * tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX) PCI: Export pcie_find_root_port PCI: designware-plat: Update author email PCI: designware: Change maintainer to Joao Pinto MAINTAINERS: Add devicetree binding to PCI i.MX6 entry MAINTAINERS: Update Richard Zhu's email address
2016-12-02zram: Convert to hotplug state machineAnna-Maria Gleixner
Install the callbacks via the state machine with multi instance support and let the core invoke the callbacks on the already online CPUs. [bigeasy: wire up the multi instance stuff] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: rt@linutronix.de Cc: Nitin Gupta <ngupta@vflare.org> Link: http://lkml.kernel.org/r/20161126231350.10321-19-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02KVM/PPC/Book3S HV: Convert to hotplug state machineAnna-Maria Gleixner
Install the callbacks via the state machine. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: Paul Mackerras <paulus@samba.org> Cc: rt@linutronix.de Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Link: http://lkml.kernel.org/r/20161126231350.10321-18-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>