summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2016-12-09udp: add batching to udp_rmem_release()Eric Dumazet
If udp_recvmsg() constantly releases sk_rmem_alloc for every read packet, it gives opportunity for producers to immediately grab spinlocks and desperatly try adding another packet, causing false sharing. We can add a simple heuristic to give the signal by batches of ~25 % of the queue capacity. This patch considerably increases performance under flood by about 50 %, since the thread draining the queue is no longer slowed by false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09udp: copy skb->truesize in the first cache lineEric Dumazet
In UDP RX handler, we currently clear skb->dev before skb is added to receive queue, because device pointer is no longer available once we exit from RCU section. Since this first cache line is always hot, lets reuse this space to store skb->truesize and thus avoid a cache line miss at udp_recvmsg()/udp_skb_destructor time while receive queue spinlock is held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09vfs: refactor clone/dedupe_file_range common functionsDarrick J. Wong
Hoist both the XFS reflink inode state and preparation code and the XFS file blocks compare functions into the VFS so that ocfs2 can take advantage of it for reflink and dedupe. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-12-09configfs: Minimize #include directivesBart Van Assche
Only include the header files that are needed by configfs.h itself. Add #include <linux/stat.h>. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de>
2016-12-09elevator: make the rqhash helpers exportedJens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-12-09blk-mq: add blk_mq_start_stopped_hw_queue()Jens Axboe
We have a variant for all hardware queues, but not one for a single hardware queue. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-12-09vfs: make generic_readlink() staticMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09vfs: default to generic_readlink()Miklos Szeredi
If i_op->readlink is NULL, but i_op->get_link is set then vfs_readlink() defaults to calling generic_readlink(). The IOP_DEFAULT_READLINK flag indicates that the above conditions are met and the default action can be taken. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09vfs: replace calling i_op->readlink with vfs_readlink()Miklos Szeredi
Also check d_is_symlink() in callers instead of inode->i_op->readlink because following patches will allow NULL ->readlink for symlinks. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09block: improve handling of the magic discard payloadChristoph Hellwig
Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-09tracing: Have the reg function allow to failSteven Rostedt (Red Hat)
Some tracepoints have a registration function that gets enabled when the tracepoint is enabled. There may be cases that the registraction function must fail (for example, can't allocate enough memory). In this case, the tracepoint should also fail to register, otherwise the user would not know why the tracepoint is not working. Cc: David Howells <dhowells@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Anton Blanchard <anton@samba.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-08clk: add devm_get_clk_from_child() APIKuninori Morimoto
Some driver is using this type of DT bindings for clock (more detail, see ${LINUX}/Documentation/devicetree/bindings/sound/simple-card.txt). sound_soc { ... cpu { clocks = <&xxx>; ... }; codec { clocks = <&xxx>; ... }; }; Current driver in this case uses of_clk_get() for each node, but there is no devm_of_clk_get() today. OTOH, the problem of having devm_of_clk_get() is that it encourages the use of of_clk_get() when clk_get() is more desirable. Thus, this patch adds new devm_get_clk_from_chile() which explicitly reads as get a clock from a child node of this device. By this function, we can also use this type of DT bindings sound_soc { clocks = <&xxx>, <&xxx>; clock-names = "cpu", "codec"; clock-ranges; ... cpu { ... }; codec { ... }; }; Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> [sboyd@codeurora.org: Rename subject to clk + add API] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-12-08bpf: xdp: Allow head adjustment in XDP progMartin KaFai Lau
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08phy: add phy fixup unregister functionsWoojung.Huh@microchip.com
>From : Woojung Huh <woojung.huh@microchip.com> Add functions to unregister phy fixup for modules. int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask) Unregister phy fixup from phy_fixup_list per bus_id, phy_uid & phy_uid_mask int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask) Unregister phy fixup from phy_fixup_list. Use it for fixup registered by phy_register_fixup_for_uid() int phy_unregister_fixup_for_id(const char *bus_id) Unregister phy fixup from phy_fixup_list. Use it for fixup registered by phy_register_fixup_for_id() Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08bpf: fix state equivalenceAlexei Starovoitov
Commmits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") and 484611357c19 ("bpf: allow access into map value arrays") by themselves are correct, but in combination they make state equivalence ignore 'id' field of the register state which can lead to accepting invalid program. Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08udp: under rx pressure, try to condense skbsEric Dumazet
Under UDP flood, many softirq producers try to add packets to UDP receive queue, and one user thread is burning one cpu trying to dequeue packets as fast as possible. Two parts of the per packet cost are : - copying payload from kernel space to user space, - freeing memory pieces associated with skb. If socket is under pressure, softirq handler(s) can try to pull in skb->head the payload of the packet if it fits. Meaning the softirq handler(s) can free/reuse the page fragment immediately, instead of letting udp_recvmsg() do this hundreds of usec later, possibly from another node. Additional gains : - We reduce skb->truesize and thus can store more packets per SO_RCVBUF - We avoid cache line misses at copyout() time and consume_skb() time, and avoid one put_page() with potential alien freeing on NUMA hosts. This comes at the cost of a copy, bounded to available tail room, which is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger than necessary) This patch gave me about 5 % increase in throughput in my tests. skb_condense() helper could probably used in other contexts. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net: rfs: add a jump labelEric Dumazet
RFS is not commonly used, so add a jump label to avoid some conditionals in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net: smmac: allow configuring lower pbl valuesNiklas Cassel
The driver currently always sets the PBLx8/PBLx4 bit, which means that the pbl values configured via the pbl/txpbl/rxpbl DT properties are always multiplied by 8/4 in the hardware. In order to allow the DT to configure lower pbl values, while at the same time not changing behavior of any existing device trees using the pbl/txpbl/rxpbl settings, add a property to disable the multiplication of the pbl by 8/4 in the hardware. Suggested-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Acked-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net: stmmac: add support for independent DMA pbl for tx/rxNiklas Cassel
GMAC and newer supports independent programmable burst lengths for DMA tx/rx. Add new optional devicetree properties representing this. To be backwards compatible, snps,pbl will still be valid, but snps,txpbl/snps,rxpbl will override the value in snps,pbl if set. If the IP is synthesized to use the AXI interface, there is a register and a matching DT property inside the optional stmmac-axi-config DT node for controlling burst lengths, named snps,blen. However, using this register, it is not possible to control tx and rx independently. Also, this register is not available if the IP was synthesized with, e.g., the AHB interface. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Acked-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08NET: usb: cdc_mbim: add quirk for supporting Telit LE922ADaniele Palmas
Telit LE922A MBIM based composition does not work properly with altsetting toggle done in cdc_ncm_bind_common. This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk to avoid this procedure that, instead, is mandatory for other modems. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Reviewed-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08kthread: Make struct kthread kmalloc'edOleg Nesterov
commit 23196f2e5f5d "kthread: Pin the stack via try_get_task_stack() / put_task_stack() in to_live_kthread() function" is a workaround for the fragile design of struct kthread being allocated on the task stack. struct kthread in its current form should be removed, but this needs cleanups outside of kthread.c. As a first step move struct kthread away from the task stack by making it kmalloc'ed. This allows to access kthread.exited without the magic of trying to pin task stack and the try logic in to_live_kthread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chunming Zhou <David1.Zhou@amd.com> Cc: Roman Pen <roman.penyaev@profitbricks.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-08hotplug: Make register and unregister notifier API symmetricMichal Hocko
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its notifiers when HOTPLUG_CPU=y while the registration might succeed even when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap might keep a stale notifier on the list on the manual clean up during the pool tear down and thus corrupt the list. Resulting in the following [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78 [ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40 <snipped> [ 145.122628] Call Trace: [ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20 [ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400 [ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300 [ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10 [ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30 [ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0 [ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20 [ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0 [ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30 [ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70 [ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180 [ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40 [ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0 [ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0 [ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17 This can be even triggered manually by changing /sys/module/zswap/parameters/compressor multiple times. Fix this issue by making unregister APIs symmetric to the register so there are no surprises. Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU") Reported-and-tested-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Streetman <ddstreet@ieee.org> Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains a large Netfilter update for net-next, to summarise: 1) Add support for stateful objects. This series provides a nf_tables native alternative to the extended accounting infrastructure for nf_tables. Two initial stateful objects are supported: counters and quotas. Objects are identified by a user-defined name, you can fetch and reset them anytime. You can also use a maps to allow fast lookups using any arbitrary key combination. More info at: http://marc.info/?l=netfilter-devel&m=148029128323837&w=2 2) On-demand registration of nf_conntrack and defrag hooks per netns. Register nf_conntrack hooks if we have a stateful ruleset, ie. state-based filtering or NAT. The new nf_conntrack_default_on sysctl enables this from newly created netnamespaces. Default behaviour is not modified. Patches from Florian Westphal. 3) Allocate 4k chunks and then use these for x_tables counter allocation requests, this improves ruleset load time and also datapath ruleset evaluation, patches from Florian Westphal. 4) Add support for ebpf to the existing x_tables bpf extension. From Willem de Bruijn. 5) Update layer 4 checksum if any of the pseudoheader fields is updated. This provides a limited form of 1:1 stateless NAT that make sense in specific scenario, eg. load balancing. 6) Add support to flush sets in nf_tables. This series comes with a new set->ops->deactivate_one() indirection given that we have to walk over the list of set elements, then deactivate them one by one. The existing set->ops->deactivate() performs an element lookup that we don't need. 7) Two patches to avoid cloning packets, thus speed up packet forwarding via nft_fwd from ingress. From Florian Westphal. 8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to prevent infinite loops, patch from Dwip Banerjee. And one minor refactoring from Gao feng. 9) Revisit recent log support for nf_tables netdev families: One patch to ensure that we correctly handle non-ethernet packets. Another patch to add missing logger definition for netdev. Patches from Liping Zhang. 10) Three patches for nft_fib, one to address insufficient register initialization and another to solve incorrect (although harmless) byteswap operation. Moreover update xt_rpfilter and nft_fib to match lbcast packets with zeronet as source, eg. DHCP Discover packets (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang. 11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has been broken in many-cast mode for some little time, let's give them a chance by placing them at the same level as other existing protocols. Thus, users don't explicitly have to modprobe support for this and NAT rules work for them. Some people point to the lack of support in SOHO Linux-based routers that make deployment of new protocols harder. I guess other middleboxes outthere on the Internet are also to blame. Anyway, let's see if this has any impact in the midrun. 12) Skip software SCTP software checksum calculation if the NIC comes with SCTP checksum offload support. From Davide Caratti. 13) Initial core factoring to prepare conversion to hook array. Three patches from Aaron Conole. 14) Gao Feng made a wrong conversion to switch in the xt_multiport extension in a patch coming in the previous batch. Fix it in this batch. 15) Get vmalloc call in sync with kmalloc flags to avoid a warning and likely OOM killer intervention from x_tables. From Marcelo Ricardo Leitner. 16) Update Arturo Borrero's email address in all source code headers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07PCI: Export host bridge registration interfaceThierry Reding
Allow PCI host bridge drivers to use the new host bridge interfaces to register their host bridge. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07PCI: Allow driver-specific data in host bridgeThierry Reding
Provide a way to allocate driver-specific data along with a PCI host bridge structure. The bridge's ->private field points to this data. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07PCI: Add pci_register_host_bridge() interfaceArnd Bergmann
Make the existing pci_host_bridge structure a proper device that is usable by PCI host drivers in a more standard way. In addition to the existing pci_scan_bus(), pci_scan_root_bus(), pci_scan_root_bus_msi(), and pci_create_root_bus() interfaces, this unfortunately means having to add yet another interface doing basically the same thing, and add some extra code in the initial step. However, this time it's more likely to be extensible enough that we won't have to do another one again in the future, and we should be able to reduce code much more as a result. The main idea is to pull the allocation of 'struct pci_host_bridge' out of the registration, and let individual host drivers and architecture code fill the members before calling the registration function. There are a number of things we can do based on this: * Use a single memory allocation for the driver-specific structure and the generic PCI host bridge * consolidate the contents of driver-specific structures by moving them into pci_host_bridge * Add a consistent interface for removing a PCI host bridge again when unloading a host driver module * Replace the architecture specific __weak pcibios_*() functions with callbacks in a pci_host_bridge device * Move common boilerplate code from host drivers into the generic function, based on contents of the structure * Extend pci_host_bridge with additional members when needed without having to add arguments to pci_scan_*(). * Move members of struct pci_bus into pci_host_bridge to avoid having lots of identical copies. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07Merge branch 'thread-irq-simpler' into develLinus Walleij
2016-12-07Merge branch 'pl061' into develLinus Walleij
2016-12-07gpio: pl061: move platform data into driverLinus Walleij
No boardfile defines any PL061 platform data anymore: the Integrator IM/PD-1 includes the file but is not making use of the struct. Let's delete the include and all references, then move the platform data into the driver for later consolidation into the driver state container. The only resource defined by the IM/PD-1 is the IRQ which is passed through the AMBA PrimeCell bus abstraction struct amba_device. Cc: arm@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-12-06acpi, nfit, libnvdimm: fix / harden ars_status output length handlingDan Williams
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)" field of the ARS (Address Range Scrub) Status command, a firmware implementation may in practice return 0, 4, or 8 to indicate that there is no output payload to process. The specification states "Size of Output Buffer in bytes, including this field.". However, 'Output Buffer' is also the name of the entire payload, and earlier in the specification it states "Max Query ARS Status Output Buffer Size: Maximum size of buffer (including the Status and Extended Status fields)". Without this fix if the BIOS happens to return 0 it causes memory corruption as evidenced by this result from the acpi_nfit_ctl() unit test. ars_status00000000: 00020000 00000000 ........ BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff) kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC task: ffff8803332d2ec0 task.stack: ffffc9000174c000 RIP: 0010:[<ffffffff814cfe72>] [<ffffffff814cfe72>] __memcpy+0x12/0x20 RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0 FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0 Stack: ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac 0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000 0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246 Call Trace: [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit] [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test] Cc: <stable@vger.kernel.org> Fixes: 747ffe11b440 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06netfilter: ingress: translate 0 nf_hook_slow retval to -1Florian Westphal
The caller assumes that < 0 means that skb was stolen (or free'd). All other return values continue skb processing. nf_hook_slow returns 3 different return value types: A) a (negative) errno value: the skb was dropped (NF_DROP, e.g. by iptables '-j DROP' rule). B) 0. The skb was stolen by the hook or queued to userspace. C) 1. all hooks returned NF_ACCEPT so the caller should invoke the okfn so packet processing can continue. nft ingress facility currently doesn't have the 'okfn' that the NF_HOOK() macros use; there is no nfqueue support either. So 1 means that nf_hook_ingress() caller should go on processing the skb. In order to allow use of NF_STOLEN from ingress we need to translate this to an errno number, else we'd crash because we continue with already-free'd (or about to be free-d) skb. The errno value isn't checked, its just important that its less than 0, so return -1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pack percpu counter allocationsFlorian Westphal
instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct to counter allocatorFlorian Westphal
Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct instead of packet counterFlorian Westphal
On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: decouple nf_hook_entry and nf_hook_opsAaron Conole
During nfhook traversal we only need a very small subset of nf_hook_ops members. We need: - next element - hook function to call - hook function priv argument Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops. nf_hook_entry struct is now 32 bytes on x86_64. A followup patch will turn the run-time list into an array that only stores hook functions plus their priv arguments, eliminating the ->next element. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: introduce accessor functions for hook entriesAaron Conole
This allows easier future refactoring. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06PCI: Add MCFG quirks for X-Gene host controllerDuc Dang
PCIe controllers in X-Gene SoCs are not ECAM compliant: software needs to configure additional controller's register to address device at bus:dev:function. Add a quirk to discover controller MMIO register space and configure controller registers to select and address the target secondary device. The quirk will only be applied for X-Gene PCIe MCFG table with OEM revison 1, 2, 3 or 4 (PCIe controller v1 and v2 on X-Gene SoCs). Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Duc Dang <dhdang@apm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for Cavium ThunderX pass1.x host controllerTomasz Nowicki
ThunderX pass1.x requires to emulate the EA headers for on-chip devices hence it has to use custom pci_thunder_ecam_ops for accessing PCI config space (pci-thunder-ecam.c). Add new entries to MCFG quirk array where it can be applied while probing ACPI based PCI host controller. ThunderX pass1.x is using the same way for accessing off-chip devices (so-called PEM) as silicon pass-2.x so we need to add PEM quirk entries too. Quirk is considered for ThunderX silicon pass1.x only which is identified via MCFG revision 2. ThunderX pass 1.x requires the following accessors: NUMA node 0 PCI segments 0- 3: pci_thunder_ecam_ops (MCFG quirk) NUMA node 0 PCI segments 4- 9: thunder_pem_ecam_ops (MCFG quirk) NUMA node 1 PCI segments 10-13: pci_thunder_ecam_ops (MCFG quirk) NUMA node 1 PCI segments 14-19: thunder_pem_ecam_ops (MCFG quirk) [bhelgaas: change Makefile/ifdefs so quirk doesn't depend on CONFIG_PCI_HOST_THUNDER_ECAM] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for Cavium ThunderX pass2.x host controllerTomasz Nowicki
ThunderX PCIe controller to off-chip devices (so-called PEM) is not fully compliant with ECAM standard. It uses non-standard configuration space accessors (see thunder_pem_ecam_ops) and custom configuration space granulation (see bus_shift = 24). In order to access configuration space and probe PEM as ACPI-based PCI host controller we need to add MCFG quirk infrastructure. This involves: 1. A new thunder_pem_acpi_init() init function to locate PEM-specific register ranges using ACPI. 2. Export PEM thunder_pem_ecam_ops structure so it is visible to MCFG quirk code. 3. New quirk entries for each PEM segment. Each contains platform IDs, mentioned thunder_pem_ecam_ops and CFG resources. Quirk is considered for ThunderX silicon pass2.x only which is identified via MCFG revision 1. ThunderX pass 2.x requires the following accessors: NUMA Node 0 PCI segments 0- 3: pci_generic_ecam_ops (ECAM-compliant) NUMA Node 0 PCI segments 4- 9: thunder_pem_ecam_ops (MCFG quirk) NUMA Node 1 PCI segments 10-13: pci_generic_ecam_ops (ECAM-compliant) NUMA Node 1 PCI segments 14-19: thunder_pem_ecam_ops (MCFG quirk) [bhelgaas: adapt to use acpi_get_rc_resources(), update Makefile/ifdefs so quirk doesn't depend on CONFIG_PCI_HOST_THUNDER_PEM] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for HiSilicon Hip05/06/07 host controllersDongdong Liu
The PCIe controller in Hip05/Hip06/Hip07 SoCs is not completely ECAM-compliant. It is non-ECAM only for the RC bus config space; for any other bus underneath the root bus it does support ECAM access. Add specific quirks for PCI config space accessors. This involves: 1. New initialization call hisi_pcie_init() to obtain RC base addresses from PNP0C02 at the root of the ACPI namespace (under \_SB). 2. New entry in common quirk array. [bhelgaas: move to pcie-hisi.c and change Makefile/ifdefs so quirk doesn't depend on CONFIG_PCI_HISI] Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for Qualcomm QDF2432 host controllerChristopher Covington
The Qualcomm Technologies QDF2432 SoC does not support accesses smaller than 32 bits to the PCI configuration space. Register the appropriate quirk. [bhelgaas: add QCOM_ECAM32 macro, ifdef for ACPI and PCI_QUIRKS] Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI/ACPI: Extend pci_mcfg_lookup() to return ECAM config accessorsTomasz Nowicki
pci_mcfg_lookup() is the external interface to the generic MCFG code. Previously it merely looked up the ECAM base address for a given domain and bus range. We want a way to add MCFG quirks, some of which may require special config accessors and adjustments to the ECAM address range. Extend pci_mcfg_lookup() so it can return a pointer to a pci_ecam_ops structure and a struct resource for the ECAM address space. For now, it always returns &pci_generic_ecam_ops (the standard accessor) and the resource described by the MCFG. No functional changes intended. [bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06Merge branches 'arm/mediatek', 'arm/smmu', 'x86/amd', 's390', 'core' and ↵Joerg Roedel
'arm/exynos' into next
2016-12-06ACPI/IORT: Make dma masks set-up IORT specificLorenzo Pieralisi
The introduction of acpi_dma_configure() allows to configure DMA and related IOMMU for any device that is DMA capable. To achieve that goal it ensures DMA masks are set-up to sane default values before proceeding with IOMMU and DMA ops configuration. On x86/ia64 systems, through acpi_bind_one(), acpi_dma_configure() is called for every device that has an ACPI companion, in that every device is considered DMA capable on x86/ia64 systems (ie acpi_get_dma_attr() API), which has the side effect of initializing dma masks also for pseudo-devices (eg CPUs and memory nodes) and potentially for devices whose dma masks were not set-up before the acpi_dma_configure() API was introduced, which may have noxious side effects. Therefore, in preparation for IORT firmware specific DMA masks set-up, wrap the default DMA masks set-up in acpi_dma_configure() inside an IORT specific wrapper that reverts to a NOP on x86/ia64 systems, restoring the default expected behaviour on x86/ia64 systems and keeping DMA default masks set-up on IORT based (ie ARM) arch configurations. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Tomasz Nowicki <tn@semihalf.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Sricharan R <sricharan@codeaurora.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-12-06vmbus: add support for dynamic device id'sStephen Hemminger
This patch adds sysfs interface to dynamically bind new UUID values to existing VMBus device. This is useful for generic UIO driver to act similar to uio_pci_generic. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06hyperv: Fix spelling of HV_UNKOWNHaiyang Zhang
Changed it to HV_UNKNOWN Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06mei: bus: enable non-blocking RXAlexander Usyskin
Enable non-blocking receive for drivers on mei bus, this allows checking for data availability by mei client drivers. This is most effective for fixed address clients, that lacks flow control. This function adds new API function mei_cldev_recv_nonblock(), it retuns -EGAIN if function will block. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06VME: Remove shutdown entry from vme_driverMartyn Welch
The vme_driver structure currently has a "shutdown" entry. This entry is never used, it lacks the correct parameter (it should be providing a pointer to the relevant vme_dev struct to even *look* usable), the VME subsystem currently doesn't provide support for shutdown functions and no in-tree drivers use it (hardly surprising, given it'd never be called). Remove the entry from vme_driver to avoid confusion. Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-06locking/ww_mutex: Use relaxed atomicsPeter Zijlstra
The stamp is a sequence number, we don't care about memory ordering. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-06x86/uaccess, sched/preempt: Verify access_ok() contextPeter Zijlstra
I recently encountered wreckage because access_ok() was used where it should not be, add an explicit WARN when access_ok() is used wrongly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>