summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-03-24Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem: - Remove secondary GIC support on systems w/o device-tree support - A set of small fixlets in various irqchip drivers - static and fall-through annotations - Kernel doc and typo fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Mark expected switch case fall-through genirq/devres: Remove excess parameter from kernel doc irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static irqchip/mbigen: Don't clear eventid when freeing an MSI irqchip/stm32: Don't set rising configuration registers at init irqchip/stm32: Don't clear rising/falling config registers at init dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support irqchip/mmp: Make mmp_irq_domain_ops static irqchip/brcmstb-l2: Make two init functions static genirq: Fix typo in comment of IRQD_MOVE_PCNTXT irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp irqchip/gic: Drop support for secondary GIC in non-DT systems irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
2019-03-24Merge tag 'auxdisplay-for-linus-v5.1-rc2' of git://github.com/ojeda/linuxLinus Torvalds
Pull auxdisplay updates from Miguel Ojeda: "A few fixes and improvements for auxdisplay: - Series to fix a memory leak in hd44780 while introducing charlcd_free(). From Andy Shevchenko - Series to clean up the Kconfig menus and a couple of improvements for charlcd. From Mans Rullgard" * tag 'auxdisplay-for-linus-v5.1-rc2' of git://github.com/ojeda/linux: auxdisplay: charlcd: make backlight initial state configurable auxdisplay: charlcd: simplify init message display auxdisplay: deconfuse configuration auxdisplay: hd44780: Convert to use charlcd_free() auxdisplay: panel: Convert to use charlcd_free() auxdisplay: charlcd: Introduce charlcd_free() helper auxdisplay: charlcd: Move to_priv() to charlcd namespace auxdisplay: hd44780: Fix memory leak on ->remove()
2019-03-24habanalabs: add device status option to INFO IOCTLDalit Ben Zoor
This patch adds a new opcode to INFO IOCTL that returns the device status. This will allow users to query the device status in order to avoid sending command submissions while device is in reset. Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-24gpiolib: export devprop_gpiochip_set_names()Jan Kundrát
This function is needed in mcp23s08. That driver is a special snowflake because it supports several hardware chips as a single "GPIO chip" under Linux. Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Phil Reid <preid@electromag.com.au> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-03-23Merge tag 'mlx5-updates-2019-03-20' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-03-20 This series includes updates to mlx5 driver, 1) Compiler warnings cleanup from Saeed Mahameed 2) Parav Pandit simplifies sriov enable/disables 3) Gustavo A. R. Silva, Removes a redundant assignment 4) Moshe Shemesh, Adds Geneve tunnel stateless offload support 5) Eli Britstein, Adds the Support for VLAN modify action and Replaces TC VLAN pop and push actions with VLAN modify Note: This series includes two simple non-mlx5 patches, 1) Declare IANA_VXLAN_UDP_PORT definition in include/net/vxlan.h, and use it in some drivers. 2) Declare GENEVE_UDP_PORT definition in include/net/geneve.h, and use it in mlx5 and nfp drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23tcp: add one skb cache for rxEric Dumazet
Often times, recvmsg() system calls and BH handling for a particular TCP socket are done on different cpus. This means the incoming skb had to be allocated on a cpu, but freed on another. This incurs a high spinlock contention in slab layer for small rpc, but also a high number of cache line ping pongs for larger packets. A full size GRO packet might use 45 page fragments, meaning that up to 45 put_page() can be involved. More over performing the __kfree_skb() in the recvmsg() context adds a latency for user applications, and increase probability of trapping them in backlog processing, since the BH handler might found the socket owned by the user. This patch, combined with the prior one increases the rpc performance by about 10 % on servers with large number of cores. (tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps instead of 8 Mpps) This also increases single bulk flow performance on 40Gbit+ links, since in this case there are often two cpus working in tandem : - CPU handling the NIC rx interrupts, feeding the receive queue, and (after this patch) freeing the skbs that were consumed. - CPU in recvmsg() system call, essentially 100 % busy copying out data to user space. Having at most one skb in a per-socket cache has very little risk of memory exhaustion, and since it is protected by socket lock, its management is essentially free. Note that if rps/rfs is used, we do not enable this feature, because there is high chance that the same cpu is handling both the recvmsg() system call and the TCP rx path, but that another cpu did the skb allocations in the device driver right before the RPS/RFS logic. To properly handle this case, it seems we would need to record on which cpu skb was allocated, and use a different channel to give skbs back to this cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23tcp: add one skb cache for txEric Dumazet
On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks. 20.69% [kernel] [k] queued_spin_lock_slowpath 5.64% [kernel] [k] _raw_spin_lock 3.83% [kernel] [k] syscall_return_via_sysret 3.48% [kernel] [k] __entry_text_start 1.76% [kernel] [k] __netif_receive_skb_core 1.64% [kernel] [k] __fget For each sendmsg(), we allocate one skb, and free it at the time ACK packet comes. In many cases, ACK packets are handled by another cpus, and this unfortunately incurs heavy costs for slab layer. This patch uses an extra pointer in socket structure, so that we try to reuse the same skb and avoid these expensive costs. We cache at most one skb per socket so this should be safe as far as memory pressure is concerned. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23net: convert rps_needed and rfs_needed to new static branch apiEric Dumazet
We prefer static_branch_unlikely() over static_key_false() these days. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23net: sched: add empty status flag for NOLOCK qdiscPaolo Abeni
The queue is marked not empty after acquiring the seqlock, and it's up to the NOLOCK qdisc clearing such flag on dequeue. Since the empty status lays on the same cache-line of the seqlock, it's always hot on cache during the updates. This makes the empty flag update a little bit loosy. Given the lack of synchronization between enqueue and dequeue, this is unavoidable. v2 -> v3: - qdisc_is_empty() has a const argument (Eric) v1 -> v2: - use really an 'empty' flag instead of 'not_empty', as suggested by Eric Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23tcp: add documentation for tcp_ca_stateSoheil Hassas Yeganeh
Add documentation to the tcp_ca_state enum, since this enum is exposed in uapi. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Sowmini Varadhan <sowmini05@gmail.com> Acked-by: Sowmini Varadhan <sowmini05@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23tick: Remove outgoing CPU from broadcast masksThomas Gleixner
Valentin reported that unplugging a CPU occasionally results in a warning in the tick broadcast code which is triggered when an offline CPU is in the broadcast mask. This happens because the outgoing CPU is not removing itself from the broadcast masks, especially not from the broadcast_force_mask. The removal happens on the control CPU after the outgoing CPU is dead. It's a long standing issue, but the warning is harmless. Rework the hotplug mechanism so that the outgoing CPU removes itself from the broadcast masks after disabling interrupts and removing itself from the online mask. Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de
2019-03-23Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes and improvements from Jens Axboe: "The first five in this series are heavily inspired by the work Al did on the aio side to fix the races there. The last two re-introduce a feature that was in io_uring before it got merged, but which I pulled since we didn't have a good way to have BVEC iters that already have a stable reference. These aren't necessarily related to block, it's just how io_uring pins fixed buffers" * tag 'io_uring-20190323' of git://git.kernel.dk/linux-block: block: add BIO_NO_PAGE_REF flag iov_iter: add ITER_BVEC_FLAG_NO_REF flag io_uring: mark me as the maintainer io_uring: retry bulk slab allocs as single allocs io_uring: fix poll races io_uring: fix fget/fput handling io_uring: add prepped flag io_uring: make io_read/write return an integer io_uring: use regular request ref counts
2019-03-23Merge tag 'for-linus-20190323' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A set of fixes/changes that should go into this series. This contains: - Kernel doc / comment updates (Bart, Shenghui) - Un-export of core-only used function (Bart) - Fix race on loop file access (Dongli) - pf/pcd queue cleanup fixes (me) - Use appropriate helper for RESTART bit set (Yufen) - Use named identifier for classic poll (Yufen)" * tag 'for-linus-20190323' of git://git.kernel.dk/linux-block: sbitmap: trivial - update comment for sbitmap_deferred_clear_bit blkcg: Fix kernel-doc warnings blk-iolatency: #include "blk.h" block: Unexport blk_mq_add_to_requeue_list() block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx loop: access lo_backing_file only when the loop device is Lo_bound blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART paride/pcd: cleanup queues when detection fails paride/pf: cleanup queues when detection fails
2019-03-23Merge tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A follow up for the new alloc_size logic and a blacklisting fix, marked for stable" * tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client: rbd: drop wait_for_latest_osdmap() libceph: wait for latest osdmap in ceph_monc_blacklist_add() rbd: set io_min, io_opt and discard_granularity to alloc_size
2019-03-23x86/gart: Exclude GART aperture from kcoreKairui Song
On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
2019-03-22bpf: add bpf_skb_adjust_room encap flagsWillem de Bruijn
When pushing tunnel headers, annotate skbs in the same way as tunnel devices. For GSO packets, the network stack requires certain fields set to segment packets with tunnel headers. gro_gse_segment depends on transport and inner mac header, for instance. Add an option to pass this information. Remove the restriction on len_diff to network header length, which is too short, e.g., for GRE protocols. Changes v1->v2: - document new flags - BPF_F_ADJ_ROOM_MASK moved v2->v3: - BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22bpf: add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_FIXED_GSOWillem de Bruijn
bpf_skb_adjust_room adjusts gso_size of gso packets to account for the pushed or popped header room. This is not allowed with UDP, where gso_size delineates datagrams. Add an option to avoid these updates and allow this call for datagrams. It can also be used with TCP, when MSS is known to allow headroom, e.g., through MSS clamping or route MTU. Changes v1->v2: - document flag BPF_F_ADJ_ROOM_FIXED_GSO - do not expose BPF_F_ADJ_ROOM_MASK through uapi, as it may change. Link: https://patchwork.ozlabs.org/patch/1052497/ Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22bpf: add bpf_skb_adjust_room mode BPF_ADJ_ROOM_MACWillem de Bruijn
bpf_skb_adjust_room net allows inserting room in an skb. Existing mode BPF_ADJ_ROOM_NET inserts room after the network header by pulling the skb, moving the network header forward and zeroing the new space. Add new mode BPF_ADJUST_ROOM_MAC that inserts room after the mac header. This allows inserting tunnel headers in front of the network header without having to recreate the network header in the original space, avoiding two copies. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22drm/i915/ehl: Add EHL platform info and PCI IDsJames Ausmus
Add known EHL PCI IDs. v2 (Rodrigo): Removed x86 early quirk. To be sent in a separated patch cc'ing the appropriated list and maintainers for proper ack. v3: (Rodrigo): - Removed .num_pipes = 3 that is coming since GEN&_FEATURES. - Added ppgtt type and size after rework from Bob and Chris v4: (Rodrigo): - remove ppgtt type added on v3. Jose pointed it is not needed. Cc: Bob Paauwe <bob.j.paauwe@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: James Ausmus <james.ausmus@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Bob Paauwe <bob.j.paauwe@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322175847.25707-1-rodrigo.vivi@intel.com
2019-03-22net/mlx5e: Add VLAN ID rewrite fieldsEli Britstein
Add VLAN ID rewrite fields as a pre-step to support this rewrite. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net: Add IANA_VXLAN_UDP_PORT definition to vxlan header fileMoshe Shemesh
Added IANA_VXLAN_UDP_PORT (4789) definition to vxlan header file so it can be used by drivers instead of local definition. Updated drivers which locally defined it as 4789 to use it. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: John Hurley <john.hurley@netronome.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Yunsheng Lin <linyunsheng@huawei.com> Cc: Peng Li <lipeng321@huawei.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22net: Move the definition of the default Geneve udp port to public header fileMoshe Shemesh
Move the definition of the default Geneve udp port from the geneve source to the header file, so we can re-use it from drivers. Modify existing drivers to use it. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: John Hurley <john.hurley@netronome.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-22sbitmap: trivial - update comment for sbitmap_deferred_clear_bitShenghui Wang
"sbitmap_batch_clear" should be "sbitmap_deferred_clear" Acked-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-22PCI: Cleanup register definition width and whitespaceBjorn Helgaas
Follow the file conventions of: - register offsets not indented - fields within a register indented one space - field masks use same width as register - register field values indented an additional space No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-03-22gpio: amd-fch: Fix bogus SPDX identifierThomas Gleixner
spdxcheck.py complains: include/linux/platform_data/gpio/gpio-amd-fch.h: 1:28 Invalid License ID: GPL+ which is correct because GPL+ is not a valid identifier. Of course this could have been caught by checkpatch.pl _before_ submitting or merging the patch. WARNING: 'SPDX-License-Identifier: GPL+ */' is not supported in LICENSES/... #271: FILE: include/linux/platform_data/gpio/gpio-amd-fch.h:1: +/* SPDX-License-Identifier: GPL+ */ Fix it under the assumption that the author meant GPL-2.0+, which makes sense as the corresponding C file is using that identifier. Fixes: e09d168f13f0 ("gpio: AMD G-Series PCH gpio driver") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2019-03-22genetlink: make policy common to familyJohannes Berg
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22softirq: Remove tasklet_hrtimerThomas Gleixner
There are no more users of this interface. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Link: https://lkml.kernel.org/r/20190301224821.29843-4-bigeasy@linutronix.de
2019-03-22xfrm: Replace hrtimer tasklet with softirq hrtimerThomas Gleixner
Switch the timer to HRTIMER_MODE_SOFT, which executed the timer callback in softirq context and remove the hrtimer_tasklet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lkml.kernel.org/r/20190301224821.29843-3-bigeasy@linutronix.de
2019-03-22drm/i915: Allow contexts to share a single timeline across all enginesChris Wilson
Previously, our view has been always to run the engines independently within a context. (Multiple engines happened before we had contexts and timelines, so they always operated independently and that behaviour persisted into contexts.) However, at the user level the context often represents a single timeline (e.g. GL contexts) and userspace must ensure that the individual engines are serialised to present that ordering to the client (or forgot about this detail entirely and hope no one notices - a fair ploy if the client can only directly control one engine themselves ;) In the next patch, we will want to construct a set of engines that operate as one, that have a single timeline interwoven between them, to present a single virtual engine to the user. (They submit to the virtual engine, then we decide which engine to execute on based.) To that end, we want to be able to create contexts which have a single timeline (fence context) shared between all engines, rather than multiple timelines. v2: Move the specialised timeline ordering to its own function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
2019-03-22drm/i915: Extend CONTEXT_CREATE to set parameters upon constructionChris Wilson
It can be useful to have a single ioctl to create a context with all the initial parameters instead of a series of create + setparam + setparam ioctls. This extension to create context allows any of the parameters to be passed in as a linked list to be applied to the newly constructed context. v2: Make a local copy of user setparam (Tvrtko) v3: Use flags to detect availability of extension interface Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-3-chris@chris-wilson.co.uk
2019-03-22drm/i915: Create/destroy VM (ppGTT) for use with contextsChris Wilson
In preparation to making the ppGTT binding for a context explicit (to facilitate reusing the same ppGTT between different contexts), allow the user to create and destroy named ppGTT. v2: Replace global barrier for swapping over the ppgtt and tlbs with a local context barrier (Tvrtko) v3: serialise with struct_mutex; it's lazy but required dammit v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more similarly, turned out different!) v5: Fix up test unwind for aliasing-ppgtt (snb) v6: Tighten language for uapi struct drm_i915_gem_vm_control. v7: Patch the context image for runtime ppgtt switching! Testcase: igt/gem_vm_create Testcase: igt/gem_ctx_param/vm Testcase: igt/gem_ctx_clone/vm Testcase: igt/gem_ctx_shared Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-2-chris@chris-wilson.co.uk
2019-03-22drm/i915: Introduce the i915_user_extension_methodChris Wilson
An idea for extending uABI inspired by Vulkan's extension chains. Instead of expanding the data struct for each ioctl every time we need to add a new feature, define an extension chain instead. As we add optional interfaces to control the ioctl, we define a new extension struct that can be linked into the ioctl data only when required by the user. The key advantage being able to ignore large control structs for optional interfaces/extensions, while being able to process them in a consistent manner. In comparison to other extensible ioctls, the key difference is the use of a linked chain of extension structs vs an array of tagged pointers. For example, struct drm_amdgpu_cs_chunk { __u32 chunk_id; __u32 length_dw; __u64 chunk_data; }; struct drm_amdgpu_cs_in { __u32 ctx_id; __u32 bo_list_handle; __u32 num_chunks; __u32 _pad; __u64 chunks; }; allows userspace to pass in array of pointers to extension structs, but must therefore keep constructing that array along side the command stream. In dynamic situations like that, a linked list is preferred and does not similar from extra cache line misses as the extension structs themselves must still be loaded separate to the chunks array. v2: Apply the tail call optimisation directly to nip the worry of stack overflow in the bud. v3: Defend against recursion. v4: Fixup local types to match new uabi Opens: - do we include the result as an out-field in each chain? struct i915_user_extension { __u64 next_extension; __u64 name; __s32 result; __u32 mbz; /* reserved for future use */ }; * Undecided, so provision some room for future expansion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-1-chris@chris-wilson.co.uk
2019-03-22crypto: simd,testmgr - introduce crypto_simd_usable()Eric Biggers
So that the no-SIMD fallback code can be tested by the crypto self-tests, add a macro crypto_simd_usable() which wraps may_use_simd(), but also returns false if the crypto self-tests have set a per-CPU bool to disable SIMD in crypto code on the current CPU. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/morus1280 - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementations of MORUS-1280 to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/morus640 - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementation of MORUS-640 to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: simd - support wrapping AEAD algorithmsEric Biggers
Update the crypto_simd module to support wrapping AEAD algorithms. Previously it only supported skciphers. The code for each is similar. I'll be converting the x86 implementations of AES-GCM, AEGIS, and MORUS to use this. Currently they each independently implement the same functionality. This will not only simplify the code, but it will also fix the bug detected by the improved self-tests: the user-provided aead_request is modified. This is because these algorithms currently reuse the original request, whereas the crypto_simd helpers build a new request in the original request's context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22clk: samsung: dt-bindings: Add ADC clock ID to Exynos5410Krzysztof Kozlowski
Add ID for TSADC clock to Exynos5410. Choose the same value of ID as in Exynos5420 to make it simpler/compatible in future (although clock driver code is not shared). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
2019-03-22clk: samsung: dt-bindings: Put CLK_UART3 in orderKrzysztof Kozlowski
Order the CLK_UART3 by ID. No change in functionality. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
2019-03-21bpf: add helper to check for a valid SYN cookieLorenz Bauer
Using bpf_skc_lookup_tcp it's possible to ascertain whether a packet belongs to a known connection. However, there is one corner case: no sockets are created if SYN cookies are active. This means that the final ACK in the 3WHS is misclassified. Using the helper, we can look up the listening socket via bpf_skc_lookup_tcp and then check whether a packet is a valid SYN cookie ACK. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21bpf: add skc_lookup_tcp helperLorenz Bauer
Allow looking up a sock_common. This gives eBPF programs access to timewait and request sockets. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21bpf: allow helpers to return PTR_TO_SOCK_COMMONLorenz Bauer
It's currently not possible to access timewait or request sockets from eBPF, since there is no way to return a PTR_TO_SOCK_COMMON from a helper. Introduce RET_PTR_TO_SOCK_COMMON to enable this behaviour. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21rhashtable: rename rht_for_each*continue as *from.NeilBrown
The pattern set by list.h is that for_each..continue() iterators start at the next entry after the given one, while for_each..from() iterators start at the given entry. The rht_for_each*continue() iterators are documented as though the start at the 'next' entry, but actually start at the given entry, and they are used expecting that behaviour. So fix the documentation and change the names to *from for consistency with list.h Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21rhashtable: don't hold lock on first table throughout insertion.NeilBrown
rhashtable_try_insert() currently holds a lock on the bucket in the first table, while also locking buckets in subsequent tables. This is unnecessary and looks like a hold-over from some earlier version of the implementation. As insert and remove always lock a bucket in each table in turn, and as insert only inserts in the final table, there cannot be any races that are not covered by simply locking a bucket in each table in turn. When an insert call reaches that last table it can be sure that there is no matchinf entry in any other table as it has searched them all, and insertion never happens anywhere but in the last table. The fact that code tests for the existence of future_tbl while holding a lock on the relevant bucket ensures that two threads inserting the same key will make compatible decisions about which is the "last" table. This simplifies the code and allows the ->rehash field to be discarded. We still need a way to ensure that a dead bucket_table is never re-linked by rhashtable_walk_stop(). This can be achieved by calling call_rcu() inside the locked region, and checking with rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the bucket table is empty and dead. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: dst: remove gc leftoversJulian Wiedmann
Get rid of some obsolete gc-related documentation and macros that were missed in commit 5b7c9a8ff828 ("net: remove dst gc related code"). CC: Wei Wang <weiwan@google.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv4: Allow amount of dirty memory from fib resizing to be controllableDavid Ahern
fib_trie implementation calls synchronize_rcu when a certain amount of pages are dirty from freed entries. The number of pages was determined experimentally in 2009 (commit c3059477fce2d). At the current setting, synchronize_rcu is called often -- 51 times in a second in one test with an average of an 8 msec delay adding a fib entry. The total impact is a lot of slow down modifying the fib. This is seen in the output of 'time' - the difference between real time and sys+user. For example, using 720,022 single path routes and 'ip -batch'[1]: $ time ./ip -batch ipv4/routes-1-hops real 0m14.214s user 0m2.513s sys 0m6.783s So roughly 35% of the actual time to install the routes is from the ip command getting scheduled out, most notably due to synchronize_rcu (this is observed using 'perf sched timehist'). This patch makes the amount of dirty memory configurable between 64k where the synchronize_rcu is called often (small, low end systems that are memory sensitive) to 64M where synchronize_rcu is called rarely during a large FIB change (for high end systems with lots of memory). The default is 512kB which corresponds to the current setting of 128 pages with a 4kB page size. As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu in a second blocking for up to 30 msec in a single instance, and a total of almost 100 msec across the 4 calls in the second. The trade off is allowing FIB entries to consume more memory in a given time window but but with much better fib insertion rates (~30% increase in prefixes/sec). With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch file runs in: $ time ./ip -batch ipv4/routes-1-hops real 0m9.692s user 0m2.491s sys 0m6.769s So the dead time is reduced to about 1/2 second or <5% of the real time. [1] 'ip' modified to not request ACK messages which improves route insertion times by about 20% Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: let actions use RCU to access 'goto_chain'Davide Caratti
use RCU when accessing the action chain, to avoid use after free in the traffic path when 'goto chain' is replaced on existing TC actions (see script below). Since the control action is read in the traffic path without holding the action spinlock, we need to explicitly ensure that a->goto_chain is not NULL before dereferencing (i.e it's not sufficient to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL dereferences in tcf_action_goto_chain_exec() when the following script: # tc chain add dev dd0 chain 42 ingress protocol ip flower \ > ip_proto udp action pass index 4 # tc filter add dev dd0 ingress protocol ip flower \ > ip_proto udp action csum udp goto chain 42 index 66 # tc chain del dev dd0 chain 42 ingress (start UDP traffic towards dd0) # tc action replace action csum udp pass index 66 was run repeatedly for several hours. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: don't dereference a->goto_chain to read the chain indexDavide Caratti
callers of tcf_gact_goto_chain_index() can potentially read an old value of the chain index, or even dereference a NULL 'goto_chain' pointer, because 'goto_chain' and 'tcfa_action' are read in the traffic path without caring of concurrent write in the control path. The most recent value of chain index can be read also from a->tcfa_action (it's encoded there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to dereference 'goto_chain': just read the chain id from the control action. Fixes: e457d86ada27 ("net: sched: add couple of goto_chain helpers") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/sched: prepare TC actions to properly validate the control actionDavide Caratti
- pass a pointer to struct tcf_proto in each actions's init() handler, to allow validating the control action, checking whether the chain exists and (eventually) refcounting it. - remove code that validates the control action after a successful call to the action's init() handler, and replace it with a test that forbids addition of actions having 'goto_chain' and NULL goto_chain pointer at the same time. - add tcf_action_check_ctrlact(), that will validate the control action and eventually allocate the action 'goto_chain' within the init() handler. - add tcf_action_set_ctrlact(), that will assign the control action and swap the current 'goto_chain' pointer with the new given one. This disallows 'goto_chain' on actions that don't initialize it properly in their init() handler, i.e. calling tcf_action_check_ctrlact() after successful IDR reservation and then calling tcf_action_set_ctrlact() to assign 'goto_chain' and 'tcf_action' consistently. By doing this, the kernel does not leak anymore refcounts when a valid 'goto chain' handle is replaced in TC actions, causing kmemleak splats like the following one: # tc chain add dev dd0 chain 42 ingress protocol ip flower \ > ip_proto tcp action drop # tc chain add dev dd0 chain 43 ingress protocol ip flower \ > ip_proto udp action drop # tc filter add dev dd0 ingress matchall \ > action gact goto chain 42 index 66 # tc filter replace dev dd0 ingress matchall \ > action gact goto chain 43 index 66 # echo scan >/sys/kernel/debug/kmemleak <...> unreferenced object 0xffff93c0ee09f000 (size 1024): comm "tc", pid 2565, jiffies 4295339808 (age 65.426s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0 [<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0 [<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110 [<000000006126a348>] netlink_unicast+0x1a0/0x250 [<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0 [<00000000a25a2171>] sock_sendmsg+0x36/0x40 [<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0 [<00000000d0422042>] __sys_sendmsg+0x5e/0xa0 [<000000007a6c61f9>] do_syscall_64+0x5b/0x180 [<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<0000000013eaa334>] 0xffffffffffffffff Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun deviceKirill Tkhai
In commit f2780d6d7475 "tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device" it was missed that tun may change its net ns, while net ns of socket remains the same as it was created initially. SIOCGSKNS returns net ns of socket, so it is not suitable for obtaining net ns of device. We may have two tun devices with the same names in two net ns, and in this case it's not possible to determ, which of them fd refers to (TUNGETIFF will return the same name). This patch adds new ioctl() cmd for obtaining net ns of a device. Reported-by: Harald Albrecht <harald.albrecht@gmx.net> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21PCI/MSI: Remove unused mask_msi_irq() and unmask_msi_irq()Bjorn Helgaas
Change pcie-xilinx-nwl.c to use pci_msi_mask_irq() and pci_msi_unmask_irq() like all other PCI host controller drivers. Remove the now-unused mask_msi_irq() and unmask_msi_irq(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Michal Simek <michal.simek@xilinx.com> CC: linux-arm-kernel@lists.infradead.org