summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-10-20blkcg: reassociate bios when make_request() is called recursivelyDennis Zhou
When submitting a bio, multiple recursive calls to make_request() may occur. This causes the initial associate done in blkcg_bio_issue_check() to be incorrect and reference the prior request_queue. This introduces a helper to do reassociation when make_request() is recursively called. Fixes: a7b39b4e961c ("blkcg: always associate a bio with a blkg") Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Dennis Zhou <dennis@kernel.org> Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-20blkcg: fix edge case for blk_get_rl() under memory pressureDennis Zhou
It is possible for blkg creation to fail when in blk_get_rl(). In this situation, the fallback logic returns the nearest created blkg. There is however special handling for the request_list for the root blkcg. This fixes the missing edge case from the earlier series changing blk_get_rl(). Fixes: e2b0989954ae ("blkcg: cleanup and make blk_get_rl use blkg_lookup_create") Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-20bpf: sk_msg program helper bpf_msg_push_dataJohn Fastabend
This allows user to push data into a msg using sk_msg program types. The format is as follows, bpf_msg_push_data(msg, offset, len, flags) this will insert 'len' bytes at offset 'offset'. For example to prepend 10 bytes at the front of the message the user can, bpf_msg_push_data(msg, 0, 10, 0); This will invalidate data bounds so BPF user will have to then recheck data bounds after calling this. After this the msg size will have been updated and the user is free to write into the added bytes. We allow any offset/len as long as it is within the (data, data_end) range. However, a copy will be required if the ring is full and its possible for the helper to fail with ENOMEM or EINVAL errors which need to be handled by the BPF program. This can be used similar to XDP metadata to pass data between sk_msg layer and lower layers. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Use lockdep_is_held() in ipset_dereference_protected(), from Lance Roy. 2) Remove unused variable in cttimeout, from YueHaibing. 3) Add ttl option for nft_osf, from Fernando Fernandez Mancera. 4) Use xfrm family to deal with IPv6-in-IPv4 packets from nft_xfrm, from Florian Westphal. 5) Simplify xt_osf_match_packet(). 6) Missing ct helper alias definition in snmp_trap helper, from Taehee Yoo. 7) Remove unnecessary parameter in nf_flow_table_cleanup(), from Taehee Yoo. 8) Remove unused variable definitions in nft_{dup,fwd}, from Weongyo Jeong. 9) Remove empty net/netfilter/nfnetlink_log.h file, from Taehee Yoo. 10) Revert xt_quota updates remain option due to problems in the listing path for 32-bit arches, from Maze. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20Merge branch 'pci/peer-to-peer'Bjorn Helgaas
- Add PCI support for peer-to-peer DMA (Logan Gunthorpe) - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe) - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe) - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe) - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe) - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe) - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe) - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe) - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe) - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe) * pci/peer-to-peer: nvmet: Optionally use PCI P2P memory nvmet: Introduce helper functions to allocate and free request SGLs nvme-pci: Add support for P2P memory in requests nvme-pci: Use PCI p2pmem subsystem to manage the CMB IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() block: Add PCI P2P flag for request queue PCI/P2PDMA: Add P2P DMA driver writer's documentation docs-rst: Add a new directory for PCI documentation PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset PCI/P2PDMA: Add sysfs group to display p2pmem stats PCI/P2PDMA: Support peer-to-peer memory
2018-10-20Merge branch 'pci/misc'Bjorn Helgaas
- Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski) - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko) - Add switch fall-through annotations (Gustavo A. R. Silva) - Remove unused Switchtec quirk variable (Joshua Abraham) - Fix pci.c kernel-doc warning (Randy Dunlap) - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig) - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng) - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe) - Update Switchtec NTB documentation (Wesley Yung) - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz) * pci/misc: PCI: pcie: Remove redundant 'default n' from Kconfig NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB PCI: Fix Switchtec DMA aliasing quirk dmesg noise PCI: Add macro for Switchtec quirk declarations PCI: Add Device IDs for Intel GPU "spurious interrupt" quirk PCI: Remove pci_set_dma_max_seg_size() PCI: Remove pci_set_dma_seg_boundary() PCI: Remove pci_unmap_addr() wrappers for DMA API PCI / ACPI: Mark expected switch fall-through PCI: Remove set but unused variable PCI: Fix pci.c kernel-doc parameter warning PCI: Allocate dma_alias_mask with bitmap_zalloc() PCI: Remove unused NFP32xx IDs
2018-10-20Merge branch 'pci/hotplug'Bjorn Helgaas
- Differentiate between pciehp surprise and safe removal (Lukas Wunner) - Remove unnecessary pciehp includes (Lukas Wunner) - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner) - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner) - Unify pciehp controller & slot structs (Lukas Wunner) - Constify hotplug_slot_ops (Lukas Wunner) - Drop hotplug_slot_info (Lukas Wunner) - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner) - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch) - Restore PCI config state after a slot reset (Keith Busch) - Save/restore DPC config state along with other PCI config state (Keith Busch) - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch) - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch) - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch) - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch) - Notify all drivers that may be affected by error recovery resets (Keith Busch) - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch) - Make PCIe link active reporting detection generic (Keith Busch) - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg) - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick) - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing) - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep) - Uninline PCI bus accessors for better ftracing (Keith Busch) - Remove unused AER Root Port .error_resume method (Keith Busch) - Use kfifo in AER instead of a local version (Keith Busch) - Use threaded IRQ in AER bottom half (Keith Busch) - Use managed resources in AER core (Keith Busch) - Reuse pcie_port_find_device() for AER injection (Keith Busch) - Abstract AER interrupt handling to disconnect error injection (Keith Busch) - Refactor AER injection callbacks to simplify future improvments (Keith Busch) * pci/hotplug: PCI/AER: Refactor error injection fallbacks PCI/AER: Abstract AER interrupt handling PCI/AER: Reuse existing pcie_port_find_device() interface PCI/AER: Use managed resource allocations PCI/AER: Use threaded IRQ for bottom half PCI/AER: Use kfifo_in_spinlocked() to insert locked elements PCI/AER: Use kfifo for tracking events instead of reimplementing it PCI/AER: Remove error source from AER struct aer_rpc PCI/AER: Remove unused aer_error_resume() PCI: Uninline PCI bus accessors for better ftracing PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls PCI: pnv_php: Use kmemdup() PCI: cpqphp: Remove set but not used variable 'physical_slot' PCI/ERR: Remove duplicated include from err.c PCI: Equalize hotplug memory and io for occupied and empty slots PCI / ACPI: Whitelist D3 for more PCIe hotplug ports ACPI / property: Allow multiple property compatible _DSD entries PCI/PME: Implement runtime PM callbacks PCI: pciehp: Implement runtime PM callbacks PCI/portdrv: Add runtime PM hooks for port service drivers PCI/portdrv: Resume upon exit from system suspend if left runtime suspended PCI: pciehp: Do not handle events if interrupts are masked PCI: pciehp: Disable hotplug interrupt during suspend PCI / ACPI: Enable wake automatically for power managed bridges PCI: Do not skip power-managed bridges in pci_enable_wake() PCI: Make link active reporting detection generic PCI: Unify device inaccessible PCI/ERR: Always report current recovery status for udev PCI/ERR: Simplify broadcast callouts PCI/ERR: Run error recovery callbacks for all affected devices PCI/ERR: Handle fatal error recovery PCI/ERR: Use slot reset if available PCI/AER: Don't read upstream ports below fatal errors PCI/AER: Take reference on error devices PCI/DPC: Save and restore config state PCI: portdrv: Restore PCI config state on slot reset PCI: portdrv: Initialize service drivers directly PCI: hotplug: Document TODOs PCI: hotplug: Embed hotplug_slot PCI: hotplug: Drop hotplug_slot_info PCI: hotplug: Constify hotplug_slot_ops PCI: pciehp: Reshuffle controller struct for clarity PCI: pciehp: Rename controller struct members for clarity PCI: pciehp: Unify controller and slot structs PCI: pciehp: Tolerate Presence Detect hardwired to zero PCI: pciehp: Drop hotplug_slot_ops wrappers PCI: pciehp: Drop unnecessary includes PCI: pciehp: Differentiate between surprise and safe removal PCI: Simplify disconnected marking
2018-10-19net: phy: micrel: add Microchip KSZ9131 initial driverYuiko Oshino
Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19netpoll: allow cleanup to be synchronousDebabrata Banerjee
This fixes a problem introduced by: commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock") When using netconsole on a bond, __netpoll_cleanup can asynchronously recurse multiple times, each __netpoll_free_async call can result in more __netpoll_free_async's. This means there is now a race between cleanup_work queues on multiple netpoll_info's on multiple devices and the configuration of a new netpoll. For example if a netconsole is set to enable 0, reconfigured, and enable 1 immediately, this netconsole will likely not work. Given the reason for __netpoll_free_async is it can be called when rtnl is not locked, if it is locked, we should be able to execute synchronously. It appears to be locked everywhere it's called from. Generalize the design pattern from the teaming driver for current callers of __netpoll_free_async. CC: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20bpf: skmsg, fix psock create on existing kcm/tls portJohn Fastabend
Before using the psock returned by sk_psock_get() when adding it to a sockmap we need to ensure it is actually a sockmap based psock. Previously we were only checking this after incrementing the reference counter which was an error. This resulted in a slab-out-of-bounds error when the psock was not actually a sockmap type. This moves the check up so the reference counter is only used if it is a sockmap psock. Eric reported the following KASAN BUG, BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387 CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 sk_psock_get include/linux/skmsg.h:379 [inline] sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178 sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669 sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738 map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-19bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKBSong Liu
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the skb. This patch enables direct access of skb for these programs. Two helper functions bpf_compute_and_save_data_end() and bpf_restore_data_end() are introduced. There are used in __cgroup_bpf_run_filter_skb(), to compute proper data_end for the BPF program, and restore original data afterwards. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: add queue and stack mapsMauricio Vasquez B
Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs. These maps support peek, pop and push operations that are exposed to eBPF programs through the new bpf_map[peek/pop/push] helpers. Those operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop BPF_MAP_UPDATE_ELEM -> push Queue/stack maps are implemented using a buffer, tail and head indexes, hence BPF_F_NO_PREALLOC is not supported. As opposite to other maps, queue and stack do not use RCU for protecting maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE argument that is a pointer to a memory zone where to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. Our main motivation for implementing queue/stack maps was to keep track of a pool of elements, like network ports in a SNAT, however we forsee other use cases, like for exampling saving last N kernel events in a map and then analysing from userspace. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUEMauricio Vasquez B
ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone used to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. This will be used in the following patch that implements some new helpers that receive a pointer to be filled with a map value. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: rename stack trace map operationsMauricio Vasquez B
In the following patches queue and stack maps (FIFO and LIFO datastructures) will be implemented. In order to avoid confusion and a possible name clash rename stack_map_ops to stack_trace_map_ops Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19remoteproc: Add mechanism for custom dump function assignmentSibi Sankar
This patch adds a mechanism for assigning each rproc dump segment with a custom dump function and private data. The dump function is to be called for each rproc segment during coredump if assigned. Signed-off-by: Sibi Sankar <sibis@codeaurora.org> [bjorn: reordred arguments to rproc_coredump_add_custom_segment()] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-10-19remoteproc: Introduce custom dump function for each remoteproc segmentSibi Sankar
Introduce custom dump function and private data per remoteproc dump segment. The dump function is responsible for filling the device memory segment associated with coredump Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19fs: group frequently accessed fields of struct super_block togetherAmir Goldstein
Kernel test robot reported [1] a 6% performance regression in a concurrent unlink(2) workload on commit 60f7ed8c7c4d ("fsnotify: send path type events to group with super block marks"). The performance test was run with no fsnotify marks at all on the data set, so the only extra instructions added by the offending commit are tests of the super_block fields s_fsnotify_{marks,mask} and these tests happen on almost every single inode access. When adding those fields to the super_block struct, we did not give much thought of placing them on a hot cache lines (we just placed them at the end of the struct). Re-organize struct super_block to try and keep some frequently accessed fields on the same cache line. Move the frequently accessed fields s_fsnotify_{marks,mask} near the frequently accessed fields s_fs_info,s_time_gran, while filling a 64bit alignment hole after s_time_gran. Move the seldom accessed fields s_id,s_uuid,s_max_links,s_mode near the seldom accessed fields s_vfs_rename_mutex,s_subtype. Rong Chen confirmed that this patch solved the reported problem. [1] https://lkml.org/lkml/2018/9/30/206 Reported-by: kernel test robot <rong.a.chen@intel.com> Tested-by: kernel test robot <rong.a.chen@intel.com> Fixes: 1e6cb72399 ("fsnotify: add super block object type") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-10-19regulator: bd718x7: Remove struct bd718xx_pmicAxel Lin
All the fields in struct bd718xx_pmic are not really necessary. Remove struct bd718xx_pmic to simplify the code. Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-19regmap: Add regmap_noinc_write APIBen Whitten
The regmap API had a noinc_read function added for instances where devices supported returning data from an internal FIFO in a single read. This commit adds the noinc_write variant to allow writing to a non incrementing register, this is used in devices such as the sx1301 for loading firmware. Signed-off-by: Ben Whitten <ben.whitten@lairdtech.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-19Merge tag 'nand/for-4.20' of git://git.infradead.org/linux-mtd into mtd/nextBoris Brezillon
NAND core changes: - Two batchs of cleanups of the NAND API, including: * Deprecating a lot of interfaces (now replaced by ->exec_op()). * Moving code in separate drivers (JEDEC, ONFI), in private files (internals), in platform drivers, etc. * Functions/structures reordering. * Exclusive use of the nand_chip structure instead of the MTD one all across the subsystem. - Addition of the nand_wait_readrdy/rdy_op() helpers. Raw NAND controllers drivers changes: - Various coccinelle patches. - Marvell: * Use regmap_update_bits() for syscon access. * More documentation. * BCH failure path rework. * More layouts to be supported. * IRQ handler complete() condition fixed. - Fsl_ifc: * SRAM initialization fixed for newer controller versions. - Denali: * Fix licenses mismatch and use a SPDX tag. * Set SPARE_AREA_SKIP_BYTES register to 8 if unset. - Qualcomm: * Do not include dma-direct.h. - Docg4: * Removed. - Ams-delta: * Use of a GPIO lookup table * Internal machinery changes. Raw NAND chip drivers changes: - Toshiba: * Add support for Toshiba memory BENAND * Pass a single nand_chip object to the status helper. - ESMT: * New driver to retrieve the ECC requirements from the 5th ID byte.
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking 1) Fix gro_cells leak in xfrm layer, from Li RongQing. 2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From Eric Dumazet. 3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn Töpel. 4) Out of bounds packet access in _decode_session6(), from Alexei Starovoitov. 5) Several ethtool bugs, where we copy a struct into the kernel twice and our validations of the values in the first copy can be invalidated by the second copy due to asynchronous updates to the memory by the user. From Wenwen Wang. 6) Missing netlink attribute validation in cls_api, from Davide Caratti. 7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang. 8) rxrpc operates on wrong kvec, from Yue Haibing. 9) A regression was introduced by the disassosciation of route neighbour references in rt6_probe(), causing probe for neighbourless routes to not be properly rate limited. Fix from Sabrina Dubroca. 10) Unsafe RCU locking in tipc, from Tung Nguyen. 11) Use after free in inet6_mc_check(), from Eric Dumazet. 12) PMTU from icmp packets should update the SCTP transport pathmtu, from Xin Long. 13) Missing peer put on error in rxrpc, from David Howells. 14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren. 15) Fix overflowing shift statement in qla3xxx driver, from Nathan Chancellor. 16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva. 17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return value in an inverted manner, fix from Paolo Abeni. 18) Fix missed unresolved entries in ipmr dumps, from Nikolay Aleksandrov. 19) Fix NAPI handling under high load, we can completely miss events when NAPI has to loop more than one time in a cycle. From Heiner Kallweit." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) ip6_tunnel: Fix encapsulation layout tipc: fix info leak from kernel tipc_event net: socket: fix a missing-check bug net: sched: Fix for duplicate class dump r8169: fix NAPI handling under high load net: ipmr: fix unresolved entry dumps net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion() sctp: fix the data size calculation in sctp_data_size virtio_net: avoid using netif_tx_disable() for serializing tx routine udp6: fix encap return code for resubmitting mlxsw: core: Fix use-after-free when flashing firmware during init sctp: not free the new asoc when sctp_wait_for_connect returns err sctp: fix race on sctp_id2asoc r8169: re-enable MSI-X on RTL8168g net: bpfilter: use get_pid_task instead of pid_task ptp: fix Spectre v1 vulnerability net: qla3xxx: Remove overflowing shift statement geneve, vxlan: Don't set exceptions if skb->len < mtu geneve, vxlan: Don't check skb_dst() twice sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead ...
2018-10-19swiotlb: don't dip into swiotlb pool for coherent allocationsChristoph Hellwig
All architectures that support swiotlb also have a zone that backs up these less than full addressing allocations (usually ZONE_DMA32). Because of that it is rather pointless to fall back to the global swiotlb buffer if the normal dma direct allocation failed - the only thing this will do is to eat up bounce buffers that would be more useful to serve streaming mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19compiler-gcc: remove comment about gcc 4.5 from unreachable()ndesaulniers@google.com
Remove the comment about being unable to detect __builtin_unreachable. __builtin_unreachable was implemented in the GCC 4.5 timeframe. The kernel's minimum supported version of GCC is 4.6 since commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6"). Commit cb984d101b30 ("compiler-gcc: integrate the various compiler-gcc[345].h files") shows that unreachable() had different guards based on GCC version. Suggested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-10-19compiler.h: update definition of unreachable()ndesaulniers@google.com
Fixes the objtool warning seen with Clang: arch/x86/mm/fault.o: warning: objtool: no_context()+0x220: unreachable instruction Fixes commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") Josh noted that the fallback definition was meant to work around a pre-gcc-4.6 bug. GCC still needs to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365, so compiler-gcc.h defines its own version of unreachable(). Clang and ICC can use this shared definition. Link: https://github.com/ClangBuiltLinux/linux/issues/204 Suggested-by: Andy Lutomirski <luto@amacapital.net> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-10-19swiotlb: remove the overflow bufferChristoph Hellwig
Like all other dma mapping drivers just return an error code instead of an actual memory buffer. The reason for the overflow buffer was that at the time swiotlb was invented there was no way to check for dma mapping errors, but this has long been fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: mark is_swiotlb_buffer staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19locking/lockdep: Make global debug_locks* variables read-mostlyWaiman Long
Make the frequently used lockdep global variable debug_locks read-mostly. As debug_locks_silent is sometime used together with debug_locks, it is also made read-mostly so that they can be close together. With false cacheline sharing, cacheline contention problem can happen depending on what get put into the same cacheline as debug_locks. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539913518-15598-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-18Merge branches 'clk-tegra' and 'clk-bulk-get-all' into clk-nextStephen Boyd
- Nvidia Tegra clk driver MBIST workaround fix - clk_bulk_get_all() API and friends to get all the clks for a device * clk-tegra: clk: tegra210: Include size.h for compilation ease clk: tegra: Fixes for MBIST work around clk: tegra: probe deferral error reporting * clk-bulk-get-all: clk: add managed version of clk_bulk_get_all clk: add new APIs to operate on all available clocks clk: bulk: add of_clk_bulk_get()
2018-10-18Merge branch 'clk-ti' into clk-nextStephen Boyd
* clk-ti: clk: ti: Prepare for remove of OF node name clk: Clean up suspend/resume coding style clk: ti: Add functions to save/restore clk context clk: clk: Add clk_gate_restore_context function clk: Add functions to save/restore clock context en-masse clk: ti: dra7: add new clkctrl data clk: ti: dra7xx: rename existing clkctrl data as compat data clk: ti: am43xx: add new clkctrl data for am43xx clk: ti: am43xx: rename existing clkctrl data as compat data clk: ti: am33xx: add new clkctrl data for am33xx clk: ti: am33xx: rename existing clkctrl data as compat data clk: ti: clkctrl: replace dashes from clkdm name with underscore clk: ti: clkctrl: support multiple clkctrl nodes under a cm node dt-bindings: clock: dra7xx: add clkctrl indices for new data layout dt-bindings: clock: am43xx: add clkctrl indices for new data layout dt-bindings: clock: am33xx: add clkctrl indices for new data layout
2018-10-18net/mlx5: Added "per_lane_error_counters" cap bit to PCAMShay Agroskin
Added "Per lane raw errors" capability bit in Ports Capabilities Mask (PCAM) enhanced features layout. This bit determines if the fields "phy_raw_errors_laneX" in "Physical Layer statistical" counters group are supported. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: Add FEC fields to Port Phy Link Mode (PPLM) regShay Agroskin
Added FEC related fields to PPLM layout. These fields are needed to set and query FEC policy for different link speeds. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: Refactor fragmented buffer struct fields and init flowTariq Toukan
Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not needed to manage and control the datapath of the fragmented buffers API. struct mlx5_frag_buf contains control info to manage the allocation and de-allocation of the fragmented buffer. Its fields are not relevant for datapath, so here I take them out of the struct mlx5_frag_buf_ctrl, except for the fragments array itself. In addition, modified mlx5_fill_fbc to initialise the frags pointers as well. This implies that the buffer must be allocated before the function is called. A set of type-specific *_get_byte_size() functions are replaced by a generic one. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18Merge tag 'mlx5-updates-2018-10-17' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2018-10-17 ======================================================================== From Or Gerlitz <ogerlitz@mellanox.com>: This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains. This is made of four building blocks (along with few minor driver refactors): [1] Split FDB fast path prio to multiple namespaces Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1). The slow path contains the per representor SQ send-to-vport TX rule and the match-all RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. [2] E-Switch chains and priorities A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to each other (but to the slow path), and one must use a explicit goto action to reach a different chain. Flow tables for the priorities are created on demand and destroyed once not used. [3] Add a no-append flow insertion mode, use it for TC offloads Enhance the driver fs core, such that if a no-append flag is set by the caller, we add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. For encap rules, we defer the HW offloading till we have a valid neighbor. This can result in the packet hitting a lower priority rule in the HW DP. Use the no-append API to push these packets to the slow path FDB table, so they go to the TC kernel DP as done before priorities where supported. [4] Offloading tc priorities and chains for eswitch flows Using [1], [2] and [3] above we add the support for offloading both chains and priorities. To get to a new chain, use the tc goto action. We support a fixed prio range 1-16, and chains 0-3. ============================================================================= Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18dm: add dm_table_device_name()Michał Mirosław
Add a shortcut for dm_device_name(dm_table_get_md(t)). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-18Merge branches 'acpi-init', 'acpi-osl', 'acpi-bus', 'acpi-tables' and ↵Rafael J. Wysocki
'acpi-misc' * acpi-init: ACPI: probe ECDT before loading AML tables regardless of module-level code flag * acpi-osl: ACPI / OSL: Use 'jiffies' as the time bassis for acpi_os_get_timer() * acpi-bus: ACPI / glue: Split dev_is_platform() out of module for wide use * acpi-tables: ACPI/PPTT: Handle architecturally unknown cache types drivers: base: cacheinfo: Do not populate sysfs for unknown cache types * acpi-misc: ACPI: remove redundant 'default n' from Kconfig ACPI: custom_method: remove meaningless null check before debugfs_remove()
2018-10-18Merge branches 'pm-devfreq' and 'pm-tools'Rafael J. Wysocki
* pm-devfreq: PM / devfreq: remove redundant null pointer check before kfree PM / devfreq: stopping the governor before device_unregister() PM / devfreq: Convert to using %pOFn instead of device_node.name PM / devfreq: Make update_devfreq() public PM / devfreq: Don't adjust to user limits in governors PM / devfreq: Fix handling of min/max_freq == 0 PM / devfreq: Drop custom MIN/MAX macros PM / devfreq: Fix devfreq_add_device() when drivers are built as modules. * pm-tools: PM / tools: sleepgraph and bootgraph: upgrade to v5.2 PM / tools: sleepgraph: first batch of v5.2 changes cpupower: Fix coredump on VMWare cpupower: Fix AMD Family 0x17 msr_pstate size cpupower: remove stringop-truncation waring
2018-10-18Merge branches 'pm-opp' and 'powercap'Rafael J. Wysocki
* pm-opp: PM / OPP: _of_add_opp_table_v2(): increment count only if OPP is added cpufreq: dt: Try freeing static OPPs only if we have added them OPP: Return error on error from dev_pm_opp_get_opp_count() OPP: Improve error handling in dev_pm_opp_of_cpumask_add_table() OPP: Pass OPP table to _of_add_opp_table_v{1|2}() OPP: Prevent creating multiple OPP tables for devices sharing OPP nodes OPP: Use a single mechanism to free the OPP table OPP: Don't remove dynamic OPPs from _dev_pm_opp_remove_table() cpufreq: mvebu: Remove OPPs using dev_pm_opp_remove() OPP: Create separate kref for static OPPs list OPP: Don't take OPP table's kref for static OPPs OPP: Parse OPP table's DT properties from _of_init_opp_table() OPP: Pass index to _of_init_opp_table() OPP: Protect dev_list with opp_table lock OPP: Don't try to remove all OPP tables on failure OPP: Free OPP table properly on performance state irregularities * powercap: powercap: RAPL: Get rid of custom RAPL_CPU() macro
2018-10-18Merge branch 'pm-cpuidle'Rafael J. Wysocki
* pm-cpuidle: cpuidle: menu: Avoid computations when result will be discarded cpuidle: menu: Drop redundant comparison cpuidle: menu: Simplify checks related to the polling state cpuidle: poll_state: Revise loop termination condition cpuidle: menu: Move the latency_req == 0 special case check cpuidle: menu: Avoid computations for very close timers cpuidle: menu: Do not update last_state_idx in menu_select() cpuidle: menu: Get rid of first_idx from menu_select() cpuidle: menu: Compute first_idx when latency_req is known cpuidle: menu: Fix wakeup statistics updates for polling state cpuidle: menu: Replace data->predicted_us with local variable cpuidle: enter_state: Don't needlessly calculate diff time cpuidle: Remove unnecessary wrapper cpuidle_get_last_residency() intel_idle: Get rid of custom ICPU() macro
2018-10-18PM / Domains: Document flags for genpdUlf Hansson
The current documented description of the GENPD_FLAG_* flags, are too simplified, so let's extend them. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-18mremap: properly flush TLB before releasing the pageLinus Torvalds
Jann Horn points out that our TLB flushing was subtly wrong for the mremap() case. What makes mremap() special is that we don't follow the usual "add page to list of pages to be freed, then flush tlb, and then free pages". No, mremap() obviously just _moves_ the page from one page table location to another. That matters, because mremap() thus doesn't directly control the lifetime of the moved page with a freelist: instead, the lifetime of the page is controlled by the page table locking, that serializes access to the entry. As a result, we need to flush the TLB not just before releasing the lock for the source location (to avoid any concurrent accesses to the entry), but also before we release the destination page table lock (to avoid the TLB being flushed after somebody else has already done something to that page). This also makes the whole "need_flush" logic unnecessary, since we now always end up flushing the TLB for every valid entry. Reported-and-tested-by: Jann Horn <jannh@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-17net: skbuff.h: Mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18Merge tag 'trace-v4.19-rc8' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: Two fixes for 4.19 This fixes two bugs: - Fix size mismatch of tracepoint array - Have preemptirq test module use same clock source of the selftest" * tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c tracepoint: Fix tracepoint array element size mismatch
2018-10-17net/mlx5: Add a no-append flow insertion modePaul Blakey
If no-append flag is set, we will add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. While here, move the has_flow_tag boolean indicator to be a flag too. This patch doesn't change any functionality. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17net/mlx5: Split FDB fast path prio to multiple namespacesPaul Blakey
Towards supporting multi-chains and priorities, split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is like current FS_TYPE_PRIO, but may contain only namespaces, and those will be in parallel to one another in terms of managing of the flow tables connections inside them. Meaning, while searching for the next or previous flow table to connect for a new table inside such namespace we skip the parallel namespaces in the same level under the FS_TYPE_PRIO_CHAINS prio we originated from. We use this new type for splitting the fast path prio into multiple parallel namespaces, each containing normal prios. The prios inside them (and their tables) will be connected to one another, but not from one parallel namespace to another, instead the last prio in each namespace will be connected to the next prio in the containing FDB namespace, which is the slow path prio. Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17net/mlx5: Add cap bits for multi fdb encapPaul Blakey
If set, the firmware supports creating of flow tables with encap enabled while VFs are configured, if we already created one (restriction still applies on the first creation). Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17net/mlx5: Use flow counter IDs and not the wrapping cache objectMark Bloch
Currently, when a flow rule is created using the FS core layer, the caller has to pass the entire flow counter object and not just the counter HW handle (ID). This requires both the FS core and the caller to have knowledge about the inner implementation of the FS layer flow counters cache and limits the possible users. Move to use the counter ID across the place when dealing with flows. Doing this decoupling, now can we privatize the inner implementation of the flow counters. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17net/mlx5: E-Switch, Get counters for offloaded flows from callersMark Bloch
There's no real reason for the e-switch logic to manage the creation of counters for offloaded flows. The API already has the directive for the caller to denote they want to attach a counter to the created flow. As such, we go and move the management of flow counters to the mlx5e tc offload logic. This also lets us remove an inelegant interface where the FS layer had to provide a way to retrieve a counter from a flow rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next mlx5 updates for both net-next and rdma-next * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits) net/mlx5: Expose DC scatter to CQE capability bit net/mlx5: Update mlx5_ifc with DEVX UID bits net/mlx5: Set uid as part of DCT commands net/mlx5: Set uid as part of SRQ commands net/mlx5: Set uid as part of SQ commands net/mlx5: Set uid as part of RQ commands net/mlx5: Set uid as part of QP commands net/mlx5: Set uid as part of CQ commands net/mlx5: Rename incorrect naming in IFC file net/mlx5: Export packet reformat alloc/dealloc functions net/mlx5: Pass a namespace for packet reformat ID allocation net/mlx5: Expose new packet reformat capabilities {net, RDMA}/mlx5: Rename encap to reformat packet net/mlx5: Move header encap type to IFC header file net/mlx5: Break encap/decap into two separated flow table creation flags net/mlx5: Add support for more namespaces when allocating modify header net/mlx5: Export modify header alloc/dealloc functions net/mlx5: Add proper NIC TX steering flow tables support net/mlx5: Cleanup flow namespace getter switch logic net/mlx5: Add memic command opcode to command checker ... Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17UAPI: ndctl: Remove use of PAGE_SIZEDavid Howells
The macro PAGE_SIZE isn't valid outside of the kernel, so it should not appear in UAPI headers. Furthermore, the actual machine page size could theoretically change from an application's point of view if it's running in a container that gets migrated to another machine (say 4K/ppc64 to 64K/ppc64). Fixes: f2ba5a5baecf ("libnvdimm, namespace: make min namespace size 4K") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>