summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-03-30fs, dax: prepare for dax-specific address_space_operationsDan Williams
In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Define some generic VFS aops helpers for dax. These noop implementations are there in the dax case to prevent the VFS from falling back to operations with page-cache assumptions, dax_writeback_mapping_range() may not be referenced in the FS_DAX=n case. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Matthew Wilcox <mawilcox@microsoft.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-31crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array()Andy Shevchenko
Deduplicate le32_to_cpu_array() and cpu_to_le32_array() by moving them to the generic header. No functional change implied. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-30net/dim: Fix int overflowTal Gilboa
When calculating difference between samples, the values are multiplied by 100. Large values may cause int overflow when multiplied (usually on first iteration). Fixed by forcing 100 to be of type unsigned long. Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30vlan: Fix vlan insertion for packets without ethernet headerToshiaki Makita
In some situation vlan packets do not have ethernet headers. One example is packets from tun devices. Users can specify vlan protocol in tun_pi field instead of IP protocol. When we have a vlan device with reorder_hdr disabled on top of the tun device, such packets from tun devices are untagged in skb_vlan_untag() and vlan headers will be inserted back in vlan_insert_inner_tag(). vlan_insert_inner_tag() however did not expect packets without ethernet headers, so in such a case size argument for memmove() underflowed. We don't need to copy headers for packets which do not have preceding headers of vlan headers, so skip memmove() in that case. Also don't write vlan protocol in skb->data when it does not have enough room for it. Fixes: cbe7128c4b92 ("vlan: Fix out of order vlan headers with reorder header off") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. This batch comes with more input sanitization for xtables to address bug reports from fuzzers, preparation works to the flowtable infrastructure and assorted updates. In no particular order, they are: 1) Make sure userspace provides a valid standard target verdict, from Florian Westphal. 2) Sanitize error target size, also from Florian. 3) Validate that last rule in basechain matches underflow/policy since userspace assumes this when decoding the ruleset blob that comes from the kernel, from Florian. 4) Consolidate hook entry checks through xt_check_table_hooks(), patch from Florian. 5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject very large compat offset arrays, so we have a reasonable upper limit and fuzzers don't exercise the oom-killer. Patches from Florian. 6) Several WARN_ON checks on xtables mutex helper, from Florian. 7) xt_rateest now has a hashtable per net, from Cong Wang. 8) Consolidate counter allocation in xt_counters_alloc(), from Florian. 9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch from Xin Long. 10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from Felix Fietkau. 11) Consolidate code through flow_offload_fill_dir(), also from Felix. 12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward() to remove a dependency with flowtable and ipv6.ko, from Felix. 13) Cache mtu size in flow_offload_tuple object, this is safe for forwarding as f87c10a8aa1e describes, from Felix. 14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too modular infrastructure, from Felix. 15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from Ahmed Abdelsalam. 16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei. 17) Support for counting only to nf_conncount infrastructure, patch from Yi-Hung Wei. 18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes to nft_ct. 19) Use boolean as return value from ipt_ah and from IPVS too, patch from Gustavo A. R. Silva. 20) Remove useless parameters in nfnl_acct_overquota() and nf_conntrack_broadcast_help(), from Taehee Yoo. 21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo. 22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu. 23) Fix typo in xt_limit, from Geert Uytterhoeven. 24) Do no use VLAs in Netfilter code, again from Gustavo. 25) Use ADD_COUNTER from ebtables, from Taehee Yoo. 26) Bitshift support for CONNMARK and MARK targets, from Jack Ma. 27) Use pr_*() and add pr_fmt(), from Arushi Singhal. 28) Add synproxy support to ctnetlink. 29) ICMP type and IGMP matching support for ebtables, patches from Matthias Schiffer. 30) Support for the revision infrastructure to ebtables, from Bernie Harris. 31) String match support for ebtables, also from Bernie. 32) Documentation for the new flowtable infrastructure. 33) Use generic comparison functions in ebt_stp, from Joe Perches. 34) Demodularize filter chains in nftables. 35) Register conntrack hooks in case nftables NAT chain is added. 36) Merge assignments with return in a couple of spots in the Netfilter codebase, also from Arushi. 37) Document that xtables percpu counters are stored in the same memory area, from Ben Hutchings. 38) Revert mark_source_chains() sanity checks that break existing rulesets, from Florian Westphal. 39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30of_net: Implement of_get_nvmem_mac_address helperMike Looijmans
It's common practice to store MAC addresses for network interfaces into nvmem devices. However the code to actually do this in the kernel lacks, so this patch adds of_get_nvmem_mac_address() for drivers to obtain the address from an nvmem cell provider. This is particulary useful on devices where the ethernet interface cannot be configured by the bootloader, for example because it's in an FPGA. Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30sfp/phylink: move module EEPROM ethtool access into netdev core ethtoolRussell King
Provide a pointer to the SFP bus in struct net_device, so that the ethtool module EEPROM methods can access the SFP directly, rather than needing every user to provide a hook for it. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30net: phy: phylink: Provide PHY interface to mac_link_{up, down}Florian Fainelli
In preparation for having DSA transition entirely to PHYLINK, we need to pass a PHY interface type to the mac_link_{up,down} callbacks because we may have to make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass an entire phylink_link_state because not all parameters (pause, duplex etc.) are defined when the link is down, only link and interface are. Update mvneta accordingly since it currently implements phylink_mac_ops. Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30net: Call add/kill vid ndo on vlan filter feature togglingGal Pressman
NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit flip in dev->features in order to keep the driver in a consistent state. These features notify the driver of each added/removed vlan, but toggling of vlan-filter does not notify the driver accordingly for each of the existing vlans. This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT behavior (which notifies the driver about UDP ports in the same manner that vids are reported). Each toggling of the features propagates to the 8021q module, which iterates over the vlans and call add/kill ndo accordingly. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30mm: make memblock_alloc_base_nid() non-staticNicholas Piggin
This will be used by powerpc to allocate per-cpu stacks and other data structures node-local where possible. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop stray change to memblock_alloc_range() as noticed by akpm] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-29lightnvm: pblk: implement get log report chunkJavier González
In preparation of pblk supporting 2.0, implement the get log report chunk in pblk. Also, define the chunk states as given in the 2.0 spec. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: implement get log report chunk helpersJavier González
The 2.0 spec provides a report chunk log page that can be retrieved using the stangard nvme get log page. This replaces the dedicated get/put bad block table in 1.2. This patch implements the helper functions to allow targets retrieve the chunk metadata using get log page. It makes nvme_get_log_ext available outside of nvme core so that we can use it form lightnvm. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: make address conversions depend on generic deviceJavier González
On address conversions, use the generic device, instead of the target device. This allows to use conversions outside of the target's realm. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: add support for 2.0 address formatJavier González
Add support for 2.0 address format. Also, align address bits for 1.2 and 2.0 to be able to operate on channel and luns without requiring a format conversion. Use a generic address format for this purpose. Also, convert the generic operations to the generic format in pblk. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: normalize geometry nomenclatureJavier González
Normalize nomenclature for naming channels, luns, chunks, planes and sectors as well as derivations in order to improve readability. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: complete geo structure with maxoc*Javier González
Complete the generic geometry structure with the maxoc and maxocpu felds, present in the 2.0 spec. Also, expose them through sysfs. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: add shorten OCSSD version in geoJavier González
Create a shorten version to use in the generic geometry. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: add minor version to generic geometryJavier González
Separate the version between major and minor on the generic geometry and represent it through sysfs in the 2.0 path. The 1.2 path only shows the major version to preserve the existing user space interface. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: simplify geometry structureJavier González
Currently, the device geometry is stored redundantly in the nvm_id and nvm_geo structures at a device level. Moreover, when instantiating targets on a specific number of LUNs, these structures are replicated and manually modified to fit the instance channel and LUN partitioning. Instead, create a generic geometry around nvm_geo, which can be used by (i) the underlying device to describe the geometry of the whole device, and (ii) instances to describe their geometry independently. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: remove nvm_dev_ops->max_phys_sectMatias Bjørling
The value of max_phys_sect is always static. Instead of defining it in the nvm_dev_ops structure, declare it as a global value. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: remove max_rq_sizeMatias Bjørling
The field is no longer used. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: add 2.0 geometry identificationMatias Bjørling
Implement the geometry data structures for 2.0 and enable a drive to be identified as one, including exposing the appropriate 2.0 sysfs entries. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: flatten nvm_id_group into nvm_idMatias Bjørling
There are no groups in the 2.0 specification, make sure that the nvm_id structure is flattened before 2.0 data structures are added. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-30bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT:John Fastabend
Add support for the BPF_F_INGRESS flag in skb redirect helper. To do this convert skb into a scatterlist and push into ingress queue. This is the same logic that is used in the sk_msg redirect helper so it should feel familiar. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30bpf: sockmap redirect ingress supportJohn Fastabend
Add support for the BPF_F_INGRESS flag in sk_msg redirect helper. To do this add a scatterlist ring for receiving socks to check before calling into regular recvmsg call path. Additionally, because the poll wakeup logic only checked the skb recv queue we need to add a hook in TCP stack (similar to write side) so that we have a way to wake up polling socks when a scatterlist is redirected to that sock. After this all that is needed is for the redirect helper to push the scatterlist into the psock receive queue. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29take out orphan externs (empty_string/slash_string)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29qed: Adapter flash update support.Sudarsana Reddy Kalluru
This patch adds the required driver support for updating the flash or non volatile memory of the adapter. At highlevel, flash upgrade comprises of reading the flash images from the input file, validating the images and writing them to the respective paritions. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29qed*: Utilize FW 8.33.11.0Michal Kalderon
This FW contains several fixes and features RDMA Features - SRQ support - XRC support - Memory window support - RDMA low latency queue support - RDMA bonding support RDMA bug fixes - RDMA remote invalidate during retransmit fix - iWARP MPA connect interop issue with RTR fix - iWARP Legacy DPM support - Fix MPA reject flow - iWARP error handling - RQ WQE validation checks MISC - Fix some HSI types endianity - New Restriction: vlan insertion in core_tx_bd_data can't be set for LB packets ETH - HW QoS offload support - Fix vlan, dcb and sriov flow of VF sending a packet with inband VLAN tag instead of default VLAN - Allow GRE version 1 offloads in RX flow - Allow VXLAN steering iSCSI / FcoE - Fix bd availability checking flow - Support 256th sge proerly in iscsi/fcoe retransmit - Performance improvement - Fix handle iSCSI command arrival with AHS and with immediate - Fix ipv6 traffic class configuration DEBUG - Update debug utilities Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29Merge tag 'mlx5-updates-2018-03-27' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2018-03-27 (Misc updates & SQ recovery) This series contains Misc updates and cleanups for mlx5e rx path and SQ recovery feature for tx path. From Tariq: (RX updates) - Disable Striding RQ when PCI devices, striding RQ limits the use of CQE compression feature, which is very critical for slow PCI devices performance, in this change we will prefer CQE compression over Striding RQ only on specific "slow" PCIe links. - RX path cleanups - Private flag to enable/disable striding RQ From Eran: (TX fast recovery) - TX timeout logic improvements, fast SQ recovery and TX error reporting if a HW error occurs while transmitting on a specific SQ, the driver will ignore such error and will wait for TX timeout to occur and reset all the rings. Instead, the current series improves the resiliency for such HW errors by detecting TX completions with errors, which will report them and perform a fast recover for the specific faulty SQ even before a TX timeout is detected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29net: Introduce net_rwsem to protect net_namespace_listKirill Tkhai
rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over alive net namespaces, he/she has no a possibility to do that without exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in kernel, but it requires rcu_read_lock(), and this can't be sleepable. Also, sometimes it may be need really prevent net_namespace_list growth, so for_each_net_rcu() is not fit there. This patch introduces new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows not-exclusive iterations over net namespaces list. It allows to stop using rtnl_lock() in several places (what is made in next patches) and makes less the time, we keep rtnl_mutex. Here we just add new lock, while the explanation of we can remove rtnl_lock() there are in next patches. Fine grained locks generally are better, then one big lock, so let's do that with net_namespace_list, while the situation allows that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29perf/x86/pt, coresight: Clean up address filter structureAlexander Shishkin
This is a cosmetic patch that deals with the address filter structure's ambiguous fields 'filter' and 'range'. The former stands to mean that the filter's *action* should be to filter the traces to its address range if it's set or stop tracing if it's unset. This is confusing and hard on the eyes, so this patch replaces it with 'action' enum. The 'range' field is completely redundant (meaning that the filter is an address range as opposed to a single address trigger), as we can use zero size to mean the same thing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20180329120648.11902-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: kernel/events/hw_breakpoint.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29Merge branches 'x86/amd', 'x86/vt-d', 'arm/rockchip', 'arm/omap', ↵Joerg Roedel
'arm/mediatek', 'arm/exynos', 'arm/renesas', 'arm/smmu' and 'core' into next
2018-03-29Merge tag 'stm-intel_th-for-greg-20180329' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm into char-misc-next Alexander writes: stm class/intel_th: Updates for 4.17 These are: * Mass conversion to GPL-2 SPDX header * Moved "hwtracing" to now its own submenu, to uncrowd the parent menu a bit * Added MAINTAINERS entry for drivers/hwtracing * Somewhat small Trace Hub fixes * Added ACPI glue layer for the Trace Hub * Added more module parameters to dummy_stm for better test coverage
2018-03-29Merge tag 'irqchip-4.17' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates for 4.17 from Marc Zyngier: - New Qualcomm PDC irqchip - New Microsemi Ocelot irqchip - Suspend/resume support for some oddball GICv3 irqchip - Better GIC/GICv3 support for kexec - Various cleanups and fixes
2018-03-29PM: cpuidle/suspend: Add s2idle usage and time state attributesRafael J. Wysocki
Add a new attribute group called "s2idle" under the sysfs directory of each cpuidle state that supports the ->enter_s2idle callback and put two new attributes, "usage" and "time", into that group to represent the number of times the given state was requested for suspend-to-idle and the total time spent in suspend-to-idle after requesting that state, respectively. That will allow diagnostic information related to suspend-to-idle to be collected without enabling advanced debug features and analyzing dmesg output. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-28bpf: add parenthesis around argument of BPF_LDST_BYTES()Jakub Kicinski
BPF_LDST_BYTES() does not put it's argument in parenthesis when referencing it. This makes it impossible to pass pointers obtained by address-of operator (e.g. BPF_LDST_BYTES(&insn)). Add the parenthesis. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-03-28bpf: introduce BPF_RAW_TRACEPOINTAlexei Starovoitov
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access kernel internal arguments of the tracepoints in their raw form. >From bpf program point of view the access to the arguments look like: struct bpf_raw_tracepoint_args { __u64 args[0]; }; int bpf_prog(struct bpf_raw_tracepoint_args *ctx) { // program can read args[N] where N depends on tracepoint // and statically verified at program load+attach time } kprobe+bpf infrastructure allows programs access function arguments. This feature allows programs access raw tracepoint arguments. Similar to proposed 'dynamic ftrace events' there are no abi guarantees to what the tracepoints arguments are and what their meaning is. The program needs to type cast args properly and use bpf_probe_read() helper to access struct fields when argument is a pointer. For every tracepoint __bpf_trace_##call function is prepared. In assembler it looks like: (gdb) disassemble __bpf_trace_xdp_exception Dump of assembler code for function __bpf_trace_xdp_exception: 0xffffffff81132080 <+0>: mov %ecx,%ecx 0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3> where TRACE_EVENT(xdp_exception, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, u32 act), The above assembler snippet is casting 32-bit 'act' field into 'u64' to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is. All of ~500 of __bpf_trace_*() functions are only 5-10 byte long and in total this approach adds 7k bytes to .text. This approach gives the lowest possible overhead while calling trace_xdp_exception() from kernel C code and transitioning into bpf land. Since tracepoint+bpf are used at speeds of 1M+ events per second this is valuable optimization. The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced that returns anon_inode FD of 'bpf-raw-tracepoint' object. The user space looks like: // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type prog_fd = bpf_prog_load(...); // receive anon_inode fd for given bpf_raw_tracepoint with prog attached raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd); Ctrl-C of tracing daemon or cmdline tool that uses this feature will automatically detach bpf program, unload it and unregister tracepoint probe. On the kernel side the __bpf_raw_tp_map section of pointers to tracepoint definition and to __bpf_trace_*() probe function is used to find a tracepoint with "xdp_exception" name and corresponding __bpf_trace_xdp_exception() probe function which are passed to tracepoint_probe_register() to connect probe with tracepoint. Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf tracepoint mechanisms. perf_event_open() can be used in parallel on the same tracepoint. Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted. Each with its own bpf program. The kernel will execute all tracepoint probes and all attached bpf programs. In the future bpf_raw_tracepoints can be extended with query/introspection logic. __bpf_raw_tp_map section logic was contributed by Steven Rostedt Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28macro: introduce COUNT_ARGS() macroAlexei Starovoitov
move COUNT_ARGS() macro from apparmor to generic header and extend it to count till twelve. COUNT() was an alternative name for this logic, but it's used for different purpose in many other places. Similarly for CONCATENATE() macro. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28x86/hyper-v: move hyperv.h out of uapiVitaly Kuznetsov
hyperv.h is not part of uapi, there are no (known) users outside of kernel. We are making changes to this file to match current Hyper-V Hypervisor Top-Level Functional Specification (TLFS, see: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs) and we don't want to maintain backwards compatibility. Move the file renaming to hyperv-tlfs.h to avoid confusing it with mshyperv.h. In future, all definitions from TLFS should go to it and all kernel objects should go to mshyperv.h or include/linux/hyperv.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28ipc/shm: fix up for struct file no longer being available in shm.hStephen Rothwell
Stephen Rothewell <sfr@canb.auug.org.au> wrote: > After merging the userns tree, today's linux-next build (powerpc > ppc64_defconfig) produced this warning: > > In file included from include/linux/sched.h:16:0, > from arch/powerpc/lib/xor_vmx_glue.c:14: > include/linux/shm.h:17:35: error: 'struct file' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] > bool is_file_shm_hugepages(struct file *file); > ^~~~ > > and many, many more (most warnings, but some errors - arch/powerpc is > mostly built with -Werror) I dug through this and I discovered that the error was caused by the removal of struct shmid_kernel from shm.h when building on powerpc. Except for observing the existence of "struct file *shm_file" in struct shmid_kernel I have no clue why the structure move would cause such a failure. I suspect shm.h always needed the forward declaration and someting had been confusing gcc into not issuing the warning. --EWB Fixes: a2e102cd3cdd ("shm: Move struct shmid_kernel into ipc/shm.c") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-03-28stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplateAlexander Shishkin
This adds SPDX GPL-2.0 header to to stm core files and removes the GPLv2 boilerplate text. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
2018-03-28dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrsChristoph Hellwig
Revert the clearing of __GFP_ZERO in dma_alloc_attrs and move it to dma_direct_alloc for now. While most common architectures always zero dma cohereny allocations (and x86 did so since day one) this is not documented and at least arc and s390 do not zero without the explicit __GFP_ZERO argument. Fixes: 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in common code") Reported-by: Evgeniy Didin <Evgeniy.Didin@synopsys.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Evgeniy Didin <Evgeniy.Didin@synopsys.com> Cc: iommu@lists.linux-foundation.org Link: https://lkml.kernel.org/r/20180328133535.17302-2-hch@lst.de
2018-03-28Merge tag 'reset-for-4.17-2' of git://git.pengutronix.de/git/pza/linux into ↵Arnd Bergmann
next/drivers Pull "Reset controller changes for v4.17, part 2" from Philipp Zabel: This tag contains reset lookup support, similar to pwm lookups, for legacy non-DT platforms, a few new reset controls and a Kconfig fix for uniphier SoCs, as well as a new driver for the STM32MP1 peripheral reset controller. The reset lookups are merged from a separate, immutable branch, that may also be merged into the davinci tree. * tag 'reset-for-4.17-2' of git://git.pengutronix.de/git/pza/linux: reset: uniphier: add ethernet reset control support for PXs3 reset: stm32mp1: Enable stm32mp1 reset driver dt-bindings: reset: add STM32MP1 resets reset: uniphier: add Pro4/Pro5/PXs2 audio systems reset control reset: imx7: add 'depends on HAS_IOMEM' to fix unmet dependency reset: modify the way reset lookup works for board files reset: add support for non-DT systems
2018-03-28Merge tag 'kvm-arm-for-v4.17' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.17 - VHE optimizations - EL2 address space randomization - Variant 3a mitigation for Cortex-A57 and A72 - The usual vgic fixes - Various minor tidying-up
2018-03-28Merge 4.16-rc7 into staging-nextGreg Kroah-Hartman
We want the IIO and staging driver fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28Merge 4.16-rc7 into char-misc-nextGreg Kroah-Hartman
We want the hyperv fix in here for merging and testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28fs: move I_DIRTY_INODE to fs.hChristoph Hellwig
And use it in a few more places rather than opencoding the values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28Backmerge tag 'v4.16-rc7' into drm-nextDave Airlie
Linux 4.16-rc7 This was requested by Daniel, and things were getting a bit hard to reconcile, most of the conflicts were trivial though.
2018-03-27blk-mq: Allow PCI vector offset for mapping queuesKeith Busch
The PCI interrupt vectors intended to be associated with a queue may not start at 0; a driver may allocate pre_vectors for special use. This patch adds an offset parameter so blk-mq may find the intended affinity mask and updates all drivers using this API accordingly. Cc: Don Brace <don.brace@microsemi.com> Cc: <qla2xxx-upstream@qlogic.com> Cc: <linux-scsi@vger.kernel.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>