summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-12-04netfilter: nf_conntrack_tuple_common.h: fix #includeDavide Caratti
To allow usage of enum ip_conntrack_dir in include/net/netns/conntrack.h, this patch encloses #include <linux/netfilter.h> in a #ifndef __KERNEL__ directive, so that compiler errors caused by unwanted inclusion of include/linux/netfilter.h are avoided. In addition, #include <linux/netfilter/nf_conntrack_common.h> line has been added to resolve correctly CTINFO2DIR macro. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: nf_log: do not assume ethernet header in netdev familyLiping Zhang
In netdev family, we will handle non ethernet packets, so using eth_hdr(skb)->h_proto is incorrect. Meanwhile, we can use socket(AF_PACKET...) to sending packets, so skb->protocol is not always set in bridge family. Add an extra parameter into nf_log_l2packet to solve this issue. Fixes: 1fddf4bad0ac ("netfilter: nf_log: add packet logging for netdev family") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: built-in NAT support for UDPliteDavide Caratti
CONFIG_NF_NAT_PROTO_UDPLITE is no more a tristate. When set to y, NAT support for UDPlite protocol is built-in into nf_nat.ko. footprint test: (nf_nat_proto_) |udplite || nf_nat --------------------------+--------++-------- no builtin | 408048 || 2241312 UDPLITE builtin | - || 2577256 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: built-in NAT support for SCTPDavide Caratti
CONFIG_NF_NAT_PROTO_SCTP is no more a tristate. When set to y, NAT support for SCTP protocol is built-in into nf_nat.ko. footprint test: (nf_nat_proto_) | sctp || nf_nat --------------------------+--------++-------- no builtin | 428344 || 2241312 SCTP builtin | - || 2597032 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: built-in NAT support for DCCPDavide Caratti
CONFIG_NF_NAT_PROTO_DCCP is no more a tristate. When set to y, NAT support for DCCP protocol is built-in into nf_nat.ko. footprint test: (nf_nat_proto_) | dccp || nf_nat --------------------------+--------++-------- no builtin | 409800 || 2241312 DCCP builtin | - || 2578968 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-03ipv6 addrconf: Implemented enhanced DAD (RFC7527)Erik Nordmark
Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Replay events when registering FIB notifierIdo Schimmel
Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") introduced a new notification chain to notify listeners (f.e., switchdev drivers) about addition and deletion of routes. However, upon registration to the chain the FIB tables can already be populated, which means potential listeners will have an incomplete view of the tables. Solve that by dumping the FIB tables and replaying the events to the passed notification block. The dump itself is done using RCU in order not to starve consumers that need RTNL to make progress. The integrity of the dump is ensured by reading the FIB change sequence counter before and after the dump under RTNL. This allows us to avoid the problematic situation in which the dumping process sends a ENTRY_ADD notification following ENTRY_DEL generated by another process holding RTNL. Callers of the registration function may pass a callback that is executed in case the dump was inconsistent with current FIB tables. The number of retries until a consistent dump is achieved is set to a fixed number to prevent callers from looping for long periods of time. In case current limit proves to be problematic in the future, it can be easily converted to be configurable using a sysctl. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Allow for consistent FIB dumpingIdo Schimmel
The next patch will enable listeners of the FIB notification chain to request a dump of the FIB tables. However, since RTNL isn't taken during the dump, it's possible for the FIB tables to change mid-dump, which will result in inconsistency between the listener's table and the kernel's. Allow listeners to know about changes that occurred mid-dump, by adding a change sequence counter to each net namespace. The counter is incremented just before a notification is sent in the FIB chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Add fib_info_hold() helperIdo Schimmel
As explained in the previous commit, modules are going to need to take a reference on fib info and then drop it using fib_info_put(). Add the fib_info_hold() helper to make the code more readable and also symmetric with fib_info_put(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03netns: fix net_generic() "id - 1" bloatAlexey Dobriyan
net_generic() function is both a) inline and b) used ~600 times. It has the following code inside ... ptr = ng->ptr[id - 1]; ... "id" is never compile time constant so compiler is forced to subtract 1. And those decrements or LEA [r32 - 1] instructions add up. We also start id'ing from 1 to catch bugs where pernet sybsystem id is not initialized and 0. This is quite pointless idea (nothing will work or immediate interference with first registered subsystem) in general but it hints what needs to be done for code size reduction. Namely, overlaying allocation of pointer array and fixed part of structure in the beginning and using usual base-0 addressing. Ids are just cookies, their exact values do not matter, so lets start with 3 on x86_64. Code size savings (oh boy): -4.2 KB As usual, ignore the initial compiler stupidity part of the table. add/remove: 0/0 grow/shrink: 12/670 up/down: 89/-4297 (-4208) function old new delta tipc_nametbl_insert_publ 1250 1270 +20 nlmclnt_lookup_host 686 703 +17 nfsd4_encode_fattr 5930 5941 +11 nfs_get_client 1050 1061 +11 register_pernet_operations 333 342 +9 tcf_mirred_init 843 849 +6 tcf_bpf_init 1143 1149 +6 gss_setup_upcall 990 994 +4 idmap_name_to_id 432 434 +2 ops_init 274 275 +1 nfsd_inject_forget_client 259 260 +1 nfs4_alloc_client 612 613 +1 tunnel_key_walker 164 163 -1 ... tipc_bcbase_select_primary 392 360 -32 mac80211_hwsim_new_radio 2808 2767 -41 ipip6_tunnel_ioctl 2228 2186 -42 tipc_bcast_rcv 715 672 -43 tipc_link_build_proto_msg 1140 1089 -51 nfsd4_lock 3851 3796 -55 tipc_mon_rcv 1012 956 -56 Total: Before=156643951, After=156639743, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03netns: add dummy struct inside "struct net_generic"Alexey Dobriyan
This is precursor to fixing "[id - 1]" bloat inside net_generic(). Name "s" is chosen to complement name "u" often used for dummy unions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03netlink: 2-clause nla_ok()Alexey Dobriyan
nla_ok() consists of 3 clauses: 1) int rem >= (int)sizeof(struct nlattr) 2) u16 nla_len >= sizeof(struct nlattr) 3) u16 nla_len <= int rem The statement is that clause (1) is redundant. What it does is ensuring that "rem" is a positive number, so that in clause (3) positive number will be compared to positive number with no problems. However, "u16" fully fits into "int" and integers do not change value when upcasting even to signed type. Negative integers will be rejected by clause (3) just fine. Small positive integers will be rejected by transitivity of comparison operator. NOTE: all of the above DOES NOT apply to nlmsg_ok() where ->nlmsg_len is u32(!), so 3 clauses AND A CAST TO INT are necessary. Obligatory space savings report: -1.6 KB $ ./scripts/bloat-o-meter ../vmlinux-000* ../vmlinux-001* add/remove: 0/0 grow/shrink: 3/63 up/down: 35/-1692 (-1657) function old new delta validate_scan_freqs 142 155 +13 tcf_em_tree_validate 867 879 +12 dcbnl_ieee_del 328 338 +10 netlbl_cipsov4_add_common.isra 218 215 -3 ... ovs_nla_put_actions 888 806 -82 netlbl_cipsov4_add_std 1648 1566 -82 nl80211_parse_sched_scan 2889 2780 -109 ip_tun_from_nlattr 3086 2945 -141 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02arm/arm64: xen: Move shared architecture headers to include/xen/armMarc Zyngier
ARM and arm64 Xen ports share a number of headers, leading to packaging issues when these headers needs to be exported, as it breaks the reasonable requirement that an architecture port has self-contained headers. Fix the issue by moving the 5 header files to include/xen/arm, and keep local placeholders to include the relevant files. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2016-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Lots more phydev and probe error path leaks in various drivers by Johan Hovold. 2) Fix race in packet_set_ring(), from Philip Pettersson. 3) Use after free in dccp_invalid_packet(), from Eric Dumazet. 4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric Dumazet. 5) When tunneling between ipv4 and ipv6 we can be left with the wrong skb->protocol value as we enter the IPSEC engine and this causes all kinds of problems. Set it before the output path does any dst_output() calls, from Eli Cooper. 6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from Florian Fainelli. 7) Various netfilter nat bug fixes from FLorian Westphal. 8) Fix memory leak in ipvlan_link_new(), from Gao Feng. 9) Locking fixes, particularly wrt. socket lookups, in l2tp from Guillaume Nault. 10) Avoid invoking rhash teardowns in atomic context by moving netlink cb->done() dump completion from a worker thread. Fix from Herbert Xu. 11) Buffer refcount problems in tun and macvtap on errors, from Jason Wang. 12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user selects BBR. Fix from Julian Wollrath. 13) Fix deadlock in transmit path on altera TSE driver, from Lino Sanfilippo. 14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita Yushchenko. 15) tc_tunnel_key needs to be properly exported to userspace via uapi, fix from Roi Dayan. 16) rds_tcp_init_net() doesn't unregister notifier in error path, fix from Sowmini Varadhan. 17) Stale packet header pointer access after pskb_expand_head() in genenve driver, fix from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) net: avoid signed overflows for SO_{SND|RCV}BUFFORCE geneve: avoid use-after-free of skb->data tipc: check minimum bearer MTU net: renesas: ravb: unintialized return value sh_eth: remove unchecked interrupts for RZ/A1 net: bcmgenet: Utilize correct struct device for all DMA operations NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040 cdc_ether: Fix handling connection notification ip6_offload: check segs for NULL in ipv6_gso_segment. RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" ipv6: Set skb->protocol properly for local output ipv4: Set skb->protocol properly for local output packet: fix race condition in packet_set_ring net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks net: ethernet: stmmac: platform: fix outdated function header net: ethernet: stmmac: dwmac-meson8b: fix probe error path net: ethernet: stmmac: dwmac-generic: fix probe error path ...
2016-12-02bpf: Add support for reading socket family, type, protocolDavid Ahern
Add socket family, type and protocol to bpf_sock allowing bpf programs read-only access. Add __sk_flags_offset[0] to struct sock before the bitfield to programmtically determine the offset of the unsigned int containing protocol and type. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf: Add new cgroup attach type to enable sock modificationsDavid Ahern
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf: Refactor cgroups code in prep for new typeDavid Ahern
Code move and rename only; no functional change intended. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/sched: cls_flower: Add offload support using egress Hardware deviceHadar Hen Zion
In order to support hardware offloading when the device given by the tc rule is different from the Hardware underline device, extract the mirred (egress) device from the tc action when a filter is added, using the new tc_action_ops, get_dev(). Flower caches the information about the mirred device and use it for calling ndo_setup_tc in filter change, update stats and delete. Calling ndo_setup_tc of the mirred (egress) device instead of the ingress device will allow a resolution between the software ingress device and the underline hardware device. The resolution will take place inside the offloading driver using 'egress_device' flag added to tc_to_netdev struct which is provided to the offloading driver. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/sched: act_mirred: Add new tc_action_ops get_dev()Hadar Hen Zion
Adding support to a new tc_action_ops. get_dev is a general option which allows to get the underline device when trying to offload a tc rule. In case of mirred action the returned device is the mirred (egress) device. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/sched: Add separate check for skip_hw flagHadar Hen Zion
Creating a difference between two possible cases: 1. Not offloading tc rule since the user sets 'skip_hw' flag. 2. Not offloading tc rule since the device doesn't support offloading. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02tcp: randomize tcp timestamp offsets for each connectionFlorian Westphal
jiffies based timestamps allow for easy inference of number of devices behind NAT translators and also makes tracking of hosts simpler. commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization, in particular writing/reading the on-wire tcp header format takes the offset into account so rest of stack can use normal tcp_time_stamp (jiffies). So only two items are left: - add a tsoffset for request sockets - extend the tcp isn generator to also return another 32bit number in addition to the ISN. Re-use of ISN generator also means timestamps are still monotonically increasing for same connection quadruple, i.e. PAWS will still work. Includes fixes from Eric Dumazet. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02qed: Add support for hardware offloaded iSCSI.Yuval Mintz
This adds the backbone required for the various HW initalizations which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ 4xxxx line of adapters - FW notification, resource initializations, etc. Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02default exported asm symbols to zeroArnd Bergmann
With binutils-2.26 and before, a weak missing symbol was kept during the final link, and a missing CRC for an export would lead to that CRC being treated as zero implicitly. With binutils-2.27, the crc symbol gets dropped, and any module trying to use it will fail to load. This sets the weak CRC symbol to zero explicitly, making it defined in vmlinux, which in turn lets us load the modules referring to that CRC. The comment above the __CRC_SYMBOL macro suggests that this was always the intention, although it also seems that all symbols defined in C have a correct CRC these days, and only the exports that are now done in assembly need this. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Adam Borowski <kilobyte@angband.pl> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to callerDaniel Borkmann
After 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers need to hold rcu_read_lock() already to make sure BPF program doesn't get released in the background. Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading. Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping in XDP supported drivers and to keep the typecheck on the context intact. For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock() out of the helper. When the driver gets atomic replace support, this will move to call-sites eventually. mlx5 needs actual fixing as it has the same issue as described already in 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), that is, we're under RCU bh at this time, BPF programs are released via call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue reset. Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf: BPF for lightweight tunnel infrastructureThomas Graf
Registers new BPF program types which correspond to the LWT hooks: - BPF_PROG_TYPE_LWT_IN => dst_input() - BPF_PROG_TYPE_LWT_OUT => dst_output() - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit() The separate program types are required to differentiate between the capabilities each LWT hook allows: * Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevent modification and possible invalidation of already validated packet headers on receive and the construction of illegal headers while the IP headers are still being assembled. * Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepending an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely. All BPF programs receive an skb with L3 headers attached and may return one of the following error codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid in lwtunnel_xmit() context) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Implement Fragmented Work Queue (WQ)Tariq Toukan
Add new type of struct mlx5_frag_buf which is used to allocate fragmented buffers rather than contiguous, and make the Completion Queues (CQs) use it as they are big (default of 2MB per CQ in Striding RQ). This fixes the failures of type: "mlx5e_open_locked: mlx5e_open_channels failed, -12" due to dma_zalloc_coherent insufficient contiguous coherent memory to satisfy the driver's request when the user tries to setup more or larger rings. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Merge branch 'locking/urgent' into locking/core, to pick up dependent fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-01Merge tag 'pci-v4.9-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "PCI fixes: - Fix Read Completion Boundary setting, which fixes a boot failure on IBM x3850 with Mellanox MT27500 ConnectX-3 - Update some MAINTAINERS entries and email addresses" * tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX) PCI: Export pcie_find_root_port PCI: designware-plat: Update author email PCI: designware: Change maintainer to Joao Pinto MAINTAINERS: Add devicetree binding to PCI i.MX6 entry MAINTAINERS: Update Richard Zhu's email address
2016-12-02zram: Convert to hotplug state machineAnna-Maria Gleixner
Install the callbacks via the state machine with multi instance support and let the core invoke the callbacks on the already online CPUs. [bigeasy: wire up the multi instance stuff] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: rt@linutronix.de Cc: Nitin Gupta <ngupta@vflare.org> Link: http://lkml.kernel.org/r/20161126231350.10321-19-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02KVM/PPC/Book3S HV: Convert to hotplug state machineAnna-Maria Gleixner
Install the callbacks via the state machine. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: Paul Mackerras <paulus@samba.org> Cc: rt@linutronix.de Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Link: http://lkml.kernel.org/r/20161126231350.10321-18-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02iommu/vt-d: Convert to hotplug state machineAnna-Maria Gleixner
Install the callbacks via the state machine. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Cc: rt@linutronix.de Cc: David Woodhouse <dwmw2@infradead.org> Link: http://lkml.kernel.org/r/20161126231350.10321-14-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02mm/zswap: Convert pool to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Multi state is used to address the per-pool notifier. Uppon adding of the intance the callback is invoked for all online CPUs so the manual init can go. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-mm@kvack.org Cc: Seth Jennings <sjenning@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161126231350.10321-13-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02mm/zswap: Convert dst-mem to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-mm@kvack.org Cc: Seth Jennings <sjenning@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161126231350.10321-12-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02mm/zsmalloc: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: linux-mm@kvack.org Cc: Minchan Kim <minchan@kernel.org> Cc: rt@linutronix.de Cc: Nitin Gupta <ngupta@vflare.org> Link: http://lkml.kernel.org/r/20161126231350.10321-11-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02mm/vmstat: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine, but do not invoke them as we can initialize the node state without calling the callbacks on all online CPUs. start_shepherd_timer() is now called outside the get_online_cpus() block which is safe as it only operates on cpu possible mask. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: rt@linutronix.de Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20161129145221.ffc3kg3hd7lxiwj6@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02tracing/rb: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. The notifier in struct ring_buffer is replaced by the multi instance interface. Upon __ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke the trace_rb_cpu_prepare() on each CPU. This callback may now fail. This means __ring_buffer_alloc() will fail and cleanup (like previously) and during a CPU up event this failure will not allow the CPU to come up. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "We are disabling automatic probing of BYD touchpads as it results in too many false positives, and the hardware is not terribly popular and having the protocol support does not result in significantly improved user experience. We also change keycode for KEY_DATA to avoid clashing with KEY_FASTREVERSE. Luckily this newish code is used by CEC framework that is still in staging, so it is extremely unlikely that someone has already started using this keycode" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: change KEY_DATA from 0x275 to 0x277 Input: psmouse - disable automatic probing of BYD touchpads
2016-12-01vfio: support notifier chain in vfio_groupJike Song
Beyond vfio_iommu events, users might also be interested in vfio_group events. For example, if a vfio_group is used along with Qemu/KVM, whenever kvm pointer is set to/cleared from the vfio_group, users could be notified. Currently only VFIO_GROUP_NOTIFY_SET_KVM supported. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> [aw: remove use of new typedef] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01vfio: vfio_register_notifier: classify iommu notifierJike Song
Currently vfio_register_notifier assumes that there is only one notifier chain, which is in vfio_iommu. However, the user might also be interested in events other than vfio_iommu, for example, vfio_group. Refactor vfio_{un}register_notifier implementation to make it feasible. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Jike Song <jike.song@intel.com> [aw: merge with commit 816ca69ea9c7 ("vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'"), remove typedef] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01net: phy: add mdix_ctrl to hold the user configuration.Raju Lakkaraju
Add new parameter mdix_ctrl to hold the user configuration. Existing mdix maintain the current status of MDI(X) crossover performed or not. mdix_ctrl can configure either ETH_TP_MDI or ETH_TP_MDI_X orETH_TP_MDI_AUTO. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net This is a large batch of Netfilter fixes for net, they are: 1) Three patches to fix NAT conversion to rhashtable: Switch to rhlist structure that allows to have several objects with the same key. Moreover, fix wrong comparison logic in nf_nat_bysource_cmp() as this is expecting a return value similar to memcmp(). Change location of the nat_bysource field in the nf_conn structure to avoid zeroing this as it breaks interaction with SLAB_DESTROY_BY_RCU and lead us to crashes. From Florian Westphal. 2) Don't allow malformed fragments go through in IPv6, drop them, otherwise we hit GPF, patch from Florian Westphal. 3) Fix crash if attributes are missing in nft_range, from Liping Zhang. 4) Fix arptables 32-bits userspace 64-bits kernel compat, from Hongxu Jia. 5) Two patches from David Ahern to fix netfilter interaction with vrf. From David Ahern. 6) Fix element timeout calculation in nf_tables, we take milliseconds from userspace, but we use jiffies from kernelspace. Patch from Anders K. Pedersen. 7) Missing validation length netlink attribute for nft_hash, from Laura Garcia. 8) Fix nf_conntrack_helper documentation, we don't default to off anymore for a bit of time so let's get this in sync with the code. I know is late but I think these are important, specifically the NAT bits, as they are mostly addressing fallout from recent changes. I also read there are chances to have -rc8, if that is the case, that would also give us a bit more time to test this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-01drm: Make the connector .detect() callback optionalLaurent Pinchart
Many drivers (21 to be exact) create connectors that are always connected (for instance to an LVDS or DSI panel). Instead of forcing them to implement a dummy .detect() handler, make the callback optional and consider the connector as always connected in that case. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Acked-by: Jyri Sarha <jsarha@ti.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Vincent Abriou <vincent.abriou@st.com> Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> [seanpaul fixed small conflict in rcar-du/rcar_du_lvdscon.c] Signed-off-by: Sean Paul <seanpaul@chromium.org>
2016-12-01nvme.h: add Write Zeroes definitionsChaitanya Kulkarni
Add the command structure, optional command set support (ONCS) bit and a new error code for the Write Zeroes command. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-01block: add support for REQ_OP_WRITE_ZEROESChaitanya Kulkarni
This adds a new block layer operation to zero out a range of LBAs. This allows to implement zeroing for devices that don't use either discard with a predictable zero pattern or WRITE SAME of zeroes. The prominent example of that is NVMe with the Write Zeroes command, but in the future, this should also help with improving the way zeroing discards work. For this operation, suitable entry is exported in sysfs which indicate the number of maximum bytes allowed in one write zeroes operation by the device. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-01block: add async variant of blkdev_issue_zerooutChaitanya Kulkarni
Similar to __blkdev_issue_discard this variant allows submitting the final bio asynchronously and chaining multiple ranges into a single completion. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-01alarmtimer: Add tracepoints for alarm timersBaolin Wang
Alarm timers are one of the mechanisms to wake up a system from suspend, but there exist no tracepoints to analyse which process/thread armed an alarmtimer. Add tracepoints for start/cancel/expire of individual alarm timers and one for tracing the suspend time decision when to resume the system. The following trace excerpt illustrates the new mechanism: Binder:3292_2-3304 [000] d..2 149.981123: alarmtimer_cancel: alarmtimer:ffffffc1319a7800 type:REALTIME expires:1325463120000000000 now:1325376810370370245 Binder:3292_2-3304 [000] d..2 149.981136: alarmtimer_start: alarmtimer:ffffffc1319a7800 type:REALTIME expires:1325376840000000000 now:1325376810370384591 Binder:3292_9-3953 [000] d..2 150.212991: alarmtimer_cancel: alarmtimer:ffffffc1319a5a00 type:BOOTTIME expires:179552000000 now:150154008122 Binder:3292_9-3953 [000] d..2 150.213006: alarmtimer_start: alarmtimer:ffffffc1319a5a00 type:BOOTTIME expires:179551000000 now:150154025622 system_server-3000 [002] ...1 162.701940: alarmtimer_suspend: alarmtimer type:REALTIME expires:1325376840000000000 The wakeup time which is selected at suspend time allows to map it back to the task arming the timer: Binder:3292_2. [ tglx: Store alarm timer expiry time instead of some useless RTC relative information, add proper type information for wakeups which are handled via the clock_nanosleep/freezer and massage the changelog. ] Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-01Merge back earlier cpuidle material for v4.10.Rafael J. Wysocki
2016-12-01Merge back earlier ACPICA material for v4.10.Rafael J. Wysocki
2016-11-30mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlbKirill A. Shutemov
Hugetlb pages have ->index in size of the huge pages (PMD_SIZE or PUD_SIZE), not in PAGE_SIZE as other types of pages. This means we cannot user page_to_pgoff() to check whether we've got the right page for the radix-tree index. Let's introduce page_to_index() which would return radix-tree index for given page. We will be able to get rid of this once hugetlb will be switched to multi-order entries. Fixes: fc127da085c2 ("truncate: handle file thp") Link: http://lkml.kernel.org/r/20161123093053.mjbnvn5zwxw5e6lk@black.fi.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Doug Nelson <doug.nelson@intel.com> Tested-by: Doug Nelson <doug.nelson@intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>