Age | Commit message | Author
2016-12-04 | NFS: Fix incorrect mapping revalidation when holding a delegation | Trond Myklebust
We should only care about checking the attributes if the page cache is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the NFS_INO_REVAL_FORCED flag is set. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 | Linux 4.9-rc8 (tag: v4.9-rc8) | Linus Torvalds
2016-12-04 | netfilter: conntrack: add nf_conntrack_default_on sysctl | Florian Westphal
This switch (default on) can be used to disable automatic registration of connection tracking functionality in newly created network namespaces. This means that when net namespace goes down (or the tracker protocol module is unloaded) we *might* have to unregister the hooks. We can either add another per-netns variable that tells if the hooks got registered by default, or, alternatively, just call the protocol _put() function and have the callee deal with a possible 'extra' put() operation that doesn't pair with a get() one. This uses the latter approach, i.e. a put() without a get has no effect. Conntrack is still enabled automatically regardless of the new sysctl setting if the new net namespace requires connection tracking, e.g. when NAT rules are created. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: conntrack: register hooks in netns when needed by ruleset | Florian Westphal
This makes use of nf_ct_netns_get/put added in the previous patch. We add get/put functions to the nf_conntrack_l3proto structure; ipv4 and ipv6 then implement a use-count to track how many users (nft or xtables modules) have a dependency on ipv4 and/or ipv6 connection tracking functionality. When the count reaches zero, the hooks are unregistered. This delays activation of connection tracking inside a namespace until a stateful firewall rule or NAT rule gets added. This patch breaks backwards compatibility in the sense that connection tracking won't be active anymore when the protocol tracker module is loaded. This breaks e.g. setups that use ctnetlink for flow accounting and the like, without any '-m conntrack' packet filter rules. A followup patch restores the old behaviour and makes the new delayed scheme optional via sysctl. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
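The use-count scheme described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `tracker_model`, `tracker_get()` and `tracker_put()` are hypothetical names, not the kernel's nf_conntrack_l3proto API, and a boolean stands in for the actual hook registration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one l3proto tracker inside one net namespace. */
struct tracker_model {
	unsigned int users;     /* rules that depend on connection tracking */
	bool hooks_registered;  /* stands in for the real hook registration */
};

/* The first user triggers hook registration; later users only bump the count. */
static int tracker_get(struct tracker_model *t)
{
	if (t->users++ == 0)
		t->hooks_registered = true;
	return 0;
}

/* The last user going away unregisters the hooks again. */
static void tracker_put(struct tracker_model *t)
{
	assert(t->users > 0); /* this simple model requires balanced get/put */
	if (--t->users == 0)
		t->hooks_registered = false;
}
```

In this model the hooks stay registered for as long as at least one rule holds a reference, which mirrors the delayed activation the commit message describes.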
2016-12-04 | netfilter: nf_tables: add conntrack dependencies for nat/masq/redir expressions | Florian Westphal
so that conntrack core will add the needed hooks in this namespace. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: nat: add dependencies on conntrack module | Florian Westphal
MASQUERADE, S/DNAT and REDIRECT already call functions that depend on the conntrack module. However, since the conntrack hooks are now registered in a lazy fashion (i.e., only when needed) a symbol reference is not enough. Thus, when something is added to a nat table, make sure that it will see packets by calling nf_ct_netns_get() which will register the conntrack hooks in the current netns. An alternative would be to add these dependencies to the NAT table. However, that has problems when using non-modular builds -- we might register e.g. ipv6 conntrack before its initcall has run, leading to NULL deref crashes since its per-netns storage has not yet been allocated. Adding the dependency in the modules instead has the advantage that nat table also does not register its hooks until rules are added. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: add and use nf_ct_netns_get/put | Florian Westphal
currently aliased to try_module_get/_put. Will be changed in next patch when we add functions to make use of ->net argument to store usercount per l3proto tracker. This is needed to avoid registering the conntrack hooks in all netns and later only enable connection tracking in those that need conntrack. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: conntrack: remove unused init_net hook | Florian Westphal
since adf0516845bcd0 ("netfilter: remove ip_conntrack* sysctl compat code") the only user (ipv4 tracker) sets this to an empty stub function. After this change nf_ct_l3proto_pernet_register() is also empty, but this will change in a followup patch to add conditional register of the hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: conntrack: built-in support for UDPlite | Davide Caratti
CONFIG_NF_CT_PROTO_UDPLITE is no longer a tristate. When set to y, connection tracking support for the UDPlite protocol is built into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_udplite,}.ko \
        net/ipv4/netfilter/nf_conntrack_ipv4.ko \
        net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)|| udplite|  ipv4  |  ipv6  |nf_conntrack
---------++--------+--------+--------+--------------
none     || 432538 | 828755 | 828676 |   6141434
UDPlite  ||    -   | 829649 | 829362 |   6498204

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: conntrack: built-in support for SCTP | Davide Caratti
CONFIG_NF_CT_PROTO_SCTP is no longer a tristate. When set to y, connection tracking support for the SCTP protocol is built into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
        net/ipv4/netfilter/nf_conntrack_ipv4.ko \
        net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)||  sctp  |  ipv4  |  ipv6  | nf_conntrack
---------++--------+--------+--------+--------------
none     || 498243 | 828755 | 828676 |   6141434
SCTP     ||    -   | 829254 | 829175 |   6547872

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: conntrack: built-in support for DCCP | Davide Caratti
CONFIG_NF_CT_PROTO_DCCP is no longer a tristate. When set to y, connection tracking support for the DCCP protocol is built into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
        net/ipv4/netfilter/nf_conntrack_ipv4.ko \
        net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)||  dccp  |  ipv4  |  ipv6  | nf_conntrack
---------++--------+--------+--------+--------------
none     || 469140 | 828755 | 828676 |   6141434
DCCP     ||    -   | 830566 | 829935 |   6533526

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: nf_conntrack_tuple_common.h: fix #include | Davide Caratti
To allow usage of enum ip_conntrack_dir in include/net/netns/conntrack.h, this patch encloses #include <linux/netfilter.h> in a #ifndef __KERNEL__ directive, so that compiler errors caused by unwanted inclusion of include/linux/netfilter.h are avoided. In addition, #include <linux/netfilter/nf_conntrack_common.h> line has been added to resolve correctly CTINFO2DIR macro. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | Merge tag 'ipvs-for-v4.10' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next | Pablo Neira Ayuso
Simon Horman says:

====================
IPVS Updates for v4.10

Please consider these enhancements to the IPVS for v4.10.

* Decrement the IP ttl in all the modes in order to prevent infinite route loops. Thanks to Dwip Banerjee.
* Use IS_ERR_OR_NULL macro. Clean-up from Gao Feng.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: nfnetlink_log: add "nf-logger-5-1" module alias name | Liping Zhang
So we can autoload nfnetlink_log.ko when the user adds an nft log group X rule in the netdev family. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: nf_log: do not assume ethernet header in netdev family | Liping Zhang
In the netdev family, we will handle non-ethernet packets, so using eth_hdr(skb)->h_proto is incorrect. Meanwhile, we can use socket(AF_PACKET...) to send packets, so skb->protocol is not always set in the bridge family. Add an extra parameter to nf_log_l2packet to solve this issue. Fixes: 1fddf4bad0ac ("netfilter: nf_log: add packet logging for netdev family") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: built-in NAT support for UDPlite | Davide Caratti
CONFIG_NF_NAT_PROTO_UDPLITE is no longer a tristate. When set to y, NAT support for the UDPlite protocol is built into nf_nat.ko.

footprint test:

(nf_nat_proto_)           |udplite || nf_nat
--------------------------+--------++--------
no builtin                | 408048 || 2241312
UDPLITE builtin           |    -   || 2577256

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: built-in NAT support for SCTP | Davide Caratti
CONFIG_NF_NAT_PROTO_SCTP is no longer a tristate. When set to y, NAT support for the SCTP protocol is built into nf_nat.ko.

footprint test:

(nf_nat_proto_)           |  sctp  || nf_nat
--------------------------+--------++--------
no builtin                | 428344 || 2241312
SCTP builtin              |    -   || 2597032

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: built-in NAT support for DCCP | Davide Caratti
CONFIG_NF_NAT_PROTO_DCCP is no longer a tristate. When set to y, NAT support for the DCCP protocol is built into nf_nat.ko.

footprint test:

(nf_nat_proto_)           |  dccp  || nf_nat
--------------------------+--------++--------
no builtin                | 409800 || 2241312
DCCP builtin              |    -   || 2578968

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | netfilter: update Arturo Borrero Gonzalez email address | Arturo Borrero Gonzalez
The email address has changed, let's update the copyright statements. Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 | libnvdimm, namespace: use octal for permissions | Fabian Frederick
According to commit f90774e1fd27 ("checkpatch: look for symbolic permissions and suggest octal instead") Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
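The commit message does not list which permission macros were replaced, so the following is only a user-space illustration of why an octal literal can stand in for a symbolic combination. It uses the POSIX `S_I*` macros from `<sys/stat.h>`; `check_perm_spellings()` is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Each symbolic combination below has the same value as the octal literal
 * next to it, so switching spellings does not change behaviour. The octal
 * form shows the user/group/other digits directly.
 */
static void check_perm_spellings(void)
{
	/* read-only for user, group and other */
	assert((S_IRUSR | S_IRGRP | S_IROTH) == 0444);
	/* additionally writable by the owner */
	assert((S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) == 0644);
}
```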
2016-12-04 | libnvdimm, namespace: avoid multiple sector calculations | Fabian Frederick
Use sector_t for cleared. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-04 | libnvdimm: remove else after return in nsio_rw_bytes() | Fabian Frederick
else after return is not needed. Signed-off-by: Fabian Frederick <fabf@skynet.be> [djbw: removed some now unnecessary newlines] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-04 | EDAC, amd64: Fix improper return value | Pan Bian
When the call to zalloc_cpumask_var() fails, returning "false" seems improper. The real value of macro "false" is 0, and 0 means no error. Return -ENOMEM instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071 Signed-off-by: Pan Bian <bianpan2016@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com Signed-off-by: Borislav Petkov <bp@suse.de>
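The problem described above can be illustrated with a user-space sketch. The names `setup_mask()`, `alloc_fn` and `failing_alloc()` are hypothetical and exist only for this example; the injectable allocator is an assumption made so the failure path can be exercised deterministically.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Allocator that always fails, standing in for an out-of-memory condition. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/*
 * Returns 0 on success and a negative errno on failure. Returning "false"
 * on the failure path would yield 0, which callers read as success.
 */
static int setup_mask(alloc_fn alloc, size_t nbytes, void **out)
{
	void *mask;

	assert(alloc != NULL && out != NULL);
	mask = alloc(nbytes);
	if (!mask)
		return -ENOMEM;
	*out = mask;
	return 0;
}
```

A caller that checks `if (ret < 0)` or `if (ret)` now sees the allocation failure instead of continuing with an unset pointer.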
2016-12-03 | net: dcb: set error code on failures | Pan Bian
Function dcbnl_cee_fill() returns the value of variable err on errors. However, on some error paths (e.g. when an nla put fails), its value may be 0. It may be better to explicitly set a negative errno to variable err before returning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188881 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
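A minimal sketch of the pattern, assuming a shared failure label that returns `err`. `fill_message()` and `put_attr()` are hypothetical stand-ins, and the choice of `-EMSGSIZE` is an assumption for illustration rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for an attribute "put" that only reports success or failure. */
static bool put_attr(bool fits)
{
	return fits;
}

static int fill_message(bool first_fits, bool second_fits)
{
	int err = 0; /* an earlier successful step would leave err at 0 */

	/*
	 * Set a negative errno before the steps that jump to the shared
	 * failure label without assigning err themselves.
	 */
	err = -EMSGSIZE;
	if (!put_attr(first_fits))
		goto failure;
	if (!put_attr(second_fits))
		goto failure;
	return 0;

failure:
	assert(err < 0); /* an error path must never report success */
	return err;
}
```

Without the explicit assignment, both `goto failure` paths would return 0 and the caller would treat a truncated message as complete.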
2016-12-03 | ipv6 addrconf: Implemented enhanced DAD (RFC7527) | Erik Nordmark
Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
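The nonce check can be sketched in user space as follows. `dad_probe`, `ns_is_looped_back()` and the 6-byte `DAD_NONCE_LEN` are assumptions made for this illustration, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DAD_NONCE_LEN 6 /* assumed nonce length for this sketch */

/* Hypothetical record of the nonce sent with our own DAD probe. */
struct dad_probe {
	uint8_t nonce[DAD_NONCE_LEN];
};

/*
 * Returns true when a received NS carries the nonce we sent ourselves,
 * i.e. it is treated as a looped-back probe and should be ignored rather
 * than counted as a duplicate address.
 */
static bool ns_is_looped_back(const struct dad_probe *sent,
			      const uint8_t *rx_nonce, size_t rx_len)
{
	assert(sent != NULL);
	if (rx_nonce == NULL || rx_len != DAD_NONCE_LEN)
		return false; /* no comparable nonce: handle as a normal NS */
	return memcmp(sent->nonce, rx_nonce, DAD_NONCE_LEN) == 0;
}
```

An NS with a different nonce, or with none at all, falls through to the ordinary duplicate address handling.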
2016-12-03 | Merge branch 'mv88e6390-batch-three' | David S. Miller
Andrew Lunn says: ==================== mv88e6390 batch 3 More patches to support the MV88e6390. This is mostly refactoring existing code and adding implementations for the mv88e6390. This patchset sets which reserved frames are sent to the CPU and the size of jumbo frames that will be accepted, turns off egress rate limiting, and configures pause frames. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Implement mv88e6390 pause control | Andrew Lunn
The mv88e6390 has a number of flow control registers accessed via the Flow Control register. Use these to set the pause control. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Refactor pause configuration | Andrew Lunn
The mv88e6390 has a different mechanism for configuring pause. Refactor the code into an ops function, and for the moment, don't add any mv88e6390 code yet. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Refactor egress rate limiting | Andrew Lunn
There are two different rate limiting configurations, depending on the switch generation. Refactor this into ops. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
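Refactoring "into ops" generally means dispatching through a per-chip table of function pointers. The sketch below shows that pattern only; `chip_model`, `chip_ops_model`, the two scheme functions and `setup_port()` are hypothetical names, and no real mv88e6xxx register access is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct chip_model;

/* Hypothetical per-chip operations table; an op may be left NULL. */
struct chip_ops_model {
	int (*egress_rate_limiting)(struct chip_model *chip, int port);
};

struct chip_model {
	const struct chip_ops_model *ops;
	int last_scheme; /* records which variant ran, for illustration */
};

/* Two stand-ins for the generation-specific configurations. */
static int rate_limit_scheme_a(struct chip_model *chip, int port)
{
	(void)port;
	chip->last_scheme = 1;
	return 0;
}

static int rate_limit_scheme_b(struct chip_model *chip, int port)
{
	(void)port;
	chip->last_scheme = 2;
	return 0;
}

static const struct chip_ops_model older_ops = {
	.egress_rate_limiting = rate_limit_scheme_a,
};

static const struct chip_ops_model newer_ops = {
	.egress_rate_limiting = rate_limit_scheme_b,
};

/* Common code calls through the table instead of testing the chip family. */
static int setup_port(struct chip_model *chip, int port)
{
	assert(chip != NULL && chip->ops != NULL);
	if (chip->ops->egress_rate_limiting)
		return chip->ops->egress_rate_limiting(chip, port);
	return 0; /* op not provided: nothing to configure */
}
```

The common setup path stays free of per-generation conditionals; supporting a new chip means supplying another ops table.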
2016-12-03 | net: dsa: mv88e6xxx: Refactor setting of jumbo frames | Andrew Lunn
Some switches support jumbo frames. Refactor this code into operations in the ops structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Reserved Management frames to CPU | Andrew Lunn
Older devices have a couple of registers in global2. The mv88e6390 family has a single register in global1 behind which hides similar configuration. Implement an op for this. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | Merge branch 'mv88e6390-batch-two' | David S. Miller
Andrew Lunn says:

====================
MV88E6390 batch two

This is the second batch of patches adding support for the MV88e6390. They are not sufficient to make it work properly.

The mv88e6390 has a much expanded set of priority maps. Refactor the existing code, and implement basic support for the new device.

Similarly, the monitor control register has been reworked.

The mv88e6390 has something odd in its EDSA tagging implementation, which means it is not possible to use it. So we need to use DSA tagging. This is the first device with EDSA support where we need to use DSA, and the code does not support this. So two patches refactor the existing code. The two different register definitions are separated out, and using DSA on an EDSA capable device is added.

v2:
  Add port prefix
  Add helper function for 6390
  Add _IEEE_ into #defines
  Split monitor_ctrl into a number of separate ops.
  Remove 6390 code which is management, used in a later patch
  s/EGREES/EGRESS/.
  Broke up setup_port_dsa() and set_port_dsa() into a number of ops

v3:
  Verify mandatory ops for port setup
  Don't set ether type for DSA port.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Refactor CPU and DSA port setup | Andrew Lunn
Older chips only support DSA tagging. Newer chips have both DSA and EDSA tagging. Refactor the code by adding port functions for setting the frame mode, the egress mode, and whether to forward unknown frames. This results in the helper mv88e6xxx_6065_family() becoming unused, so remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>

v3:
  Verify mandatory ops for port setup
  Don't set ether type for DSA port.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Move the tagging protocol into info | Andrew Lunn
Older chips support a single tagging protocol, DSA. New chips support both DSA and EDSA, an enhanced version. Having both as an option changes the register layouts. Up until now, it has been assumed that if EDSA is supported, it will be used. Hence the register layout has been determined by which protocol should be used. However, the mv88e6390 has a different implementation of EDSA, which requires that we use DSA tagging. Hence separate the selection of the protocol from the register layout. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Monitor and Management tables | Andrew Lunn
The mv88e6390 changes the monitor control register into the Monitor and Management control, which is an indirection register to various registers. Add ops to set the CPU port and the ingress/egress port for both register layouts, to global1 Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | net: dsa: mv88e6xxx: Implement mv88e6390 tag remap | Andrew Lunn
The mv88e6390 does not have the two registers to set the frame priority map. Instead it has an indirection register for setting a number of different priority maps. Refactor the old code into a function, implement the mv88e6390 version, and use an op to call the right one. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | autofs - dont hold spin lock over direct mount expire | Ian Kent
Commit 7cbdb4a286 altered the autofs indirect mount expire to not hold a spin lock during the expire check. The direct mount expire needs the same treatment because, to make autofs expires namespace aware, may_umount_tree() needs to use a similar method to may_umount() when checking if a mount tree is in use. This means may_umount_tree() will end up taking the namespace_sem for the check, so the autofs direct mount expire won't be allowed to hold a spin lock over the check. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs - constify misc struct path instances | Ian Kent
Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | vfs: remove unused have_submounts() function | Ian Kent
Now that path_has_submounts() has been added have_submounts() is no longer used so remove it. Link: http://lkml.kernel.org/r/20161011053428.27645.12310.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: use path_has_submounts() to fix unreliable have_submount() checks | Ian Kent
If an automount mount is clone(2)ed into a file system that is propagation private, when it later expires in the originating namespace, subsequent calls to autofs ->d_automount() for that dentry in the original namespace will return ELOOP until the mount is umounted in the cloned namespace. Now that a struct path is available where needed use path_has_submounts() instead of have_submounts() so we don't get false positives when checking if a dentry is a mount point or contains mounts in the current namespace. Link: http://lkml.kernel.org/r/20161011053423.27645.91233.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: use path_is_mountpoint() to fix unreliable d_mountpoint() checks | Ian Kent
If an automount mount is clone(2)ed into a file system that is propagation private, when it later expires in the originating namespace, subsequent calls to autofs ->d_automount() for that dentry in the original namespace will return ELOOP until the mount is umounted in the cloned namespace. Now that a struct path is available where needed use path_is_mountpoint() instead of d_mountpoint() so we don't get false positives when checking if a dentry is a mount point in the current namespace. Link: http://lkml.kernel.org/r/20161011053418.27645.15241.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: change autofs4_wait() to take struct path | Ian Kent
In order to use the functions path_is_mountpoint() and path_has_submounts() autofs needs to pass a struct path in several places. Now change autofs4_wait() to take a struct path instead of a struct dentry. Link: http://lkml.kernel.org/r/20161011053413.27645.84666.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | autofs: change autofs4_expire_wait()/do_expire_wait() to take struct path | Ian Kent
In order to use the functions path_is_mountpoint() and path_has_submounts() autofs needs to pass a struct path in several places. Start by changing autofs4_expire_wait() and do_expire_wait() to take a struct path instead of a struct dentry. Link: http://lkml.kernel.org/r/20161011053408.27645.40091.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | vfs: add path_has_submounts() | Ian Kent
d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using the given dentry, possibly in a different namespace. Add a function, path_has_submounts(), that checks if a struct path contains mounts (or is a mountpoint itself) to handle this case. Link: http://lkml.kernel.org/r/20161011053403.27645.55242.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03 | vfs: add path_is_mountpoint() helper | Ian Kent
d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using a given dentry that may be in a different namespace. Add a helper function, path_is_mountpoint(), that checks if a struct path is a mountpoint for this case. Link: http://lkml.kernel.org/r/20161011053358.27645.9729.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
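The difference between a dentry-wide check and a namespace-aware check can be modelled in user space. This is a deliberately simplified sketch: `mount_rec`, `mounted_anywhere()` and `mounted_in_ns()` are hypothetical, and plain integer IDs replace the kernel's dentry, vfsmount and namespace objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: "something is mounted on this dentry in this namespace". */
struct mount_rec {
	int ns_id;
	int dentry_id;
};

/* Dentry-wide question: is anything mounted on it in any namespace? */
static bool mounted_anywhere(const struct mount_rec *tbl, size_t n,
			     int dentry_id)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].dentry_id == dentry_id)
			return true;
	return false;
}

/* Namespace-aware question: is it a mountpoint in the namespace we ask about? */
static bool mounted_in_ns(const struct mount_rec *tbl, size_t n,
			  int ns_id, int dentry_id)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ns_id == ns_id && tbl[i].dentry_id == dentry_id)
			return true;
	return false;
}
```

With a mount that exists only in a second namespace, the dentry-wide check answers "mounted" while the namespace-aware check for the first namespace answers "not mounted". That mismatch is the kind of false positive the commit message describes.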
2016-12-03 | Merge tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux | Linus Torvalds
Pull drm fixes from Dave Airlie:
 "A pretty small pull request: a couple of AMD powerxpress regression fixes and a power management fix, a couple of i915 fixes and one hdlcd fix, along with one core don't oops because of incorrect API usage fix"

* tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: drop the struct_mutex when wedged or trying to reset
  drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
  drm: Don't call drm_for_each_crtc with a non-KMS driver
  drm/radeon: fix check for port PM availability
  drm/amdgpu: fix check for port PM availability
  drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr
  drm: hdlcd: Fix cleanup order
2016-12-03 | Merge branch 'fib-notifier-event-replay' | David S. Miller
Jiri Pirko says:

====================
ipv4: fib: Replay events when registering FIB notifier

Ido says:

In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced by a new FIB notification chain to which modules could register in order to be notified about the addition and deletion of FIB entries. The motivation for this change was that switchdev drivers need to be able to reflect the entire FIB table and not only FIBs configured on top of the port netdevs themselves. This is useful in case of in-band management.

The fundamental problem with this approach is that upon registration listeners lose all the information previously sent in the chain and thus have an incomplete view of the FIB tables, which can result in packet loss. This patchset fixes that by dumping the FIB tables and replaying notifications previously sent in the chain for the registered notification block.

The entire dump process is done under RCU and thus the FIB notification chain is converted to be atomic. The listeners are modified accordingly. This is done in the first eight patches.

The ninth patch adds a change sequence counter to ensure the integrity of the FIB dump. The last patch adds the dump itself to the FIB chain registration function and modifies existing listeners to pass a callback to be executed in case dump was inconsistent.

---
v3->v4:
- Register the notification block after the dump and protect it using the change sequence counter (Hannes Frederic Sowa).
- Since we now integrate the dump into the registration function, drop the sysctl to set maximum number of retries and instead set it to a fixed number. Lets see if it's really a problem before adding something we can never remove.
- For the same reason, dump FIB tables for all net namespaces.
- Add a comment regarding guarantees provided by mutex semantics.

v2->v3:
- Add sysctl to set the number of FIB dump retries (Hannes Frederic Sowa).
- Read the sequence counter under RTNL to ensure synchronization between the dump process and other processes changing the routing tables (Hannes Frederic Sowa).
- Pass a callback to the dump function to be executed prior to a retry.
- Limit the dump to a single net namespace.

v1->v2:
- Add a sequence counter to ensure the integrity of the FIB dump (David S. Miller, Hannes Frederic Sowa).
- Protect notifications from re-ordering in listeners by using an ordered workqueue (Hannes Frederic Sowa).
- Introduce fib_info_hold() (Jiri Pirko).
- Relieve rocker from the need to invoke the FIB dump by registering to the FIB notification chain prior to ports creation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | ipv4: fib: Replay events when registering FIB notifier | Ido Schimmel
Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") introduced a new notification chain to notify listeners (e.g., switchdev drivers) about addition and deletion of routes. However, upon registration to the chain the FIB tables can already be populated, which means potential listeners will have an incomplete view of the tables. Solve that by dumping the FIB tables and replaying the events to the passed notification block. The dump itself is done using RCU in order not to starve consumers that need RTNL to make progress. The integrity of the dump is ensured by reading the FIB change sequence counter before and after the dump under RTNL. This allows us to avoid the problematic situation in which the dumping process sends an ENTRY_ADD notification following ENTRY_DEL generated by another process holding RTNL. Callers of the registration function may pass a callback that is executed in case the dump was inconsistent with current FIB tables. The number of retries until a consistent dump is achieved is set to a fixed number to prevent callers from looping for long periods of time. In case the current limit proves to be problematic in the future, it can be easily converted to be configurable using a sysctl. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
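The dump-and-verify loop can be sketched as a single-threaded user-space model. Everything here is hypothetical: `fib_model`, `racy_dump()`, `register_with_replay()` and the retry limit of 5 are assumptions for illustration, and the locking (RTNL, RCU) that the commit message relies on is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DUMP_MAX_RETRIES 5 /* assumed fixed limit for this sketch */

/* Hypothetical model of the FIB state of one net namespace. */
struct fib_model {
	unsigned int seq;    /* change sequence counter */
	int pending_changes; /* changes that will race with upcoming dumps */
};

/* Models a dump during which another writer may change the tables once. */
static void racy_dump(struct fib_model *fib)
{
	if (fib->pending_changes > 0) {
		fib->pending_changes--;
		fib->seq++; /* a concurrent change bumps the counter */
	}
}

/*
 * Read the counter, dump, read the counter again. An unchanged counter
 * means the dump was consistent and registration can complete. Otherwise
 * run the caller's callback and retry, up to a fixed number of attempts.
 */
static bool register_with_replay(struct fib_model *fib,
				 void (*dump)(struct fib_model *),
				 void (*cb)(void *), void *ctx)
{
	int retries = 0;

	assert(fib != NULL && dump != NULL);
	do {
		unsigned int before = fib->seq;

		dump(fib);
		if (fib->seq == before)
			return true; /* consistent view: register here */
		if (cb)
			cb(ctx); /* let the listener discard the partial state */
	} while (++retries < DUMP_MAX_RETRIES);

	return false; /* gave up after the fixed number of attempts */
}
```

The fixed bound keeps a busy routing table from trapping a caller in the loop indefinitely, which matches the motivation given in the commit message.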
2016-12-03 | ipv4: fib: Allow for consistent FIB dumping | Ido Schimmel
The next patch will enable listeners of the FIB notification chain to request a dump of the FIB tables. However, since RTNL isn't taken during the dump, it's possible for the FIB tables to change mid-dump, which will result in inconsistency between the listener's table and the kernel's. Allow listeners to know about changes that occurred mid-dump, by adding a change sequence counter to each net namespace. The counter is incremented just before a notification is sent in the FIB chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 | ipv4: fib: Convert FIB notification chain to be atomic | Ido Schimmel
In order not to hold RTNL for long periods of time we're going to dump the FIB tables using RCU. Convert the FIB notification chain to be atomic, as we can't block in RCU critical sections. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>