summaryrefslogtreecommitdiff
path: root/net/ipv6/addrconf.c
AgeCommit message (Collapse)Author
2020-04-27sysctl: pass kernel pointers to ->proc_handlerChristoph Hellwig
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02neigh: support smaller retrans_time setttingHangbin Liu
Currently, we limited the retrans_time to be greater than HZ/2. i.e. setting retrans_time less than 500ms will not work. This makes the user unable to achieve a more accurate control for bonding arp fast failover. Update the sanity check to HZ/100, which is 10ms, to let users have more ability on the retrans_time control. v3: sync the behavior with IPv6 and update all the timer handler v2: use HZ instead of hard code number Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-01ipv6: don't auto-add link-local address to lag portsJarod Wilson
Bonding slave and team port devices should not have link-local addresses automatically added to them, as it can interfere with openvswitch being able to properly add tc ingress. Basic reproducer, courtesy of Marcelo: $ ip link add name bond0 type bond $ ip link set dev ens2f0np0 master bond0 $ ip link set dev ens2f1np2 master bond0 $ ip link set dev bond0 up $ ip a s 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link valid_lft forever preferred_lft forever (above trimmed to relevant entries, obviously) $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0 net.ipv6.conf.ens2f0np0.addr_gen_mode = 0 $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0 net.ipv6.conf.ens2f1np2.addr_gen_mode = 0 $ ip a l ens2f0np0 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever $ ip a l ens2f1np2 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is this a slave interface?" check added by commit c2edacf80e15, and results in an address getting added, while w/the proposed patch added, no address gets added. This simply adds the same gating check to another code path, and thus should prevent the same devices from erroneously obtaining an ipv6 link-local address. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Reported-by: Moshe Levi <moshele@mellanox.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Marcelo Ricardo Leitner <mleitner@redhat.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: ipv6: add support for rpl sr exthdrAlexander Aring
This patch adds rpl source routing receive handling. Everything works only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly the same behaviour as IPv6 segmentation routing. To handle compression and uncompression a rpl.c file is created which contains the necessary functionality. The receive handling will also care about IPv6 encapsulated so far it's specified as possible nexthdr in RFC 6554. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29addrconf: add functionality to check on rpl requirementsAlexander Aring
This patch adds a functionality to addrconf to check on a specific RPL address configuration. According to RFC 6554: To detect loops in the SRH, a router MUST determine if the SRH includes multiple addresses assigned to any interface on that router. If such addresses appear more than once and are separated by at least one address not assigned to that router. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12inet: Use fallthrough;Joe Perches
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ And by hand: net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block that causes gcc to emit a warning if converted in-place. So move the new fallthrough; inside the containing #ifdef/#endif too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interfaceHangbin Liu
Rafał found an issue that for non-Ethernet interface, if we down and up frequently, the memory will be consumed slowly. The reason is we add allnodes/allrouters addressed in multicast list in ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up() for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb getting bigger and bigger. The call stack looks like: addrconf_notify(NETDEV_REGISTER) ipv6_add_dev ipv6_dev_mc_inc(ff01::1) ipv6_dev_mc_inc(ff02::1) ipv6_dev_mc_inc(ff02::2) addrconf_notify(NETDEV_UP) addrconf_dev_config /* Alas, we support only Ethernet autoconfiguration. */ return; addrconf_notify(NETDEV_DOWN) addrconf_ifdown ipv6_mc_down igmp6_group_dropped(ff02::2) mld_add_delrec(ff02::2) igmp6_group_dropped(ff02::1) igmp6_group_dropped(ff01::1) After investigating, I can't found a rule to disable multicast on non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM, tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev type and call ipv6_add_dev(), ipv6_dev_mc_inc() after register device. So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for non-Ethernet interface. v2: Also check IFF_MULTICAST flag to make sure the interface supports multicast Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <zajec5@gmail.com> Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels") Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03net/ipv6: remove the old peer route if change it to a new oneHangbin Liu
When we modify the peer route and changed it to a new one, we should remove the old route first. Before the fix: + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::2 proto kernel metric 256 pref medium + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::2 proto kernel metric 256 pref medium After the fix: + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::3 proto kernel metric 256 pref medium This patch depend on the previous patch "net/ipv6: need update peer route when modify metric" to update new peer route after delete old one. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03net/ipv6: need update peer route when modify metricHangbin Liu
When we modify the route metric, the peer address's route need also be updated. Before the fix: + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 60 pref medium 2001:db8::2 proto kernel metric 60 pref medium + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 61 pref medium 2001:db8::2 proto kernel metric 60 pref medium After the fix: + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 61 pref medium 2001:db8::2 proto kernel metric 61 pref medium Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-29net/ipv6: use configured metric when add peer routeHangbin Liu
When we add peer address with metric configured, IPv4 could set the dest metric correctly, but IPv6 do not. e.g. ]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20 ]# ip route show dev eth1 192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20 ]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20 ]# ip -6 route show dev eth1 2001:db8::1 proto kernel metric 20 pref medium 2001:db8::2 proto kernel metric 256 pref medium Fix this by using configured metric instead of default one. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-07ipv6/addrconf: fix potential NULL deref in inet6_set_link_af()Eric Dumazet
__in6_dev_get(dev) called from inet6_set_link_af() can return NULL. The needed check has been recently removed, let's add it back. While do_setlink() does call validate_linkmsg() : ... err = validate_linkmsg(dev, tb); /* OK at this point */ ... It is possible that the following call happening before the ->set_link_af() removes IPv6 if MTU is less than 1280 : if (tb[IFLA_MTU]) { err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack); if (err < 0) goto errout; status |= DO_SETLINK_MODIFIED; } ... if (tb[IFLA_AF_SPEC]) { ... err = af_ops->set_link_af(dev, af); ->inet6_set_link_af() // CRASH because idev is NULL Please note that IPv4 is immune to the bug since inet_set_link_af() does : struct in_device *in_dev = __in_dev_get_rcu(dev); if (!in_dev) return -EAFNOSUPPORT; This problem has been mentioned in commit cf7afbfeb8ce ("rtnl: make link af-specific updates atomic") changelog : This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7] CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754 rtnl_group_changelink net/core/rtnetlink.c:3103 [inline] __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4402e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8 R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70 R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace cfa7664b8fdcdff3 ]--- RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 7dc2bccab0ee ("Validate required parameters in inet6_validate_link_af") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-and-reported-by: syzbot <syzkaller@googlegroups.com> Cc: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-13ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is setHangbin Liu
In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers") we add strict check for inet6_rtm_getaddr(). But we did the invalid header values check before checking if NETLINK_F_STRICT_CHK is set. This may break backwards compatibility if user already set the ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code. I didn't move the nlmsg_len check because I thought it's a valid check. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2019-10-04ipv6: Handle missing host route in __ipv6_ifa_notifyDavid Ahern
Rajendra reported a kernel panic when a link was taken down: [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 <snip> [ 6870.570501] Call Trace: [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 addrconf_dad_work is kicked to be scheduled when a device is brought up. There is a race between addrcond_dad_work getting scheduled and taking the rtnl lock and a process taking the link down (under rtnl). The latter removes the host route from the inet6_addr as part of addrconf_ifdown which is run for NETDEV_DOWN. The former attempts to use the host route in __ipv6_ifa_notify. If the down event removes the host route due to the race to the rtnl, then the BUG listed above occurs. Since the DAD sequence can not be aborted, add a check for the missing host route in __ipv6_ifa_notify. The only way this should happen is due to the previously mentioned race. The host route is created when the address is added to an interface; it is only removed on a down event where the address is kept. Add a warning if the host route is missing AND the device is up; this is a situation that should never happen. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-04Revert "ipv6: Handle race in addrconf_dad_work"David Ahern
This reverts commit a3ce2a21bb8969ae27917281244fa91bf5f286d7. Eric reported tests failings with commit. After digging into it, the bottom line is that the DAD sequence is not to be messed with. There are too many cases that are expected to proceed regardless of whether a device is up. Revert the patch and I will send a different solution for the problem Rajendra reported. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01ipv6: Handle race in addrconf_dad_workDavid Ahern
Rajendra reported a kernel panic when a link was taken down: [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 <snip> [ 6870.570501] Call Trace: [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 addrconf_dad_work is kicked to be scheduled when a device is brought up. There is a race between addrcond_dad_work getting scheduled and taking the rtnl lock and a process taking the link down (under rtnl). The latter removes the host route from the inet6_addr as part of addrconf_ifdown which is run for NETDEV_DOWN. The former attempts to use the host route in ipv6_ifa_notify. If the down event removes the host route due to the race to the rtnl, then the BUG listed above occurs. This scenario does not occur when the ipv6 address is not kept (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the state of the ifp to DEAD. Handle when the addresses are kept by checking IF_READY which is reset by addrconf_ifdown. The 'dead' flag for an inet6_addr is set only under rtnl, in addrconf_ifdown and it means the device is getting removed (or IPv6 is disabled). The interesting cases for changing the idev flag are addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown (reset the flag). The former does not have the idev lock - only rtnl; the latter has both. Based on that the existing dead + IF_READY check can be moved to right after the rtnl_lock in addrconf_dad_work. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01ipv6: minor code reorg in inet6_fill_ifla6_attrs()Nicolas Dichtel
Just put related code together to ease code reading: the memcpy() is related to the nla_reserve(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idevSabrina Dubroca
Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails, ignoring the specific error value. This results in addrconf_add_dev returning ENOBUFS in all cases, which is unfortunate in cases such as: # ip link add dummyX type dummy # ip link set dummyX mtu 1200 up # ip addr add 2000::/64 dev dummyX RTNETLINK answers: No buffer space available Commit a317a2f19da7 ("ipv6: fail early when creating netdev named all or default") introduced error returns in ipv6_add_dev. Before that, that function would simply return NULL for all failures. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is setHangbin Liu
In commit 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN to make user able to add multicast address on ethernet interface. This works for IPv4, but not for IPv6. See the inet6_addr_add code. static int inet6_addr_add() { ... if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...) } ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr if (!IS_ERR(ifp)) { ... } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...) } } But in ipv6_add_addr() it will check the address type and reject multicast address directly. So this feature is never worked for IPv6. We should not remove the multicast address check totally in ipv6_add_addr(), but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied. v2: update commit description Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04ipv6: Plumb support for nexthop object in a fib6_infoDavid Ahern
Add struct nexthop and nh_list list_head to fib6_info. nh_list is the fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info referencing a nexthop object can not have 'sibling' entries (the old way of doing multipath routes), the nh_list is a union with fib6_siblings. Add f6i_list list_head to 'struct nexthop' to track fib6_info entries using a nexthop instance. Update __remove_nexthop_fib to walk f6_list and delete fib entries using the nexthop. Add a few nexthop helpers for use when a nexthop is added to fib6_info: - nexthop_fib6_nh - return first fib6_nh in a nexthop object - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh if the fib6_info references a nexthop object - nexthop_path_fib6_result - similar to ipv4, select a path within a multipath nexthop object. If the nexthop is a blackhole, set fib6_result type to RTN_BLACKHOLE, and set the REJECT flag Update the fib6_info references to check for nh and take a different path as needed: - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT be coalesced with other fib entries into a multipath route - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references a nexthop - addrconf (host routes), RA's and info entries (anything configured via ndisc) does not use nexthop objects - fib6_info_destroy_rcu - put reference to nexthop object - fib6_purge_rt - drop fib6_info from f6i_list - fib6_select_path - update to use the new nexthop_path_fib6_result when fib entry uses a nexthop object - rt6_device_match - update to catch use of nexthop object as a blackhole and set fib6_type and flags. - ip6_route_info_create - don't add space for fib6_nh if fib entry is going to reference a nexthop object, take a reference to nexthop object, disallow use of source routing - rt6_nlmsg_size - add space for RTA_NH_ID - add rt6_fill_node_nexthop to add nexthop data on a dump As with ipv4, most of the changes push existing code into the else branch of whether the fib entry uses a nexthop object. Update the nexthop code to walk f6i_list on a nexthop deleted to remove fib entries referencing it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02net: use new in_dev_ifa iteratorsFlorian Westphal
Use in_dev_for_each_ifa_rcu/rtnl instead. This prevents sparse warnings once proper __rcu annotations are added. Signed-off-by: Florian Westphal <fw@strlen.de> t di# Last commands done (6 commands done): Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The phylink conflict was between a bug fix by Russell King to make sure we have a consistent PHY interface mode, and a change in net-next to pull some code in phylink_resolve() into the helper functions phylink_mac_link_{up,down}() On the dp83867 side it's mostly overlapping changes, with the 'net' side removing a condition that was supposed to trigger for RGMII but because of how it was coded never actually could trigger. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31Merge tag 'spdx-5.2-rc3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Here is another set of reviewed patches that adds SPDX tags to different kernel files, based on a set of rules that are being used to parse the comments to try to determine that the license of the file is "GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of these matches are included here, a number of "non-obvious" variants of text have been found but those have been postponed for later review and analysis. There is also a patch in here to add the proper SPDX header to a bunch of Kbuild files that we have missed in the past due to new files being added and forgetting that Kbuild uses two different file names for Makefiles. This issue was reported by the Kbuild maintainer. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers" * tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits) treewide: Add SPDX license identifier - Kbuild treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 ...
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24ipv6: Make fib6_nh optional at the end of fib6_infoDavid Ahern
Move fib6_nh to the end of fib6_info and make it an array of size 0. Pass a flag to fib6_info_alloc indicating if the allocation needs to add space for a fib6_nh. The current code path always has a fib6_nh allocated with a fib6_info; with nexthop objects they will be separate. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24ipv6: Move pcpu cached routes to fib6_nhDavid Ahern
rt6_info are specific instances of a fib entry and are tied to a device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib entries have separate fib6_info for each nexthop in a multipath route, so the location of the pcpu cache in the fib6_info struct worked. However, with nexthop objects a fib6_info can point to a set of nexthops (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu cache needs to be moved to the fib6_nh struct so the cached entries are local to the nexthop specification used to create the rt6_info. Initialization and free of the pcpu entries moved to fib6_nh_init and fib6_nh_release. Change in location only, from fib6_info down to fib6_nh; no other functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22Validate required parameters in inet6_validate_link_afMaxim Mikityanskiy
inet6_set_link_af requires that at least one of IFLA_INET6_TOKEN or IFLA_INET6_ADDR_GET_MODE is passed. If none of them is passed, it returns -EINVAL, which may cause do_setlink() to fail in the middle of processing other commands and give the following warning message: A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check. Check the presence of at least one of them in inet6_validate_link_af to detect invalid parameters at an early stage, before do_setlink does anything. Also validate the address generation mode at an early stage. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make validation more configurable for future strictnessJohannes Berg
We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08net: Replace nhc_has_gw with nhc_gw_familyDavid Ahern
Allow the gateway in a fib_nh_common to be from a different address family than the outer fib{6}_nh. To that end, replace nhc_has_gw with nhc_gw_family and update users of nhc_has_gw to check nhc_gw_family. Now nhc_family is used to know if the nh_common is part of a fib_nh or fib6_nh (used for container_of to get to route family specific data), and nhc_gw_family represents the address family for the gateway. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29ipv6: Rename fib6_nh entriesDavid Ahern
Rename fib6_nh entries that will be moved to a fib_nh_common struct. Specifically, the device, gateway, flags, and lwtstate are common with all nexthop definitions. In some places new temporary variables are declared or local variables renamed to maintain line lengths. Rename only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29ipv6: Move gateway checks to a fib6_nh settingDavid Ahern
The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to the new flag. For IPv6 address the flag is cheaper than checking that nh_gw is non-0 like IPv4 does. While this increases fib6_nh by 8-bytes, the effective allocation size of a fib6_info is unchanged. The 8 bytes is recovered later with a fib_nh_common change. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net: ignore sysctl_devconf_inherit_init_net without SYSCTLArnd Bergmann
When CONFIG_SYSCTL is turned off, we get a link failure for the newly introduced tuning knob. net/ipv6/addrconf.o: In function `addrconf_init_net': addrconf.c:(.text+0x31dc): undefined reference to `sysctl_devconf_inherit_init_net' Add an IS_ENABLED() check to fall back to the default behavior (sysctl_devconf_inherit_init_net=0) here. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11net: fix IPv6 prefix route residueZhiqiang Liu
Follow those steps: # ip addr add 2001:123::1/32 dev eth0 # ip addr add 2001:123:456::2/64 dev eth0 # ip addr del 2001:123::1/32 dev eth0 # ip addr del 2001:123:456::2/64 dev eth0 and then prefix route of 2001:123::1/32 will still exist. This is because ipv6_prefix_equal in check_cleanup_prefix_route func does not check whether two IPv6 addresses have the same prefix length. If the prefix of one address starts with another shorter address prefix, even though their prefix lengths are different, the return value of ipv6_prefix_equal is true. Here I add a check of whether two addresses have the same prefix to decide whether their prefixes are equal. Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE") Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-01-22net/ipv6: lower the level of "link is not ready" messagesLubomir Rintel
This message gets logged far too often for how interesting is it. Most distributions nowadays configure NetworkManager to use randomly generated MAC addresses for Wi-Fi network scans. The interfaces end up being periodically brought down for the address change. When they're subsequently brought back up, the message is logged, eventually flooding the log. Perhaps the message is not all that helpful: it seems to be more interesting to hear when the addrconf actually start, not when it does not. Let's lower its level. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-By: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22net/ipv6: don't return positive numbers when nothing was dumpedJakub Kicinski
in6_dump_addrs() returns a positive 1 if there was nothing to dump. This return value can not be passed as return from inet6_dump_addr() as is, because it will confuse rtnetlink, resulting in NLMSG_DONE never getting set: $ ip addr list dev lo EOF on netlink Dump terminated v2: flip condition to avoid a new goto (DaveA) Fixes: 7c1e8a3817c5 ("netlink: fixup regression in RTM_GETADDR") Reported-by: Brendan Galloway <brendan.galloway@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22net: introduce a knob to control whether to inherit devconf configCong Wang
There have been many people complaining about the inconsistent behaviors of IPv4 and IPv6 devconf when creating new network namespaces. Currently, for IPv4, we inherit all current settings from init_net, but for IPv6 we reset all setting to default. This patch introduces a new /proc file /proc/sys/net/core/devconf_inherit_init_net to control the behavior of whether to inhert sysctl current settings from init_net. This file itself is only available in init_net. As demonstrated below: Initial setup in init_net: # cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # cat /proc/sys/net/ipv6/conf/all/accept_dad 1 Default value 0 (current behavior): # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 0 Set to 1 (inherit from init_net): # echo 1 > /proc/sys/net/core/devconf_inherit_init_net # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 1 Set to 2 (reset to default): # echo 2 > /proc/sys/net/core/devconf_inherit_init_net # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 0 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 0 Set to a value out of range (invalid): # echo 3 > /proc/sys/net/core/devconf_inherit_init_net -bash: echo: write error: Invalid argument # echo -1 > /proc/sys/net/core/devconf_inherit_init_net -bash: echo: write error: Invalid argument Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: netconf: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETNETCONF's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: addr: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETADDR's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04netlink: fixup regression in RTM_GETADDRArthur Gautier
This commit fixes a regression in AF_INET/RTM_GETADDR and AF_INET6/RTM_GETADDR. Before this commit, the kernel would stop dumping addresses once the first skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error shouldn't be sent back to netlink_dump so the callback is kept alive. The userspace is expected to call back with a new empty skb. Changes from V1: - The error is not handled in netlink_dump anymore but rather in inet_dump_ifaddr and inet6_dump_addr directly as suggested by David Ahern. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Arthur Gautier <baloo@gandi.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-30net/ipv6: Fix a test against 'ipv6_find_idev()' return valueChristophe JAILLET
'ipv6_find_idev()' returns NULL on error, not an error pointer. Update the test accordingly and return -ENOBUFS, as already done in 'addrconf_add_dev()', if NULL is returned. Fixes: ("ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06net: core: dev: Add extack argument to dev_open()Petr Machata
In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_open(). Therefore extend dev_open() with and extra extack argument and update all users. Most of the calls end up just encoding NULL, but bond and team drivers have the extack readily available. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/ipv6: re-do dad when interface has IFF_NOARP flag changeHangbin Liu
When we add a new IPv6 address, we should also join corresponding solicited-node multicast address, unless the interface has IFF_NOARP flag, as function addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do not do dad and add the mcast address. So we will drop corresponding neighbour discovery message that came from other nodes. A typical example is after creating a ipvlan with mode l3, setting up an ipv6 address and changing the mode to l2. Then we will not be able to ping this address as the interface doesn't join related solicited-node mcast address. Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add corresponding mcast group and check if there is a duplicate address on the network. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-25net/{ipv4,ipv6}: Do not put target net if input nsid is invalidBjørn Mork
The cleanup path will put the target net when netnsid is set. So we must reset netnsid if the input is invalid. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net/ipv6: Put target net when address dump fails due to bad attributesDavid Ahern
If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump request, make sure all error paths call put_net. Fixes: 6371a71f3a3b ("net/ipv6: Add support for dumping addresses for a specific device") Fixes: ed6eff11790a ("net/ipv6: Update inet6_dump_addr for strict data checking") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>