summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-04-25net sched actions: Complete the JUMPX opcodeJamal Hadi Salim
per discussion at netconf/netdev: When we have an action that is capable of branching (example a policer), we can achieve a continuation of the action graph by programming a "continue" where we find an exact replica of the same filter rule with a lower priority and the remainder of the action graph. When you have 100s of thousands of filters which require such a feature it gets very inefficient to do two lookups. This patch completes a leftover feature of action codes. Its time has come. Example below where a user labels packets with a different skbmark on ingress of a port depending on whether they have/not exceeded the configured rate. This mark is then used to make further decisions on some egress port. #rate control, very low so we can easily see the effect sudo $TC actions add action police rate 1kbit burst 90k \ conform-exceed pipe/jump 2 index 10 # skbedit index 11 will be used if the user conforms sudo $TC actions add action skbedit mark 11 ok index 11 # skbedit index 12 will be used if the user does not conform sudo $TC actions add action skbedit mark 12 ok index 12 #lets bind the user .. sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \ match ip dst 127.0.0.8/32 flowid 1:10 \ action police index 10 \ action skbedit index 11 \ action skbedit index 12 #run a ping -f and see what happens.. # jhs@foobar:~$ sudo $TC -s filter ls dev $ETH parent ffff: protocol ip filter pref 8 u32 filter pref 8 u32 fh 800: ht divisor 1 filter pref 8 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2800 success 1005) match 7f000008/ffffffff at 16 (success 1005 ) action order 1: police 0xa rate 1Kbit burst 23440b mtu 2Kb action pipe/jump 2 overhead 0b ref 2 bind 1 installed 207 sec used 122 sec Action statistics: Sent 84420 bytes 1005 pkt (dropped 0, overlimits 721 requeues 0) backlog 0b 0p requeues 0 action order 2: skbedit mark 11 pass index 11 ref 2 bind 1 installed 204 sec used 122 sec Action statistics: Sent 60564 bytes 721 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 3: skbedit mark 12 pass index 12 ref 2 bind 1 installed 201 sec used 122 sec Action statistics: Sent 23856 bytes 284 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Not bad, about 28% non-conforming packets.. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 pathDave Johnson
When recalculating the outer ICMPv6 checksum for a reverse path NATv6 such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was accessing data beyond the headlen of the skb for non-linear skb. This resulted in incorrect ICMPv6 checksum as garbage data was used. Patch replaces csum_partial() with skb_checksum() which supports non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4. Signed-off-by: Dave Johnson <dave-kernel@centerclick.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-25netfilter: nft_dynset: continue to next expr if _OP_ADD succeededLiping Zhang
Currently, after adding the following nft rules: # nft add set x target1 { type ipv4_addr \; flags timeout \;} # nft add rule x y set add ip daddr timeout 1d @target1 counter the counters will always be zero despite of the elements are added to the dynamic set "target1" or not, as we will break the nft expr traversal unconditionally: # nft list ruleset ... set target1 { ... elements = { 8.8.8.8 expires 23h59m53s} } chain output { ... set add ip daddr timeout 1d @target1 counter packets 0 bytes 0 ^ ^ ... } Since we add the elements to the set successfully, we should continue to the next expression. Additionally, if elements are added to "flow table" successfully, we will _always_ continue to the next expr, even if the operation is _OP_ADD. So it's better to keep them to be consistent. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-25bridge: ebtables: fix reception of frames DNAT-ed to bridge device/portLinus Lüssing
When trying to redirect bridged frames to the bridge device itself or a bridge port (brouting) via the dnat target then this currently fails: The ethernet destination of the frame is dnat'ed to the MAC address of the bridge device or port just fine. However, the IP code drops it in the beginning of ip_input.c/ip_rcv() as the dnat target left the skb->pkt_type as PACKET_OTHERHOST. Fixing this by resetting skb->pkt_type to an appropriate type after dnat'ing. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-25can: network namespace support for CAN gatewayOliver Hartkopp
The CAN gateway was not implemented as per-net in the initial network namespace support by Mario Kicherer (8e8cda6d737d). This patch enables the CAN gateway to be used in different namespaces. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-25can: network namespace support for CAN_BCM protocolOliver Hartkopp
The CAN_BCM protocol and its procfs entries were not implemented as per-net in the initial network namespace support by Mario Kicherer (8e8cda6d737d). This patch adds the missing per-net functionality for the CAN BCM. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-25can: complete initial namespace supportOliver Hartkopp
The statistics and its proc output was not implemented as per-net in the initial network namespace support by Mario Kicherer (8e8cda6d737d). This patch adds the missing per-net statistics for the CAN subsystem. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-25can: remove obsolete definitionsOliver Hartkopp
can_rx_alldev_list is a per-net data structure now. Remove it's definition here and can_rx_dev_list too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-25can: remove obsolete pernet_operations definitionsOliver Hartkopp
The namespace support for the CAN subsystem does not need any additional memory. So when ".size = 0" there's no extra memory allocated by the system. And therefore ".id" is obsolete too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-25can: fix memory leak in initial namespace supportOliver Hartkopp
The can_rx_alldev_list is a per-net data structure now and allocated in can_pernet_init(). Make sure the memory is free'd in can_pernet_exit() too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-25Bluetooth: convert smp and selftest to crypto kpp APISalvatore Benedetto
* Convert both smp and selftest to crypto kpp API * Remove module ecc as no more required * Add ecdh_helper functions for wrapping kpp async calls This patch has been tested *only* with selftest, which is called on module loading. Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-04-24tipc: check return value of nlmsg_newPan Bian
Function nlmsg_new() will return a NULL pointer if there is no enough memory, and its return value should be checked before it is used. However, in function tipc_nl_node_get_monitor(), the validation of the return value of function nlmsg_new() is missed. This patch fixes the bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24lwtunnel: check return value of nla_nest_startPan Bian
Function nla_nest_start() may return a NULL pointer on error. However, in function lwtunnel_fill_encap(), the return value of nla_nest_start() is not validated before it is used. This patch checks the return value of nla_nest_start() against NULL. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24cls_flower: add support for matching MPLS fields (v2)Benjamin LaHaise
Add support to the tc flower classifier to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24flow_dissector: add mpls support (v2)Benjamin LaHaise
Add support for parsing MPLS flows to the flow dissector in preparation for adding MPLS match support to cls_flower. Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <jhs@mojatatu.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Remove mss check in tcp_write_timeout()Wei Wang
Christoph Paasch from Apple found another firewall issue for TFO: After successful 3WHS using TFO, server and client starts to exchange data. Afterwards, a 10s idle time occurs on this connection. After that, firewall starts to drop every packet on this connection. The fix for this issue is to extend existing firewall blackhole detection logic in tcp_write_timeout() by removing the mss check. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Add snmp counter for blackhole detectionWei Wang
This counter records the number of times the firewall blackhole issue is detected and active TFO is disabled. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Disable active side TFO in certain scenariosWei Wang
Middlebox firewall issues can potentially cause server's data being blackholed after a successful 3WHS using TFO. Following are the related reports from Apple: https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf Slide 31 identifies an issue where the client ACK to the server's data sent during a TFO'd handshake is dropped. C ---> syn-data ---> S C <--- syn/ack ----- S C (accept & write) C <---- data ------- S C ----- ACK -> X S [retry and timeout] https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf Slide 5 shows a similar situation that the server's data gets dropped after 3WHS. C ---- syn-data ---> S C <--- syn/ack ----- S C ---- ack --------> S S (accept & write) C? X <- data ------ S [retry and timeout] This is the worst failure b/c the client can not detect such behavior to mitigate the situation (such as disabling TFO). Failing to proceed, the application (e.g., SSL library) may simply timeout and retry with TFO again, and the process repeats indefinitely. The proposed solution is to disable active TFO globally under the following circumstances: 1. client side TFO socket detects out of order FIN 2. client side TFO socket receives out of order RST We disable active side TFO globally for 1hr at first. Then if it happens again, we disable it for 2h, then 4h, 8h, ... And we reset the timeout to 1hr if a client side TFO sockets not opened on loopback has successfully received data segs from server. And we examine this condition during close(). The rational behind it is that when such firewall issue happens, application running on the client should eventually close the socket as it is not able to get the data it is expecting. Or application running on the server should close the socket as it is not able to receive any response from client. In both cases, out of order FIN or RST will get received on the client given that the firewall will not block them as no data are in those frames. And we want to disable active TFO globally as it helps if the middle box is very close to the client and most of the connections are likely to fail. Also, add a debug sysctl: tcp_fastopen_blackhole_detect_timeout_sec: the initial timeout to use when firewall blackhole issue happens. This can be set and read. When setting it to 0, it means to disable the active disable logic. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24Merge tag 'mlx5-updates-2017-04-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-04-22 Sparse and compiler warnings fixes from Stephen Hemminger. From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling E-Switch encapsulation mode, this knob will enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net: add rcu locking when changing early demuxDavid Ahern
systemd-sysctl is triggering a suspicious RCU usage message when net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via a sysctl config file: [ 33.896184] =============================== [ 33.899558] [ ERR: suspicious RCU usage. ] [ 33.900624] 4.11.0-rc7+ #104 Not tainted [ 33.901698] ------------------------------- [ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage! [ 33.905724] other info that might help us debug this: [ 33.907656] rcu_scheduler_active = 2, debug_locks = 0 [ 33.909288] 1 lock held by systemd-sysctl/143: [ 33.910373] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48 [ 33.912407] stack backtrace: [ 33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104 [ 33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 33.917870] Call Trace: [ 33.918431] dump_stack+0x81/0xb6 [ 33.919241] lockdep_rcu_suspicious+0x10f/0x118 [ 33.920263] proc_configure_early_demux+0x65/0x10a [ 33.921391] proc_udp_early_demux+0x3a/0x41 add rcu locking to proc_configure_early_demux. Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net: ipv6: send unsolicited NA if enabled for all interfacesDavid Ahern
When arp_notify is set to 1 for either a specific interface or for 'all' interfaces, gratuitous arp requests are sent. Since ndisc_notify is the ipv6 equivalent to arp_notify, it should follow the same semantics. Commit 4a6e3c5def13 ("net: ipv6: send unsolicited NA on admin up") sends the NA on admin up. The final piece is checking devconf_all->ndisc_notify in addition to the per device setting. Add it. Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24netfilter: xt_socket: Fix broken IPv6 handlingPeter Tirsek
Commit 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro which was removed earlier in commit 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro never being defined, the xt_socket match emits an "Unknown family 10" warning when used with IPv6: WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 socket_mt_enable_defrag+0x47/0x50 [xt_socket] Unknown family 10 Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...] CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1 Hardware name: [...] Call Trace: ? __warn+0xe7/0x100 ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? warn_slowpath_fmt+0x39/0x40 ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? socket_mt_v2_check+0x12/0x40 [xt_socket] ? xt_check_match+0x6b/0x1a0 [x_tables] ? xt_find_match+0x93/0xd0 [x_tables] ? xt_request_find_match+0x20/0x80 [x_tables] ? translate_table+0x48e/0x870 [ip6_tables] ? translate_table+0x577/0x870 [ip6_tables] ? walk_component+0x3a/0x200 ? kmalloc_order+0x1d/0x50 ? do_ip6t_set_ctl+0x181/0x490 [ip6_tables] ? filename_lookup+0xa5/0x120 ? nf_setsockopt+0x3a/0x60 ? ipv6_setsockopt+0xb0/0xc0 ? sock_common_setsockopt+0x23/0x30 ? SyS_socketcall+0x41d/0x630 ? vfs_read+0xfa/0x120 ? do_fast_syscall_32+0x7a/0x110 ? entry_SYSENTER_32+0x47/0x71 This patch brings the conditional back in line with how the rest of the file handles IPv6. Fixes: 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed") Signed-off-by: Peter Tirsek <peter@tirsek.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadjLiping Zhang
We should acquire the ct->lock before accessing or modifying the nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same time during its packet proccessing. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: ctnetlink: make it safer when updating ct->statusLiping Zhang
After converting to use rcu for conntrack hash, one CPU may update the ct->status via ctnetlink, while another CPU may process the packets and update the ct->status. So the non-atomic operation "ct->status |= status;" via ctnetlink becomes unsafe, and this may clear the IPS_DYING_BIT bit set by another CPU unexpectedly. For example: CPU0 CPU1 ctnetlink_change_status __nf_conntrack_find_get old = ct->status nf_ct_gc_expired - nf_ct_kill - test_and_set_bit(IPS_DYING_BIT new = old | status; - ct->status = new; <-- oops, _DYING_ is cleared! Now using a series of atomic bit operation to solve the above issue. Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly, so make these two bits be unchangable too. If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free, but actually it is alloced by nf_conntrack_alloc. If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer deference, as the nfct_seqadj(ct) maybe NULL. Last, add some comments to describe the logic change due to the commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS processing"), which makes me feel a little confusing. Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twiceLiping Zhang
Currently, ctnetlink_change_conntrack is always protected by _expect_lock, but this will cause a deadlock when deleting the helper from a conntrack, as the _expect_lock will be acquired again by nf_ct_remove_expectations: CPU0 ---- lock(nf_conntrack_expect_lock); lock(nf_conntrack_expect_lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by lt-conntrack_gr/12853: #0: (&table[i].mutex){+.+.+.}, at: [<ffffffffa05e2009>] nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink] #1: (nf_conntrack_expect_lock){+.....}, at: [<ffffffffa05f2c1f>] ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink] Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x1608/0x1680 ? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink] lock_acquire+0x100/0x1f0 ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] _raw_spin_lock_bh+0x3f/0x50 ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink] ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink] ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink] ? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink] netlink_rcv_skb+0xa4/0xc0 nfnetlink_rcv+0x87/0x770 [nfnetlink] Since the operations are unrelated to nf_ct_expect, so we can drop the _expect_lock. Also note, after removing the _expect_lock protection, another CPU may invoke nf_conntrack_helper_unregister, so we should use rcu_read_lock to protect __nf_conntrack_helper_find invoked by ctnetlink_change_helper. Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: ctnetlink: drop the incorrect cthelper module requestLiping Zhang
First, when creating a new ct, we will invoke request_module to try to load the related inkernel cthelper. So there's no need to call the request_module again when updating the ct helpinfo. Second, ctnetlink_change_helper may be called with rcu_read_lock held, i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse -> ctnetlink_glue_parse -> ctnetlink_glue_parse_ct -> ctnetlink_change_helper. But the request_module invocation may sleep, so we can't call it with the rcu_read_lock held. Remove it now. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: nft_set_bitmap: free dummy elements when destroy the setLiping Zhang
We forget to free dummy elements when deleting the set. So when I was running nft-test.py, I saw many kmemleak warnings: kmemleak: 1344 new suspected memory leaks ... # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800631345c8 (size 32): comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s) hex dump (first 32 bytes): f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff .c.c.....y.c.... 04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00 ................ backtrace: [<ffffffff819059da>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81288174>] __kmalloc+0x164/0x310 [<ffffffffa061269d>] nft_set_elem_init+0x3d/0x1b0 [nf_tables] [<ffffffffa06130da>] nft_add_set_elem+0x45a/0x8c0 [nf_tables] [<ffffffffa0613645>] nf_tables_newsetelem+0x105/0x1d0 [nf_tables] [<ffffffffa05fe6d4>] nfnetlink_rcv+0x414/0x770 [nfnetlink] [<ffffffff817f0ca6>] netlink_unicast+0x1f6/0x310 [<ffffffff817f10c6>] netlink_sendmsg+0x306/0x3b0 ... Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy elements") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlinkLiping Zhang
cthelpers added via nfnetlink may have the same tuple, i.e. except for the l3proto and l4proto, other fields are all zero. So even with the different names, we will also fail to add them: # nfct helper add ssdp inet udp # nfct helper add tftp inet udp nfct v1.4.3: netlink error: File exists So in order to avoid unpredictable behaviour, we should: 1. cthelpers can be selected by nft ct helper obj or xt_CT target, so report error if duplicated { name, l3proto, l4proto } tuple exist. 2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when nf_ct_auto_assign_helper is enabled, so also report error if duplicated { l3proto, l4proto, src-port } tuple exist. Also note, if the cthelper is added from userspace, then the src-port will always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no need to check the second point listed above. Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24openvswitch: Delete conntrack entry clashing with an expectation.Jarno Rajahalme
Conntrack helpers do not check for a potentially clashing conntrack entry when creating a new expectation. Also, nf_conntrack_in() will check expectations (via init_conntrack()) only if a conntrack entry can not be found. The expectation for a packet which also matches an existing conntrack entry will not be removed by conntrack, and is currently handled inconsistently by OVS, as OVS expects the expectation to be removed when the connection tracking entry matching that expectation is confirmed. It should be noted that normally an IP stack would not allow reuse of a 5-tuple of an old (possibly lingering) connection for a new data connection, so this is somewhat unlikely corner case. However, it is possible that a misbehaving source could cause conntrack entries be created that could then interfere with new related connections. Fix this in the OVS module by deleting the clashing conntrack entry after an expectation has been matched. This causes the following nf_conntrack_in() call also find the expectation and remove it when creating the new conntrack entry, as well as the forthcoming reply direction packets to match the new related connection instead of the old clashing conntrack entry. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24netfilter: xt_CT: fix refcnt leak on error pathGao Feng
There are two cases which causes refcnt leak. 1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should free the timeout refcnt. Now goto the err_put_timeout error handler instead of going ahead. 2. When the time policy is not found, we should call module_put. Otherwise, the related cthelper module cannot be removed anymore. It is easy to reproduce by typing the following command: # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24openvswitch: Add eventmask support to CT action.Jarno Rajahalme
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK, which can be used in conjunction with the commit flag (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which conntrack events (IPCT_*) should be delivered via the Netfilter netlink multicast groups. Default behavior depends on the system configuration, but typically a lot of events are delivered. This can be very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some types of events are of interest. Netfilter core init_conntrack() adds the event cache extension, so we only need to set the ctmask value. However, if the system is configured without support for events, the setting will be skipped due to extension not being found. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24openvswitch: Typo fix.Jarno Rajahalme
Fix typo in a comment. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24udp: disable inner UDP checksum offloads in IPsec caseAnsis Atteka
Otherwise, UDP checksum offloads could corrupt ESP packets by attempting to calculate UDP checksum when this inner UDP packet is already protected by IPsec. One way to reproduce this bug is to have a VM with virtio_net driver (UFO set to ON in the guest VM); and then encapsulate all guest's Ethernet frames in Geneve; and then further encrypt Geneve with IPsec. In this case following symptoms are observed: 1. If using ixgbe NIC, then it will complain with following error message: ixgbe 0000:01:00.1: partial checksum but l4 proto=32! 2. Receiving IPsec stack will drop all the corrupted ESP packets and increase XfrmInStateProtoError counter in /proc/net/xfrm_stat. 3. iperf UDP test from the VM with packet sizes above MTU will not work at all. 4. iperf TCP test from the VM will get ridiculously low performance because. Signed-off-by: Ansis Atteka <aatteka@ovn.org> Co-authored-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24ipv4: Avoid caching l3mdev dst on mismatched local routeRobert Shearman
David reported that doing the following: ip li add red type vrf table 10 ip link set dev eth1 vrf red ip addr add 127.0.0.1/8 dev red ip link set dev eth1 up ip li set red up ping -c1 -w1 -I red 127.0.0.1 ip li del red when either policy routing IP rules are present or the local table lookup ip rule is before the l3mdev lookup results in a hang with these messages: unregister_netdevice: waiting for red to become free. Usage count = 1 The problem is caused by caching the dst used for sending the packet out of the specified interface on a local route with a different nexthop interface. Thus the dst could stay around until the route in the table the lookup was done is deleted which may be never. Address the problem by not forcing output device to be the l3mdev in the flow's output interface if the lookup didn't use the l3mdev. This then results in the dst using the right device according to the route. Changes in v2: - make the dev_out passed in by __ip_route_output_key_hash correct instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as suggested by David. Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback") Reported-by: David Ahern <dsa@cumulusnetworks.com> Suggested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.Mike Maloney
Fanout uses a per net global namespace. A process that intends to create a new fanout group can accidentally join an existing group. It is not possible to detect this. Add socket option PACKET_FANOUT_FLAG_UNIQUEID. When specified the supplied fanout group id must be set to 0, and the kernel chooses an id that is not already in use. This is an ephemeral flag so that other sockets can be added to this group using setsockopt, but NOT specifying this flag. The current getsockopt(..., PACKET_FANOUT, ...) can be used to retrieve the new group id. We assume that there are not a lot of fanout groups and that this is not a high frequency call. The method assigns ids starting at zero and increases until it finds an unused id. It keeps track of the last assigned id, and uses it as a starting point to find new ids. Signed-off-by: Mike Maloney <maloney@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24VSOCK: Add virtio vsock vsockmon hooksGerard Garcia
The virtio drivers deal with struct virtio_vsock_pkt. Add virtio_transport_deliver_tap_pkt(pkt) for handing packets to the vsockmon device. We call virtio_transport_deliver_tap_pkt(pkt) from net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of common code. This is because the drivers may drop packets before handing them to common code - we still want to capture them. Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24VSOCK: Add vsockmon tap functionsGerard Garcia
Add tap functions that can be used by the vsock transports to deliver packets to vsockmon virtual network devices. Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24esp: Fix misplaced spin_unlock_bh.Steffen Klassert
A recent commit moved esp_alloc_tmp() out of a lock protected region, but forgot to remove the unlock from the error path. This patch removes the forgotten unlock. While at it, remove some unneeded error assignments too. Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output") Fixes: 383d0350f2cc ("esp6: Reorganize esp_output") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-23Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Parallelize SRCU callback handling (plus overlapping patches). Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-22net/devlink: Add E-Switch encapsulation controlRoi Dayan
This is an e-switch global knob to enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. The actual encap/decap is carried out (along with the matching and other actions) per offloaded e-switch rules, e.g as done when offloading the TC tunnel key action. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Both conflict were simple overlapping changes. In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the conversion of the driver in net-next to use in-netdev stats. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21netpoll: Check for skb->queue_mappingTushar Dave
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping otherwise skbs with queue_mapping greater than real_num_tx_queues can be sent to the underlying driver and can result in kernel panic. One such event is running netconsole and enabling VF on the same device. Or running netconsole and changing number of tx queues via ethtool on same device. e.g. Unable to handle kernel NULL pointer dereference tsk->{mm,active_mm}->context = 0000000000001525 tsk->{mm,active_mm}->pgd = fff800130ff9a000 \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ kworker/48:1(475): Oops [#1] CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE 4.11.0-rc3-davem-net+ #7 Workqueue: events queue_process task: fff80013113299c0 task.stack: fff800131132c000 TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000 Tainted: G OE TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]> g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001 g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0 o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003 o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94 RPC: <set_next_entity+0x34/0xb80> l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000 l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028 i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0 I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]> Call Trace: [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe] [0000000000998c74] netpoll_start_xmit+0xf4/0x200 [0000000000998e10] queue_process+0x90/0x160 [0000000000485fa8] process_one_work+0x188/0x480 [0000000000486410] worker_thread+0x170/0x4c0 [000000000048c6b8] kthread+0xd8/0x120 [0000000000406064] ret_from_fork+0x1c/0x2c [0000000000000000] (null) Disabling lock debugging due to kernel taint Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe] Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200 Caller[0000000000998e10]: queue_process+0x90/0x160 Caller[0000000000485fa8]: process_one_work+0x188/0x480 Caller[0000000000486410]: worker_thread+0x170/0x4c0 Caller[000000000048c6b8]: kthread+0xd8/0x120 Caller[0000000000406064]: ret_from_fork+0x1c/0x2c Caller[0000000000000000]: (null) Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip6mr: fix notification device destructionNikolay Aleksandrov
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused because we call unregister_netdevice_many for a device that is already being destroyed. In IPv4's ipmr that has been resolved by two commits long time ago by introducing the "notify" parameter to the delete function and avoiding the unregister when called from a notifier, so let's do the same for ip6mr. The trace from Andrey: ------------[ cut here ]------------ kernel BUG at net/core/dev.c:6813! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff880069208000 task.stack: ffff8800692d8000 RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813 RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297 RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569 RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070 R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000 FS: 0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0 Call Trace: unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647 call_netdevice_notifiers net/core/dev.c:1663 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many net/core/dev.c:7880 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00 RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0 ---[ end trace e0b29c57e9b3292c ]--- Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge tag 'nfc-next-4.12-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.12 pull request This is the NFC pull request for 4.12. We have: - Improvements for the pn533 command queue handling and device registration order. - Removal of platform data for the pn544 and st21nfca drivers. - Additional device tree options to support more trf7970a hardware options. - Support for Sony's RC-S380P through the port100 driver. - Removal of the obsolte nfcwilink driver. - Headers inclusion cleanups (miscdevice.h, unaligned.h) for many drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net: qrtr: potential use after free in qrtr_sendmsg()Dan Carpenter
If skb_pad() fails then it frees the skb so we should check for errors. Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-04-20 This adds the basic infrastructure for IPsec hardware offloading, it creates a configuration API and adjusts the packet path. 1) Add the needed netdev features to configure IPsec offloads. 2) Add the IPsec hardware offloading API. 3) Prepare the ESP packet path for hardware offloading. 4) Add gso handlers for esp4 and esp6, this implements the software fallback for GSO packets. 5) Add xfrm replay handler functions for offloading. 6) Change ESP to use a synchronous crypto algorithm on offloading, we don't have the option for asynchronous returns when we handle IPsec at layer2. 7) Add a xfrm validate function to validate_xmit_skb. This implements the software fallback for non GSO packets. 8) Set the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. From Ilan Tayari. 9) Prepare the ESP GRO codepath for hardware offloading. 10) Fix incorrect null pointer check in esp6. From Colin Ian King. 11) Fix for the GSO software fallback path to detect the fallback correctly. From Ilan Tayari. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net_sched: remove useless NULL to tp->rootWANG Cong
There is no need to NULL tp->root in ->destroy(), since tp is going to be freed very soon, and existing readers are still safe to read them. For cls_route, we always init its tp->root, so it can't be NULL, we can drop more useless code. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net_sched: move the empty tp check from ->destroy() to ->delete()WANG Cong
We could have a race condition where in ->classify() path we dereference tp->root and meanwhile a parallel ->destroy() makes it a NULL. Daniel cured this bug in commit d936377414fa ("net, sched: respect rcu grace period on cls destruction"). This happens when ->destroy() is called for deleting a filter to check if we are the last one in tp, this tp is still linked and visible at that time. The root cause of this problem is the semantic of ->destroy(), it does two things (for non-force case): 1) check if tp is empty 2) if tp is empty we could really destroy it and its caller, if cares, needs to check its return value to see if it is really destroyed. Therefore we can't unlink tp unless we know it is empty. As suggested by Daniel, we could actually move the test logic to ->delete() so that we can safely unlink tp after ->delete() tells us the last one is just deleted and before ->destroy(). Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Cc: Roi Dayan <roid@mellanox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21sctp: switch to copy_from_iter_full()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-21net/9p: switch to copy_from_iter_full()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>