summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-09-19Merge tag 'batadv-next-for-davem-20180919' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Inform users about debugfs interface deprecation, by Sven Eckelmann - Implement tracing, planned to replace debugfs log messages, by Sven Eckelmann - Move OGM rebroadcasts to per interface struct, by Sven Eckelmann - Enable LockLess TX to increase throughput, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19Merge tag 'batadv-net-for-davem-20180919' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== pull request for net: batman-adv 2018-09-19 here are some bugfixes which we would like to see integrated into net. We forgot to bump the version number in the last round for net-next, so the belated patch to do that is included - we hope you can adopt it. This will most likely create a merge conflict later when merging into net-next with this rounds net-next patchset, but net-next should keep the 2018.4 version[1]. [1] resolution: --- a/net/batman-adv/main.h +++ b/net/batman-adv/main.h @@ -25,11 +25,7 @@ #define BATADV_DRIVER_DEVICE "batman-adv" #ifndef BATADV_SOURCE_VERSION -<<<<<<< -#define BATADV_SOURCE_VERSION "2018.3" -======= #define BATADV_SOURCE_VERSION "2018.4" ->>>>>>> #endif /* B.A.T.M.A.N. parameters */ Please pull or let me know of any problem! Here are some batman-adv bugfixes: - Avoid ELP information leak, by Sven Eckelmann - Fix sysfs segfault issues, by Sven Eckelmann (2 patches) - Fix locking when adding entries in various lists, by Sven Eckelmann (5 patches) - Fix refcount if queue_work() fails, by Marek Lindner (2 patches) - Fixup forgotten version bump, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18ipv6: fix memory leak on dst->_metricsWei Wang
When dst->_metrics and f6i->fib6_metrics share the same memory, both take reference count on the dst_metrics structure. However, when dst is destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic() which does not take care of READONLY metrics and does not release refcnt. This causes memory leak. Similar to ipv4 logic, the fix is to properly release refcnt and free the memory space pointed by dst->_metrics if refcnt becomes 0. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Revert "ipv6: fix double refcount of fib6_metrics"Wei Wang
This reverts commit e70a3aad44cc8b24986687ffc98c4a4f6ecf25ea. This change causes use-after-free on dst->_metrics. The crash trace looks like this: [ 97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140 [ 97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801 [ 97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11 [ 97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018 [ 97.777957] Call Trace: [ 97.777971] [<ffffffff895709db>] dump_stack+0x4d/0x72 [ 97.777985] [<ffffffff881651df>] print_address_description+0x6f/0x260 [ 97.777997] [<ffffffff88165747>] kasan_report+0x257/0x370 [ 97.778001] [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140 [ 97.778004] [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20 [ 97.778008] [<ffffffff894488e6>] ip6_mtu+0x116/0x140 [ 97.778013] [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280 [ 97.778016] [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0 [ 97.778022] [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0 [ 97.778037] [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0 [ 97.778040] [<ffffffff881643b1>] ? save_stack+0xb1/0xd0 [ 97.778046] [<ffffffff89264c82>] tcp_send_mss+0x22/0x220 [ 97.778059] [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0 [ 97.778062] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20 [ 97.778066] [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60 [ 97.778070] [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280 [ 97.778075] [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430 [ 97.778078] [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0 [ 97.778082] [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140 [ 97.778085] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20 [ 97.778088] [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50 [ 97.778092] [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50 [ 97.778098] [<ffffffff89352d43>] inet_sendmsg+0x103/0x480 [ 97.778102] [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0 [ 97.778105] [<ffffffff890294da>] sock_sendmsg+0xba/0xf0 [ 97.778108] [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0 [ 97.778113] [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0 [ 97.778116] [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0 [ 97.778119] [<ffffffff881646d1>] ? memset+0x31/0x40 [ 97.778123] [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380 [ 97.778127] [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250 [ 97.778130] [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180 [ 97.778133] [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200 [ 97.778137] [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0 [ 97.778141] [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190 [ 97.778144] [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190 [ 97.778147] [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20 [ 97.778152] [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0 [ 97.778155] [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190 [ 97.778158] [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20 [ 97.778162] [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430 [ 97.778166] [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0 [ 97.778171] [<ffffffff8960131f>] ? page_fault+0x2f/0x50 [ 97.778174] [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 97.778177] RIP: 0033:0x7f83fa36000d [ 97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d [ 97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036 [ 97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168 [ 97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40 [ 97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000 [ 97.779684] Allocated by task 5919: [ 97.783185] save_stack+0x46/0xd0 [ 97.783187] kasan_kmalloc+0xad/0xe0 [ 97.783189] kmem_cache_alloc_trace+0xdf/0x580 [ 97.783190] ip6_convert_metrics.isra.79+0x7e/0x190 [ 97.783192] ip6_route_info_create+0x60a/0x2480 [ 97.783193] ip6_route_add+0x1d/0x80 [ 97.783195] inet6_rtm_newroute+0xdd/0xf0 [ 97.783198] rtnetlink_rcv_msg+0x641/0xb10 [ 97.783200] netlink_rcv_skb+0x27b/0x3e0 [ 97.783202] rtnetlink_rcv+0x15/0x20 [ 97.783203] netlink_unicast+0x4be/0x720 [ 97.783204] netlink_sendmsg+0x7bc/0xbf0 [ 97.783205] sock_sendmsg+0xba/0xf0 [ 97.783207] ___sys_sendmsg+0x6ca/0x8e0 [ 97.783208] __sys_sendmsg+0xe6/0x190 [ 97.783209] SyS_sendmsg+0x13/0x20 [ 97.783211] do_syscall_64+0x2ac/0x430 [ 97.783213] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 97.784709] Freed by task 0: [ 97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist [ 97.794497] save_stack+0x46/0xd0 [ 97.794499] kasan_slab_free+0x71/0xc0 [ 97.794500] kfree+0x7c/0xf0 [ 97.794501] fib6_info_destroy_rcu+0x24f/0x310 [ 97.794504] rcu_process_callbacks+0x38b/0x1730 [ 97.794506] __do_softirq+0x1c8/0x5d0 Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: fix sizeof to int comparisonYueHaibing
Comparing an int to a size, which is unsigned, causes the int to become unsigned, giving the wrong result. kernel_sendmsg can return a negative error code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: no urgent data check for listen socketsKarsten Graul
Don't check a listen socket for pending urgent data in smc_poll(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: enable fallback for connection abort in state INITUrsula Braun
If a linkgroup is terminated abnormally already due to failing LLC CONFIRM LINK or LLC ADD LINK, fallback to TCP is still possible. In this case do not switch to state SMC_PEERABORTWAIT and do not set sk_err. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: remove duplicate mutex_unlockUrsula Braun
For a failing smc_listen_rdma_finish() smc_listen_decline() is called. If fallback is possible, the new socket is already enqueued to be accepted in smc_listen_decline(). Avoid enqueuing a second time afterwards in this case, otherwise the smc_create_lgr_pending lock is released twice: [ 373.463976] WARNING: bad unlock balance detected! [ 373.463978] 4.18.0-rc7+ #123 Tainted: G O [ 373.463979] ------------------------------------- [ 373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at: [ 373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.463991] but there are no more locks to release! [ 373.463991] other info that might help us debug this: [ 373.463993] 2 locks held by kworker/1:1/30: [ 373.463994] #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464000] #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464003] stack backtrace: [ 373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G O 4.18.0-rc7uschi+ #123 [ 373.464007] Hardware name: IBM 2827 H43 738 (LPAR) [ 373.464010] Workqueue: events smc_listen_work [smc] [ 373.464011] Call Trace: [ 373.464015] ([<0000000000114100>] show_stack+0x60/0xd8) [ 373.464019] [<0000000000a8c9bc>] dump_stack+0x9c/0xd8 [ 373.464021] [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108 [ 373.464022] [<00000000001e045c>] lock_release+0x114/0x4f8 [ 373.464025] [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300 [ 373.464027] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.464029] [<0000000000197a68>] process_one_work+0x2a8/0x6b0 [ 373.464030] [<0000000000197ec2>] worker_thread+0x52/0x410 [ 373.464033] [<000000000019fd0e>] kthread+0x15e/0x178 [ 373.464035] [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc [ 373.464052] [<0000000000aaf584>] kernel_thread_starter+0x0/0xc [ 373.464054] INFO: lockdep is turned off. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: fix non-blocking connect problemUrsula Braun
In state SMC_INIT smc_poll() delegates polling to the internal CLC socket. This means, once the connect worker has finished its kernel_connect() step, the poll wake-up may occur. This is not intended. The wake-up should occur from the wake up call in smc_connect_work() after __smc_connect() has finished. Thus in state SMC_INIT this patch now calls sock_poll_wait() on the main SMC socket. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18NFC: Fix possible memory corruption when handling SHDLC I-Frame commandsSuren Baghdasaryan
When handling SHDLC I-Frame commands "pipe" field used for indexing into an array should be checked before usage. If left unchecked it might access memory outside of the array of size NFC_HCI_MAX_PIPES(127). Malformed NFC HCI frames could be injected by a malicious NFC device communicating with the device being attacked (remote attack vector), or even by an attacker with physical access to the I2C bus such that they could influence the data transfers on that bus (local attack vector). skb->data is controlled by the attacker and has only been sanitized in the most trivial ways (CRC check), therefore we can consider the create_info struct and all of its members to tainted. 'create_info->pipe' with max value of 255 (uint8) is used to take an offset of the hdev->pipes array of 127 elements which can lead to OOB write. Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Allen Pais <allen.pais@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Suggested-by: Kevin Deus <kdeus@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net/ipv6: do not copy dst flags on rt initPeter Oskolkov
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts toward route cache size (net->ipv6.sysctl.ip6_rt_max_size). If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented in dist_init() and decremented in dst_destroy(). This flag is tied to allocation/deallocation of dst_entry and should not be copied from another dst/route. Otherwise it can happen that dst_ops::pcpuc_entries counter grows until no new routes can be allocated because the counter reached ip6_rt_max_size due to DST_NOCOUNT not set and thus no counter decrements on gc-ed routes. Fixes: 3b6761d18bc1 ("net/ipv6: Move dst flags to booleans in fib entries") Cc: David Ahern <dsahern@gmail.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net/ipv4: defensive cipso option parsingStefan Nuernberger
commit 40413955ee26 ("Cipso: cipso_v4_optptr enter infinite loop") fixed a possible infinite loop in the IP option parsing of CIPSO. The fix assumes that ip_options_compile filtered out all zero length options and that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist. While this assumption currently holds true, add explicit checks for zero length and invalid length options to be safe for the future. Even though ip_options_compile should have validated the options, the introduction of new one-byte options can still confuse this code without the additional checks. Signed-off-by: Stefan Nuernberger <snu@amazon.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Simon Veith <sveith@amazon.de> Cc: stable@vger.kernel.org Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net: caif: remove redundant null check on frontpktColin Ian King
It is impossible for frontpkt to be null at the point of the null check because it has been assigned from rearpkt and there is no way rearpkt can be null at the point of the assignment because of the sanity checking and exit paths taken previously. Remove the redundant null check. Detected by CoverityScan, CID#114434 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17Revert "kcm: remove any offset before parsing messages"David S. Miller
This reverts commit 072222b488bc55cce92ff246bdc10115fd57d3ab. I just read that this causes regressions. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17kcm: remove any offset before parsing messagesDominique Martinet
The current code assumes kcm users know they need to look for the strparser offset within their bpf program, which is not documented anywhere and examples laying around do not do. The actual recv function does handle the offset well, so we can create a temporary clone of the skb and pull that one up as required for parsing. The pull itself has a cost if we are pulling beyond the head data, measured to 2-3% latency in a noisy VM with a local client stressing that path. The clone's impact seemed too small to measure. This bug can be exhibited easily by implementing a "trivial" kcm parser taking the first bytes as size, and on the client sending at least two such packets in a single write(). Note that bpf sockmap has the same problem, both for parse and for recv, so it would pulling twice or a real pull within the strparser logic if anyone cares about that. Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net: dsa: remove redundant null pointer check before put_devicezhong jiang
put_device has taken the null pinter check into account. So it is safe to remove the duplicated check before put_device. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net: rds: use memset to optimize the recvZhu Yanjun
The function rds_inc_init is in recv process. To use memset can optimize the function rds_inc_init. The test result: Before: 1) + 24.950 us | rds_inc_init [rds](); After: 1) + 10.990 us | rds_inc_init [rds](); Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net: dsa: tag_gswip: Add gswip to dsa_tag_protocol_to_str()Hauke Mehrtens
The gswip tag was missing in the dsa_tag_protocol_to_str() function, add it. Fixes: 7969119293f5 ("net: dsa: Add Lantiq / Intel GSWIP tag support") Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17tls: fix currently broken MSG_PEEK behaviorDaniel Borkmann
In kTLS MSG_PEEK behavior is currently failing, strace example: [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] listen(4, 10) = 0 [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] close(4) = 0 [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64 As can be seen from strace, there are two TLS records sent, i) 'test_read_peek' and ii) '_mult_recs\0' where we end up peeking 'test_read_peektest_read_peektest'. This is clearly wrong, and what happens is that given peek cannot call into tls_sw_advance_skb() to unpause strparser and proceed with the next skb, we end up looping over the current one, copying the 'test_read_peek' over and over into the user provided buffer. Here, we can only peek into the currently held skb (current, full TLS record) as otherwise we would end up having to hold all the original skb(s) (depending on the peek depth) in a separate queue when unpausing strparser to process next records, minimally intrusive is to return only up to the current record's size (which likely was what c46234ebb4d1 ("tls: RX path for ktls") originally intended as well). Thus, after patch we properly peek the first record: [pid 2046] wait4(2075, <unfinished ...> [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] listen(4, 10) = 0 [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] close(4) = 0 [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14 Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17tls: async support causes out-of-bounds access in crypto APIsJohn Fastabend
When async support was added it needed to access the sk from the async callback to report errors up the stack. The patch tried to use space after the aead request struct by directly setting the reqsize field in aead_request. This is an internal field that should not be used outside the crypto APIs. It is used by the crypto code to define extra space for private structures used in the crypto context. Users of the API then use crypto_aead_reqsize() and add the returned amount of bytes to the end of the request memory allocation before posting the request to encrypt/decrypt APIs. So this breaks (with general protection fault and KASAN error, if enabled) because the request sent to decrypt is shorter than required causing the crypto API out-of-bounds errors. Also it seems unlikely the sk is even valid by the time it gets to the callback because of memset in crypto layer. Anyways, fix this by holding the sk in the skb->sk field when the callback is set up and because the skb is already passed through to the callback handler via void* we can access it in the handler. Then in the handler we need to be careful to NULL the pointer again before kfree_skb. I added comments on both the setup (in tls_do_decryption) and when we clear it from the crypto callback handler tls_decrypt_done(). After this selftests pass again and fixes KASAN errors/warnings. Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17ipv6: fix possible use-after-free in ip6_xmit()Eric Dumazet
In the unlikely case ip6_xmit() has to call skb_realloc_headroom(), we need to call skb_set_owner_w() before consuming original skb, otherwise we risk a use-after-free. Bring IPv6 in line with what we do in IPv4 to fix this. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17netfilter: nat: remove unnecessary rcu_read_lock in nf_nat_redirect_ipv{4/6}Taehee Yoo
nf_nat_redirect_ipv4() and nf_nat_redirect_ipv6() are only called by netfilter hook point. so that rcu_read_lock and rcu_read_unlock() are unnecessary. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: cttimeout: remove superfluous check on layer 4 netlink functionsPablo Neira Ayuso
We assume they are always set accordingly since a874752a10da ("netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT"), so we can get rid of this checks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_nat_ipv4: remove obsolete EXPORT_SYMBOLFlorian Westphal
There are no external callers anymore, previous change just forgot to also remove the EXPORT_SYMBOL(). Fixes: 9971a514ed269 ("netfilter: nf_nat: add nat type hooks to nat core") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: xtables: avoid BUG_ONFlorian Westphal
I see no reason for them, label or timer cannot be NULL, and if they were, we'll crash with null deref anyway. For skb_header_pointer failure, just set hotdrop to true and toss such packet. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: avoid BUG_ON usageFlorian Westphal
None of these spots really needs to crash the kernel. In one two cases we can jsut report error to userspace, in the other cases we can just use WARN_ON (and leak memory instead). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: xt_cgroup: shrink size of v2 pathPablo Neira Ayuso
cgroup v2 path field is PATH_MAX which is too large, this is placing too much pressure on memory allocation for people with many rules doing cgroup v1 classid matching, side effects of this are bug reports like: https://bugzilla.kernel.org/show_bug.cgi?id=200639 This patch registers a new revision that shrinks the cgroup path to 512 bytes, which is the same approach we follow in similar extensions that have a path field. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Tejun Heo <tj@kernel.org>
2018-09-17netfilter: ctnetlink: Support L3 protocol-filter on flushKristian Evensen
The same connection mark can be set on flows belonging to different address families. This commit adds support for filtering on the L3 protocol when flushing connection track entries. If no protocol is specified, then all L3 protocols match. In order to avoid code duplication and a redundant check, the protocol comparison in ctnetlink_dump_table() has been removed. Instead, a filter is created if the GET-message triggering the dump contains an address family. ctnetlink_filter_match() is then used to compare the L3 protocols. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: add xfrm expressionFlorian Westphal
supports fetching saddr/daddr of tunnel mode states, request id and spi. If direction is 'in', use inbound skb secpath, else dst->xfrm. Joint work with Máté Eckl. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: remove obsolete need_conntrack stubFlorian Westphal
as of a0ae2562c6c4b27 ("netfilter: conntrack: remove l3proto abstraction") there are no users anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: asynchronous releaseFlorian Westphal
Release the committed transaction log from a work queue, moving expensive synchronize_rcu out of the locked section and providing opportunity to batch this. On my test machine this cuts runtime of nft-test.py in half. Based on earlier patch from Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: warn when expr implements only one of activate/deactivateFlorian Westphal
->destroy is only allowed to free data, or do other cleanups that do not have side effects on other state, such as visibility to other netlink requests. Such things need to be done in ->deactivate. As a transaction can fail, we need to make sure we can undo such operations, therefore ->activate() has to be provided too. So print a warning and refuse registration if expr->ops provides only one of the two operations. v2: fix nft_expr_check_ops to not repeat same check twice (Jones Desougi) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: split set destruction in deactivate and destroy phaseFlorian Westphal
Splits unbind_set into destroy_set and unbinding operation. Unbinding removes set from lists (so new transaction would not find it anymore) but keeps memory allocated (so packet path continues to work). Rebind function is added to allow unrolling in case transaction that wants to remove set is aborted. Destroy function is added to free the memory, but this could occur outside of transaction in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: rt: allow checking if dst has xfrm attachedFlorian Westphal
Useful e.g. to avoid NATting inner headers of to-be-encrypted packets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-09-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix end boundary calculation in BTF for the type section, from Martin. 2) Fix and revert subtraction of pointers that was accidentally allowed for unprivileged programs, from Alexei. 3) Fix bpf_msg_pull_data() helper by using __GFP_COMP in order to avoid a warning in linearizing sg pages into a single one for large allocs, from Tushar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16ip6_gre: simplify gre header parsing in ip6gre_errHaishuang Yan
Same as ip_gre, use gre_parse_header to parse gre header in gre error handler code. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16ip_gre: fix parsing gre header in ipgre_errHaishuang Yan
gre_parse_header stops parsing when csum_err is encountered, which means tpi->key is undefined and ip_tunnel_lookup will return NULL improperly. This patch introduce a NULL pointer as csum_err parameter. Even when csum_err is encountered, it won't return error and continue parsing gre header as expected. Fixes: 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook.") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16net/sched: act_police: don't use spinlock in the data pathDavide Caratti
use RCU instead of spinlocks, to protect concurrent read/write on act_police configuration. This reduces the effects of contention in the data path, in case multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16net/sched: act_police: use per-cpu countersDavide Caratti
use per-CPU counters, instead of sharing a single set of stats with all cores. This removes the need of using spinlock when statistics are read or updated. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16udp6: add missing checks on edumux packet processingPaolo Abeni
Currently the UDPv6 early demux rx code path lacks some mandatory checks, already implemented into the normal RX code path - namely the checksum conversion and no_check6_rx check. Similar to the previous commit, we move the common processing to an UDPv6 specific helper and call it from both edemux code path and normal code path. In respect to the UDPv4, we need to add an explicit check for non zero csum according to no_check6_rx value. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Xin Long <lucien.xin@gmail.com> Fixes: c9f2c1ae123a ("udp6: fix socket leak on early demux") Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16udp4: fix IP_CMSG_CHECKSUM for connected socketsPaolo Abeni
commit 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") left out the early demux path for connected sockets. As a result IP_CMSG_CHECKSUM gives wrong values for such socket when GRO is not enabled/available. This change addresses the issue by moving the csum conversion to a common helper and using such helper in both the default and the early demux rx path. Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-15batman-adv: Enable LockLess TX for softifSven Eckelmann
The batadv interfaces are virtual interfaces which just tunnel the traffic over other ethernet compatible interfaces. It doesn't need serialization during the tx phase and is using RCU for most of its internal datastructures. Since it doesn't have actual queues which could be locked independently, the throughput gets significantly reduced by the extra lock in the core net code. 8 parallel TCP connections forwarded by an IPQ4019 based hardware over 5GHz could reach: * without LLTX: 349 Mibit/s * with LLTX: 563 Mibit/s Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-15batman-adv: Move OGM rebroadcast stats to orig_ifinfoSven Eckelmann
B.A.T.M.A.N. IV requires the number of rebroadcast from a neighboring originator. These statistics are gathered per interface which transmitted the OGM (and then received it again). Since an originator is not interface specific, a resizable array was used in each originator. This resizable array had an entry for each interface and had to be resizes (for all OGMs) when the number of active interface was modified. This could cause problems when a large number of interface is added and not enough continuous memory is available to allocate the array. There is already a per interface originator structure "batadv_orig_ifinfo" which can be used to store this information. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-15batman-adv: Provide debug messages as trace eventsSven Eckelmann
A private debug logging infrastructure is currently provided via $debug_fs/batman_adv/*/log when CONFIG_BATMAN_ADV_DEBUG is enabled. This is not well integrated in the rest of the tracing infrastructure of the kernel. Other components (like mac80211 or ath10k) allow to gather the debug messages using generic trace events which are better integrated. This makes it possible to interact with them using the existing userspace tools. The tracepoint batadv:batadv_dbg will now be available when CONFIG_BATMAN_ADV_DEBUG and CONFIG_BATMAN_ADV_TRACING is activated. The log level mask is still used for filtering as usual. A full system trace for offline parsing can be created (and read) using: $ batctl ll all $ trace-cmd record -e batadv:batadv_dbg $ trace-cmd report The same can also be done without recording to a file $ batctl ll all $ trace-cmd stream -e batadv:batadv_dbg The trace infrastructure is especially helpful when tracing processes: $ batctl ll all $ ./tools/perf/perf trace --event "batadv:*" batctl p 10.204.32.1 0.000 batadv:batadv_dbg:batman_adv bat0 Parsing outgoing ARP REQUEST 0.045 batadv:batadv_dbg:batman_adv bat0 ARP MSG = [src: a2:64:14:53:f8:22-10.204.32.185 dst: 00:00:00:00:00:00-10.204.32.1] 0.067 batadv:batadv_dbg:batman_adv bat0 Entry updated: 10.204.32.185 a2:64:14:53:f8:22 (vid: -1) 0.099 batadv:batadv_dbg:batman_adv bat0 batadv_dat_select_candidates(): IP=10.204.32.1 hash(IP)=48902 0.757 batadv:batadv_dbg:batman_adv bat0 dat_select_candidates() 0: selected fe:2c:91:68:29:2b addr=48977 dist=65460 1.178 batadv:batadv_dbg:batman_adv bat0 dat_select_candidates() 1: selected fe:81:ab:c5:e3:03 addr=49181 dist=65256 1.809 batadv:batadv_dbg:batman_adv bat0 dat_select_candidates() 2: selected 66:25:a7:48:37:fb addr=49328 dist=65109 1.828 batadv:batadv_dbg:batman_adv bat0 DHT_SEND for 10.204.32.1 Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-14flow_dissector: implements flow dissector BPF hookPetar Penkov
Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector path. The BPF program is per-network namespace. Signed-off-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-14batman-adv: Increase version number to 2018.3Sven Eckelmann
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-14net/sched: act_sample: fix NULL dereference in the data pathDavide Caratti
Matteo reported the following splat, testing the datapath of TC 'sample': BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310 Read of size 8 at addr 0000000000000000 by task nc/433 CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 Call Trace: kasan_report.cold.6+0x6c/0x2fa tcf_sample_act+0xc4/0x310 ? dev_hard_start_xmit+0x117/0x180 tcf_action_exec+0xa3/0x160 tcf_classify+0xdd/0x1d0 htb_enqueue+0x18e/0x6b0 ? deref_stack_reg+0x7a/0xb0 ? htb_delete+0x4b0/0x4b0 ? unwind_next_frame+0x819/0x8f0 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 __dev_queue_xmit+0x722/0xca0 ? unwind_get_return_address_ptr+0x50/0x50 ? netdev_pick_tx+0xe0/0xe0 ? save_stack+0x8c/0xb0 ? kasan_kmalloc+0xbe/0xd0 ? __kmalloc_track_caller+0xe4/0x1c0 ? __kmalloc_reserve.isra.45+0x24/0x70 ? __alloc_skb+0xdd/0x2e0 ? sk_stream_alloc_skb+0x91/0x3b0 ? tcp_sendmsg_locked+0x71b/0x15a0 ? tcp_sendmsg+0x22/0x40 ? __sys_sendto+0x1b0/0x250 ? __x64_sys_sendto+0x6f/0x80 ? do_syscall_64+0x5d/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ? __sys_sendto+0x1b0/0x250 ? __x64_sys_sendto+0x6f/0x80 ? do_syscall_64+0x5d/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ip_finish_output2+0x495/0x590 ? ip_copy_metadata+0x2e0/0x2e0 ? skb_gso_validate_network_len+0x6f/0x110 ? ip_finish_output+0x174/0x280 __tcp_transmit_skb+0xb17/0x12b0 ? __tcp_select_window+0x380/0x380 tcp_write_xmit+0x913/0x1de0 ? __sk_mem_schedule+0x50/0x80 tcp_sendmsg_locked+0x49d/0x15a0 ? tcp_rcv_established+0x8da/0xa30 ? tcp_set_state+0x220/0x220 ? clear_user+0x1f/0x50 ? iov_iter_zero+0x1ae/0x590 ? __fget_light+0xa0/0xe0 tcp_sendmsg+0x22/0x40 __sys_sendto+0x1b0/0x250 ? __ia32_sys_getpeername+0x40/0x40 ? _copy_to_user+0x58/0x70 ? poll_select_copy_remaining+0x176/0x200 ? __pollwait+0x1c0/0x1c0 ? ktime_get_ts64+0x11f/0x140 ? kern_select+0x108/0x150 ? core_sys_select+0x360/0x360 ? vfs_read+0x127/0x150 ? kernel_write+0x90/0x90 __x64_sys_sendto+0x6f/0x80 do_syscall_64+0x5d/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fefef2b129d Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41 RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003 RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000 R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8 tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init() forgot to allocate them, because tcf_idr_create() was called with a wrong value of 'cpustats'. Setting it to true proved to fix the reported crash. Reported-by: Matteo Croce <mcroce@redhat.com> Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14batman-adv: Mark debugfs functionality as deprecatedSven Eckelmann
CONFIG_BATMAN_ADV_DEBUGFS is disabled by default because debugfs is not supported for batman-adv interfaces in any non-default netns. Any remaining users of this interface should still be informed about the deprecation and the generic netlink alternative. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-14batman-adv: Start new development cycleSimon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>