summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-04-29bpf/verifier: improve register value range tracking with ARSHYonghong Song
When helpers like bpf_get_stack returns an int value and later on used for arithmetic computation, the LSH and ARSH operations are often required to get proper sign extension into 64-bit. For example, without this patch: 54: R0=inv(id=0,umax_value=800) 54: (bf) r8 = r0 55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800) 55: (67) r8 <<= 32 56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000)) 56: (c7) r8 s>>= 32 57: R8=inv(id=0) With this patch: 54: R0=inv(id=0,umax_value=800) 54: (bf) r8 = r0 55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800) 55: (67) r8 <<= 32 56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000)) 56: (c7) r8 s>>= 32 57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff)) With better range of "R8", later on when "R8" is added to other register, e.g., a map pointer or scalar-value register, the better register range can be derived and verifier failure may be avoided. In our later example, ...... usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK); if (usize < 0) return 0; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0); ...... Without improving ARSH value range tracking, the register representing "max_len - usize" will have smin_value equal to S64_MIN and will be rejected by verifier. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29bpf: remove never-hit branches in verifier adjust_scalar_min_max_valsYonghong Song
In verifier function adjust_scalar_min_max_vals, when src_known is false and the opcode is BPF_LSH/BPF_RSH, early return will happen in the function. So remove the branch in handling BPF_LSH/BPF_RSH when src_known is false. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29bpf/verifier: refine retval R0 state for bpf_get_stack helperYonghong Song
The special property of return values for helpers bpf_get_stack and bpf_probe_read_str are captured in verifier. Both helpers return a negative error code or a length, which is equal to or smaller than the buffer size argument. This additional information in the verifier can avoid the condition such as "retval > bufsize" in the bpf program. For example, for the code blow, usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK); if (usize < 0 || usize > max_len) return 0; The verifier may have the following errors: 52: (85) call bpf_get_stack#65 R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R9_w=inv800 R10=fp0,call_-1 53: (bf) r8 = r0 54: (bf) r1 = r8 55: (67) r1 <<= 32 56: (bf) r2 = r1 57: (77) r2 >>= 32 58: (25) if r2 > 0x31f goto pc+33 R0=inv(id=0) R1=inv(id=0,smax_value=9223372032559808512, umax_value=18446744069414584320, var_off=(0x0; 0xffffffff00000000)) R2=inv(id=0,umax_value=799,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8=inv(id=0) R9=inv800 R10=fp0,call_-1 59: (1f) r9 -= r8 60: (c7) r1 s>>= 32 61: (bf) r2 = r7 62: (0f) r2 += r1 math between map_value pointer and register with unbounded min value is not allowed The failure is due to llvm compiler optimization where register "r2", which is a copy of "r1", is tested for condition while later on "r1" is used for map_ptr operation. The verifier is not able to track such inst sequence effectively. Without the "usize > max_len" condition, there is no llvm optimization and the below generated code passed verifier: 52: (85) call bpf_get_stack#65 R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R9_w=inv800 R10=fp0,call_-1 53: (b7) r1 = 0 54: (bf) r8 = r0 55: (67) r8 <<= 32 56: (c7) r8 s>>= 32 57: (6d) if r1 s> r8 goto pc+24 R0=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R1=inv0 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R9=inv800 R10=fp0,call_-1 58: (bf) r2 = r7 59: (0f) r2 += r8 60: (1f) r9 -= r8 61: (bf) r1 = r6 Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29bpf: add bpf_get_stack helperYonghong Song
Currently, stackmap and bpf_get_stackid helper are provided for bpf program to get the stack trace. This approach has a limitation though. If two stack traces have the same hash, only one will get stored in the stackmap table, so some stack traces are missing from user perspective. This patch implements a new helper, bpf_get_stack, will send stack traces directly to bpf program. The bpf program is able to see all stack traces, and then can do in-kernel processing or send stack traces to user space through shared map or bpf_perf_event_output. Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29bpf: change prototype for stack_map_get_build_id_offsetYonghong Song
This patch didn't incur functionality change. The function prototype got changed so that the same function can be reused later. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29ALSA: dice: fix kernel NULL pointer dereference due to invalid calculation ↵Takashi Sakamoto
for array index At a commit f91c9d7610a ('ALSA: firewire-lib: cache maximum length of payload to reduce function calls'), maximum size of payload for tx isochronous packet is cached to reduce the number of function calls. This cache was programmed to updated at a first callback of ohci1394 IR context. However, the maximum size is required to queueing packets before starting the isochronous context. As a result, the cached value is reused to queue packets in next time to starting the isochronous context. Then the cache is updated in a first callback of the isochronous context. This can cause kernel NULL pointer dereference in a below call graph: (sound/firewire/amdtp-stream.c) amdtp_stream_start() ->queue_in_packet() ->queue_packet() (drivers/firewire/core-iso.c) ->fw_iso_context_queue() ->struct fw_card_driver.queue_iso() (drivers/firewire/ohci.c) = ohci_queue_iso() ->queue_iso_packet_per_buffer() buffer->pages[page] The issued dereference occurs in a case that: - target unit supports different stream formats for sampling transmission frequency. - maximum length of payload for tx stream in a first trial is bigger than the length in a second trial. In this case, correct number of pages are allocated for DMA and the 'pages' array has enough elements, while index of the element is wrongly calculated according to the old value of length of payload in a call of 'queue_in_packet()'. Then it causes the issue. This commit fixes the critical bug. This affects all of drivers in ALSA firewire stack in Linux kernel v4.12 or later. [12665.302360] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [12665.302415] IP: ohci_queue_iso+0x47c/0x800 [firewire_ohci] [12665.302439] PGD 0 [12665.302440] P4D 0 [12665.302450] [12665.302470] Oops: 0000 [#1] SMP PTI [12665.302487] Modules linked in: ... [12665.303096] CPU: 1 PID: 12760 Comm: jackd Tainted: P OE 4.13.0-38-generic #43-Ubuntu [12665.303154] Hardware name: /DH77DF, BIOS KCH7710H.86A.0069.2012.0224.1825 02/24/2012 [12665.303215] task: ffff9ce87da2ae80 task.stack: ffffb5b8823d0000 [12665.303258] RIP: 0010:ohci_queue_iso+0x47c/0x800 [firewire_ohci] [12665.303301] RSP: 0018:ffffb5b8823d3ab8 EFLAGS: 00010086 [12665.303337] RAX: ffff9ce4f4876930 RBX: 0000000000000008 RCX: ffff9ce88a3955e0 [12665.303384] RDX: 0000000000000000 RSI: 0000000034877f00 RDI: 0000000000000000 [12665.303427] RBP: ffffb5b8823d3b68 R08: ffff9ce8ccb390a0 R09: ffff9ce877639ab0 [12665.303475] R10: 0000000000000108 R11: 0000000000000000 R12: 0000000000000003 [12665.303513] R13: 0000000000000000 R14: ffff9ce4f4876950 R15: 0000000000000000 [12665.303554] FS: 00007f2ec467f8c0(0000) GS:ffff9ce8df280000(0000) knlGS:0000000000000000 [12665.303600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [12665.303633] CR2: 0000000000000030 CR3: 00000002dcf90004 CR4: 00000000000606e0 [12665.303674] Call Trace: [12665.303698] fw_iso_context_queue+0x18/0x20 [firewire_core] [12665.303735] queue_packet+0x88/0xe0 [snd_firewire_lib] [12665.303770] amdtp_stream_start+0x19b/0x270 [snd_firewire_lib] [12665.303811] start_streams+0x276/0x3c0 [snd_dice] [12665.303840] snd_dice_stream_start_duplex+0x1bf/0x480 [snd_dice] [12665.303882] ? vma_gap_callbacks_rotate+0x1e/0x30 [12665.303914] ? __rb_insert_augmented+0xab/0x240 [12665.303936] capture_prepare+0x3c/0x70 [snd_dice] [12665.303961] snd_pcm_do_prepare+0x1d/0x30 [snd_pcm] [12665.303985] snd_pcm_action_single+0x3b/0x90 [snd_pcm] [12665.304009] snd_pcm_action_nonatomic+0x68/0x70 [snd_pcm] [12665.304035] snd_pcm_prepare+0x68/0x90 [snd_pcm] [12665.304058] snd_pcm_common_ioctl1+0x4c0/0x940 [snd_pcm] [12665.304083] snd_pcm_capture_ioctl1+0x19b/0x250 [snd_pcm] [12665.304108] snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm] [12665.304131] do_vfs_ioctl+0xa8/0x630 [12665.304148] ? entry_SYSCALL_64_after_hwframe+0xe9/0x139 [12665.304172] ? entry_SYSCALL_64_after_hwframe+0xe2/0x139 [12665.304195] ? entry_SYSCALL_64_after_hwframe+0xdb/0x139 [12665.304218] ? entry_SYSCALL_64_after_hwframe+0xd4/0x139 [12665.304242] ? entry_SYSCALL_64_after_hwframe+0xcd/0x139 [12665.304265] ? entry_SYSCALL_64_after_hwframe+0xc6/0x139 [12665.304288] ? entry_SYSCALL_64_after_hwframe+0xbf/0x139 [12665.304312] ? entry_SYSCALL_64_after_hwframe+0xb8/0x139 [12665.304335] ? entry_SYSCALL_64_after_hwframe+0xb1/0x139 [12665.304358] SyS_ioctl+0x79/0x90 [12665.304374] ? entry_SYSCALL_64_after_hwframe+0x72/0x139 [12665.304397] entry_SYSCALL_64_fastpath+0x24/0xab [12665.304417] RIP: 0033:0x7f2ec3750ef7 [12665.304433] RSP: 002b:00007fff99e31388 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [12665.304465] RAX: ffffffffffffffda RBX: 00007fff99e312f0 RCX: 00007f2ec3750ef7 [12665.304494] RDX: 0000000000000000 RSI: 0000000000004140 RDI: 0000000000000007 [12665.304522] RBP: 0000556ebc63fd60 R08: 0000556ebc640560 R09: 0000000000000000 [12665.304553] R10: 0000000000000001 R11: 0000000000000246 R12: 0000556ebc63fcf0 [12665.304584] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000 [12665.304612] Code: 01 00 00 44 89 eb 45 31 ed 45 31 db 66 41 89 1e 66 41 89 5e 0c 66 45 89 5e 0e 49 8b 49 08 49 63 d4 4d 85 c0 49 63 ff 48 8b 14 d1 <48> 8b 72 30 41 8d 14 37 41 89 56 04 48 63 d3 0f 84 ce 00 00 00 [12665.304713] RIP: ohci_queue_iso+0x47c/0x800 [firewire_ohci] RSP: ffffb5b8823d3ab8 [12665.304743] CR2: 0000000000000030 [12665.317701] ---[ end trace 9d55b056dd52a19f ]--- Fixes: f91c9d7610a ('ALSA: firewire-lib: cache maximum length of payload to reduce function calls') Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-04-28Merge tag 'for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix misc bugs and a regression for ext4" * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs ext4: fix bitmap position validation ext4: set h_journal if there is a failure starting a reserved handle ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS
2018-04-28<linux/stringhash.h>: fix end_name_hash() for 64bit longAmir Goldstein
The comment claims that this helper will try not to loose bits, but for 64bit long it looses the high bits before hashing 64bit long into 32bit int. Use the helper hash_long() to do the right thing for 64bit long. For 32bit long, there is no change. All the callers of end_name_hash() either assign the result to qstr->hash, which is u32 or return the result as an int value (e.g. full_name_hash()). Change the helper return type to int to conform to its users. [ It took me a while to apply this, because my initial reaction to it was - incorrectly - that it could make for slower code. After having looked more at it, I take back all my complaints about the patch, Amir was right and I was mis-reading things or just being stupid. I also don't worry too much about the possible performance impact of this on 64-bit, since most architectures that actually care about performance end up not using this very much (the dcache code is the most performance-critical, but the word-at-a-time case uses its own hashing anyway). So this ends up being mostly used for filesystems that do their own degraded hashing (usually because they want a case-insensitive comparison function). A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS, and then this potentially makes things more expensive on 64-bit architectures with slow or lacking multipliers even for the normal case. That said, realistically the only such architecture I can think of is PA-RISC. Nobody really cares about performance on that, it's more of a "look ma, I've got warts^W an odd machine" platform. So the patch is fine, and all my initial worries were just misplaced from not looking at this properly. - Linus ] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-28net: phy: Fix modular PHYLIB buildFlorian Fainelli
After commit c59530d0d5dc ("net: Move PHY statistics code into PHY library helpers") we made net/core/ethtool.c reference symbols which are part of the library which can be modular. David introduced a temporary fix with 1ecd6e8ad996 ("phy: Temporary build fix after phylib changes.") which would prevent such modularity. This is not desireable of course, so instead, just inline the functions into include/linux/phy.h to keep both options available. Fixes: c59530d0d5dc ("net: Move PHY statistics code into PHY library helpers") Fixes: 1ecd6e8ad996 ("phy: Temporary build fix after phylib changes.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-28MAINTAINERS: add myself as maintainer of AFFSDavid Sterba
The AFFS filesystem is still in use by m68k community (Link #2), but as there was no code activity and no maintainer, the filesystem appeared on the list of candidates for staging/removal (Link #1). I volunteer to act as a maintainer of AFFS to collect any fixes that might show up and to guard fs/affs/ against another spring cleaning. Link: https://lkml.kernel.org/r/20180425154602.GA8546@bombadil.infradead.org Link: https://lkml.kernel.org/r/1613268.lKBQxPXt8J@merkaba CC: Martin Steigerwald <martin@lichtvoll.de> CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-28Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - two driver fixes - better parameter check for the core - Documentation updates - part of a tree-wide HAS_DMA cleanup * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: sprd: Fix the i2c count issue i2c: sprd: Prevent i2c accesses after suspend is called i2c: dev: prevent ZERO_SIZE_PTR deref in i2cdev_ioctl_rdwr() Documentation/i2c: adopt kernel commenting style in examples Documentation/i2c: sync docs with current state of i2c-tools Documentation/i2c: whitespace cleanup i2c: Remove depends on HAS_DMA in case of platform dependency
2018-04-28Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - crypto API regression that may cause sporadic alloc failures - double-free bug in drbg * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: drbg - set freed buffers to NULL crypto: api - fix finding algorithm currently being tested
2018-04-28Merge tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A few security related fixes for SMB3, most importantly for SMB3.11 encryption" * tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: smbd: Avoid allocating iov on the stack cifs: smbd: Don't use RDMA read/write when signing is used SMB311: Fix reconnect SMB3: Fix 3.11 encryption to Windows and handle encrypted smb3 tcon CIFS: set *resp_buf_type to NO_BUFFER on error
2018-04-28Merge tag 'powerpc-4.17-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A bunch of fixes, mostly for existing code and going to stable. Our memory hot-unplug path wasn't flushing the cache before removing memory. That is a problem now that we are doing memory hotplug on bare metal. Three fixes for the NPU code that supports devices connected via NVLink (ie. GPUs). The main one tweaks the TLB flush algorithm to avoid soft lockups for large flushes. A fix for our memory error handling where we would loop infinitely, returning back to the bad access and hard lockup the CPU. Fixes for the OPAL RTC driver, which wasn't handling some error cases correctly. A fix for a hardlockup in the powernv cpufreq driver. And finally two fixes to our smp_send_stop(), required due to a recent change to use it on shutdown. Thanks to: Alistair Popple, Balbir Singh, Laurentiu Tudor, Mahesh Salgaonkar, Mark Hairgrove, Nicholas Piggin, Rashmica Gupta, Shilpasri G Bhat" * tag 'powerpc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kvm/booke: Fix altivec related build break powerpc: Fix deadlock with multiple calls to smp_send_stop cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt powerpc: Fix smp_send_stop NMI IPI handling rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops powerpc/mce: Fix a bug where mce loops on memory UE. powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroy powerpc/powernv/memtrace: Let the arch hotunplug code flush cache powerpc/mm: Flush cache on memory hot(un)plug
2018-04-27udp: remove stray export symbolWillem de Bruijn
UDP GSO needs to export __udp_gso_segment to call it from ipv6. I accidentally exported static ipv4 function __udp4_gso_segment. Remove that EXPORT_SYMBOL_GPL. Fixes: ee80d1ebe5ba ("udp: add udp gso") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27ipv6: sr: Add documentation for seg_flowlabel sysctlAhmed Abdelsalam
This patch adds a documentation for seg_flowlabel sysctl into Documentation/networking/ip-sysctl.txt Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27Merge branch 'sfc-more-ARFS-fixes'David S. Miller
Edward Cree says: ==================== sfc: more ARFS fixes A couple more bits of breakage in my recent ARFS and async filters work. Patch #1 in particular fixes a bug that leads to memory trampling and consequent crashes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27sfc: fix ARFS expiry check on EF10Edward Cree
Owing to a missing conditional, the result of rps_may_expire_flow() was being ignored and filters were being removed even if we'd decided not to expire them. Fixes: f8d6203780b7 ("sfc: ARFS filter IDs") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27sfc: Use filter index rather than ID for rps_flow_id tableEdward Cree
efx->type->filter_insert() returns an ID rather than the index that efx->type->filter_async_insert() used to, which causes it to exceed efx->type->max_rx_ip_filters on some EF10 configurations, leading to out- of-bounds array writes. So, in efx_filter_rfs_work(), convert this back into an index (which is what the remove call in the expiry path expects, anyway). Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27drivers: net: replace UINT64_MAX with U64_MAXJisheng Zhang
U64_MAX is well defined now while the UINT64_MAX is not, so we fall back to drivers' own definition as below: #ifndef UINT64_MAX #define UINT64_MAX (u64)(~((u64)0)) #endif I believe this is in one phy driver then copied and pasted to other phy drivers. Replace the UINT64_MAX with U64_MAX to clean up the source code. Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27net: support compat 64-bit time in {s,g}etsockoptLance Richardson
For the x32 ABI, struct timeval has two 64-bit fields. However the kernel currently interprets the user-space values used for the SO_RCVTIMEO and SO_SNDTIMEO socket options as having a pair of 32-bit fields. When the seconds portion of the requested timeout is less than 2**32, the seconds portion of the effective timeout is correct but the microseconds portion is zero. When the seconds portion of the requested timeout is zero and the microseconds portion is non-zero, the kernel interprets the timeout as zero (never timeout). Fix by using 64-bit time for SO_RCVTIMEO/SO_SNDTIMEO as required for the ABI. The code included below demonstrates the problem. Results before patch: $ gcc -m64 -Wall -O2 -o socktmo socktmo.c && ./socktmo recv time: 2.008181 seconds send time: 2.015985 seconds $ gcc -m32 -Wall -O2 -o socktmo socktmo.c && ./socktmo recv time: 2.016763 seconds send time: 2.016062 seconds $ gcc -mx32 -Wall -O2 -o socktmo socktmo.c && ./socktmo recv time: 1.007239 seconds send time: 1.023890 seconds Results after patch: $ gcc -m64 -O2 -Wall -o socktmo socktmo.c && ./socktmo recv time: 2.010062 seconds send time: 2.015836 seconds $ gcc -m32 -O2 -Wall -o socktmo socktmo.c && ./socktmo recv time: 2.013974 seconds send time: 2.015981 seconds $ gcc -mx32 -O2 -Wall -o socktmo socktmo.c && ./socktmo recv time: 2.030257 seconds send time: 2.013383 seconds #include <stdio.h> #include <stdlib.h> #include <sys/socket.h> #include <sys/types.h> #include <sys/time.h> void checkrc(char *str, int rc) { if (rc >= 0) return; perror(str); exit(1); } static char buf[1024]; int main(int argc, char **argv) { int rc; int socks[2]; struct timeval tv; struct timeval start, end, delta; rc = socketpair(AF_UNIX, SOCK_STREAM, 0, socks); checkrc("socketpair", rc); /* set timeout to 1.999999 seconds */ tv.tv_sec = 1; tv.tv_usec = 999999; rc = setsockopt(socks[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv); rc = setsockopt(socks[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv); checkrc("setsockopt", rc); /* measure actual receive timeout */ gettimeofday(&start, NULL); rc = recv(socks[0], buf, sizeof buf, 0); gettimeofday(&end, NULL); timersub(&end, &start, &delta); printf("recv time: %ld.%06ld seconds\n", (long)delta.tv_sec, (long)delta.tv_usec); /* fill send buffer */ do { rc = send(socks[0], buf, sizeof buf, 0); } while (rc > 0); /* measure actual send timeout */ gettimeofday(&start, NULL); rc = send(socks[0], buf, sizeof buf, 0); gettimeofday(&end, NULL); timersub(&end, &start, &delta); printf("send time: %ld.%06ld seconds\n", (long)delta.tv_sec, (long)delta.tv_usec); exit(0); } Fixes: 515c7af85ed9 ("x32: Use compat shims for {g,s}etsockopt") Reported-by: Gopal RajagopalSai <gopalsr83@gmail.com> Signed-off-by: Lance Richardson <lance.richardson.net@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - PSCI selection API, a leftover from 4.16 (for stable) - Kick vcpu on active interrupt affinity change - Plug a VMID allocation race on oversubscribed systems - Silence debug messages - Update Christoffer's email address (linaro -> arm) x86: - Expose userspace-relevant bits of a newly added feature - Fix TLB flushing on VMX with VPID, but without EPT" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use arm/arm64: KVM: Add PSCI version selection API KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration arm64: KVM: Demote SVE and LORegion warnings to debug only MAINTAINERS: Update e-mail address for Christoffer Dall KVM: arm/arm64: Close VMID generation race
2018-04-27selftests: Fix lib.mk run_tests target shell scriptMathieu Desnoyers
Within run_tests target, the whole script needs to be executed within the same shell and not as separate subshells, so the initial test_num variable set to 0 is still present when executing "test_num=`echo $$test_num+1 | bc`;". Demonstration of the issue (make run_tests): TAP version 13 (standard_in) 1: syntax error selftests: basic_test ======================================== ok 1.. selftests: basic_test [PASS] (standard_in) 1: syntax error selftests: basic_percpu_ops_test ======================================== ok 1.. selftests: basic_percpu_ops_test [PASS] (standard_in) 1: syntax error selftests: param_test ======================================== ok 1.. selftests: param_test [PASS] With fix applied: TAP version 13 selftests: basic_test ======================================== ok 1..1 selftests: basic_test [PASS] selftests: basic_percpu_ops_test ======================================== ok 1..2 selftests: basic_percpu_ops_test [PASS] selftests: param_test ======================================== ok 1..3 selftests: param_test [PASS] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 1f87c7c15d7 ("selftests: lib.mk: change RUN_TESTS to print messages in TAP13 format") CC: Shuah Khan <shuahkh@osg.samsung.com> CC: linux-kselftest@vger.kernel.org Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-04-27dt-bindings: meson-uart: DT fix s/clocks-names/clock-names/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Rob Herring <robh@kernel.org>
2018-04-27ptp_pch: use helpers function for converting between ns and timespecYueHaibing
use ns_to_timespec64() and timespec64_to_ns() instead of open coding Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27net: qrtr: Expose tunneling endpoint to user spaceBjorn Andersson
This implements a misc character device named "qrtr-tun" for the purpose of allowing user space applications to implement endpoints in the qrtr network. This allows more advanced (and dynamic) testing of the qrtr code as well as opens up the ability of tunneling qrtr over a network or USB link. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27MAINTAINERS: add davem in NETWORKING DRIVERSVivien Didelot
"./scripts/get_maintainer.pl -f" does not actually show us David as the maintainer of drivers/net directories such as team, bonding, phy or dsa. Adding him in an M: entry of NETWORKING DRIVERS fixes this. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27Merge branch 'selftests-Add-tests-for-mirroring-to-gretap'David S. Miller
Petr Machata says: ==================== selftests: Add tests for mirroring to gretap This suite tests GRE-encapsulated mirroring. The general topology that most of the tests use is as follows, but each test defines details of the topology based on its needs, and some tests actually use a somewhat different topology. +---------------------+ +---------------------+ | H1 | | H2 | | + $h1 | | $h2 + | +-----|---------------+ +---------------|-----+ | | +-----|------------------------------------------------------|-----+ | SW o---> mirror | | | +---|------------------------------------------------------|---+ | | | + $swp1 BR $swp2 + | | | +--------------------------------------------------------------+ | | | | + $swp3 + gt6 (ip6gretap) + gt4 (gretap) | +-----|----------------:--------------------:----------------------+ | : : +-----|----------------:--------------------:----------------------+ | + $h3 + h3-gt6(ip6gretap) + h3-gt4 (gretap) | | H3 | +------------------------------------------------------------------+ The following axes of configuration space are tested: - ingress and egress mirroring - mirroring triggered by matchall and flower - mirroring to ipgretap and ip6gretap - remote tunnel reachable directly or through a next-hop route - skip_sw as well as skip_hw configurations Apart from basic tests with the above mentioned features, the following tests are included: - handling of changes to neighbors pertinent to routing decisions in mirrored underlay - handling of configuration changes at the mirrored-to tunnel (endpoint addresses, upness) A suite of mlxsw-specific tests will be part of a separate submission through linux-mlxsw patch queue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Test changes in mirror-to-gretapPetr Machata
These tests set up mirroring in a situation that the configuration is incorrect, i.e. mirrored packets, if any, are not supposed to reach destination tunnel device. Then the configuration is rectified and mirroring is checked to have started working. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Test neighbor updates when mirroring to gretapPetr Machata
Test that when a mirror to gretap or ip6gretap netdevice is configured, changes to neighbors are reflected. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Test flower mirror to gretapPetr Machata
Add a test for mirroring to a gretap and an ip6gretap netdevices such that the mirroring action is triggered by a flower match. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Test mirror to gretap w/ bound devPetr Machata
Test mirroring to a gretap and an ip6gretap netdevice with a bound device, where the tunnel device and the bound device are in different VRFs (an overlay / underlay configuration). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Test gretap mirror with next-hop remotePetr Machata
Test mirror to a gretap and an ip6gretap netdevice such that the remote address of the tunnel is reachable through a next-hop route. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Add test for mirror to gretapPetr Machata
Add a test for basic mirroring to gretap and ip6gretap netdevices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27selftests: forwarding: Add libs for gretap mirror testingPetr Machata
To simplify implementation of mirror-to-gretap tests, extend lib.sh with several new functions that might potentially be useful more broadly (although right now the mirroring tests will be the only client). Also add mirror_lib.sh with code useful for mirroring tests, mirror_gre_lib.sh with code specifically useful for mirror-to-gretap tests, and mirror_gre_topo.sh that primes a given test with a good baseline topology that the test can then tweak to its liking. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27Merge branch 'bnxt_en-next'David S. Miller
Michael Chan says: ==================== bnxt_en: Net-next updates. This series has 3 main features. The first is to add mqprio TC to hardware queue mapping to avoid reprogramming hardware CoS queue watermarks during run-time. The second is DIM improvements from Andy Gospo. The third is some improvements to VF resource allocations when supporting large numbers of VFs with more limited resources. There are some additional minor improvements and a new function level discard counter. v2: Fixed EEPROM typo noted by Andrew Lunn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Reserve rings at driver open if none was reserved at probe time.Michael Chan
Add logic to reserve default rings at driver open time if none was reserved during probe time. This will happen when the PF driver did not provision minimum rings to the VF, due to more limited resources. Driver open will only succeed if some minimum rings can be reserved. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Reserve RSS and L2 contexts for VF.Michael Chan
For completeness and correctness, the VF driver needs to reserve these RSS and L2 contexts. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Don't reserve rings on VF when min rings were not provisioned by PF.Michael Chan
When rings are more limited and the PF has not provisioned minimum guaranteed rings to the VF, do not reserve rings during driver probe. Wait till device open before reserving rings when they will be used. Device open will succeed if some minimum rings can be successfully reserved and allocated. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Reserve rings in bnxt_set_channels() if device is down.Michael Chan
The current code does not reserve rings during ethtool -L when the device is down. The rings will be reserved when the device is later opened. Change it to reserve rings during ethtool -L when the device is down. This provides a better guarantee that the device open will be successful when the rings are reserved ahead of time. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: add debugfs support for DIMAndy Gospodarek
This adds debugfs support for bnxt_en with the purpose of allowing users to examine the current DIM profile in use for each receive queue. This was instrumental in debugging issues found with DIM and ensuring that the profiles we expect to use are the profiles being used. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: reduce timeout on initial HWRM callsAndy Gospodarek
Testing with DIM enabled on older kernels indicated that firmware calls were slower than expected. More detailed analysis indicated that the default 25us delay was higher than necessary. Reducing the time spend in usleep_range() for the first several calls would reduce the overall latency of firmware calls on newer Intel processors. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Increase RING_IDLE minimum threshold to 50Andy Gospodarek
This keeps the RING_IDLE flag set in hardware for higher coalesce settings by default and improved latency. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Do not allow VF to read EEPROM.Michael Chan
Firmware does not allow the operation and would return failure, causing a warning in dmesg. So check for VF and disallow it in the driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Display function level rx/tx_discard_pkts via ethtoolVasundhara Volam
Add counters to display sum of rx/tx_discard_pkts of all rings as function level statistics via ethtool. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Simplify ring alloc/free error messages.Michael Chan
Replace switch statements printing different messages for every ring type with a common message. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Do not set firmware time from VF driver on older firmware.Michael Chan
Older firmware will reject this call and cause an error message to be printed by the VF driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Check the lengths of encapsulated firmware responses.Michael Chan
Firmware messages that are forwarded from PF to VFs are encapsulated. The size of these encapsulated messages must not exceed the maximum defined message size. Add appropriate checks to avoid oversize messages. Firmware messages may be expanded in future specs and this will provide some guardrails to avoid data corruption. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Remap TC to hardware queues when configuring PFC.Michael Chan
Initially, the MQPRIO TCs are mapped 1:1 directly to the hardware queues. Some of these hardware queues are configured to be lossless. When PFC is enabled on one of more TCs, we now need to remap the TCs that have PFC enabled to the lossless hardware queues. After remapping, we need to close and open the NIC for the new mapping to take effect. We also need to reprogram all ETS parameters. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27bnxt_en: Add TC to hardware QoS queue mapping logic.Michael Chan
The current driver maps MQPRIO traffic classes directly 1:1 to the internal hardware queues (TC0 maps to hardware queue 0, etc). This direct mapping requires the internal hardware queues to be reconfigured from lossless to lossy and vice versa when necessary. This involves reconfiguring internal buffer thresholds which is disruptive and not always reliable. Implement a new scheme to map TCs to internal hardware queues by matching up their PFC requirements. This will eliminate the need to reconfigure a hardware queue internal buffers at run time. After remapping, the NIC is closed and opened for the new TC to hardware queues to take effect. This patch only adds the basic mapping logic. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>