summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-04bpf: improve verifier branch analysisAlexei Starovoitov
pathological bpf programs may try to force verifier to explode in the number of branch states: 20: (d5) if r1 s<= 0x24000028 goto pc+0 21: (b5) if r0 <= 0xe1fa20 goto pc+2 22: (d5) if r1 s<= 0x7e goto pc+0 23: (b5) if r0 <= 0xe880e000 goto pc+0 24: (c5) if r0 s< 0x2100ecf4 goto pc+0 25: (d5) if r1 s<= 0xe880e000 goto pc+1 26: (c5) if r0 s< 0xf4041810 goto pc+0 27: (d5) if r1 s<= 0x1e007e goto pc+0 28: (b5) if r0 <= 0xe86be000 goto pc+0 29: (07) r0 += 16614 30: (c5) if r0 s< 0x6d0020da goto pc+0 31: (35) if r0 >= 0x2100ecf4 goto pc+0 Teach verifier to recognize always taken and always not taken branches. This analysis is already done for == and != comparison. Expand it to all other branches. It also helps real bpf programs to be verified faster: before after bpf_lb-DLB_L3.o 2003 1940 bpf_lb-DLB_L4.o 3173 3089 bpf_lb-DUNKNOWN.o 1080 1065 bpf_lxc-DDROP_ALL.o 29584 28052 bpf_lxc-DUNKNOWN.o 36916 35487 bpf_netdev.o 11188 10864 bpf_overlay.o 6679 6643 bpf_lcx_jit.o 39555 38437 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: check pending signals while verifying programsAlexei Starovoitov
Malicious user space may try to force the verifier to use as much cpu time and memory as possible. Hence check for pending signals while verifying the program. Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN, since the kernel has to release the resources used for program verification. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04Merge branch 'prog_test_run-improvement'Alexei Starovoitov
Lorenz Bauer says: ==================== Right now, there is no safe way to use BPF_PROG_TEST_RUN with data_out. This is because bpf_test_finish copies the output buffer to user space without checking its size. This can lead to the kernel overwriting data in user space after the buffer if xdp_adjust_head and friends are in play. Thanks to everyone for their advice and patience with this patch set! Changes in v5: * Fix up libbpf.map Changes in v4: * Document bpf_prog_test_run and bpf_prog_test_run_xattr * Use struct bpf_prog_test_run_attr for return values Changes in v3: * Introduce bpf_prog_test_run_xattr instead of modifying the existing function Changes in v2: * Make the syscall return ENOSPC if data_size_out is too small * Make bpf_prog_test_run return EINVAL if size_out is missing * Document the new behaviour of data_size_out ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04selftests: add a test for bpf_prog_test_run_xattrLorenz Bauer
Make sure that bpf_prog_test_run_xattr returns the correct length and that the kernel respects the output size hint. Also check that errno indicates ENOSPC if there is a short output buffer given. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04libbpf: add bpf_prog_test_run_xattrLorenz Bauer
Add a new function, which encourages safe usage of the test interface. bpf_prog_test_run continues to work as before, but should be considered unsafe. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04tools: sync uapi/linux/bpf.hLorenz Bauer
Pull changes from "bpf: respect size hint to BPF_PROG_TEST_RUN if present". Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04bpf: respect size hint to BPF_PROG_TEST_RUN if presentLorenz Bauer
Use data_size_out as a size hint when copying test output to user space. ENOSPC is returned if the output buffer is too small. Callers which so far did not set data_size_out are not affected. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04Revert "exec: make de_thread() freezable"Rafael J. Wysocki
Revert commit c22397888f1e "exec: make de_thread() freezable" as requested by Ingo Molnar: "So there's a new regression in v4.20-rc4, my desktop produces this lockdep splat: [ 1772.588771] WARNING: pkexec/4633 still has locks held! [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted [ 1772.588775] ------------------------------------ [ 1772.588776] 1 lock held by pkexec/4633: [ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70 [ 1772.588786] stack backtrace: [ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1 [ 1772.588792] Call Trace: [ 1772.588800] dump_stack+0x85/0xcb [ 1772.588803] flush_old_exec+0x116/0x890 [ 1772.588807] ? load_elf_phdrs+0x72/0xb0 [ 1772.588809] load_elf_binary+0x291/0x1620 [ 1772.588815] ? sched_clock+0x5/0x10 [ 1772.588817] ? search_binary_handler+0x6d/0x240 [ 1772.588820] search_binary_handler+0x80/0x240 [ 1772.588823] load_script+0x201/0x220 [ 1772.588825] search_binary_handler+0x80/0x240 [ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60 [ 1772.588832] ? strncpy_from_user+0x40/0x180 [ 1772.588835] __x64_sys_execve+0x34/0x40 [ 1772.588838] do_syscall_64+0x60/0x1c0 The warning gets triggered by an ancient lockdep check in the freezer: (gdb) list *0xffffffff812ece06 0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57). 52 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION 53 * If try_to_freeze causes a lockdep warning it means the caller may deadlock 54 */ 55 static inline bool try_to_freeze_unsafe(void) 56 { 57 might_sleep(); 58 if (likely(!freezing(current))) 59 return false; 60 return __refrigerator(false); 61 } I reviewed the ->cred_guard_mutex code, and the mutex is held across all of exec() - and we always did this. But there's this recent -rc4 commit: > Chanho Min (1): > exec: make de_thread() freezable c22397888f1e: exec: make de_thread() freezable I believe this commit is bogus, you cannot call try_to_freeze() from de_thread(), because it's holding the ->cred_guard_mutex." Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-04btrfs: tree-checker: Don't check max block group size as current max chunk ↵Qu Wenruo
size limit is unreliable [BUG] A completely valid btrfs will refuse to mount, with error message like: BTRFS critical (device sdb2): corrupt leaf: root=2 block=239681536 slot=172 \ bg_start=12018974720 bg_len=10888413184, invalid block group size, \ have 10888413184 expect (0, 10737418240] This has been reported several times as the 4.19 kernel is now being used. The filesystem refuses to mount, but is otherwise ok and booting 4.18 is a workaround. Btrfs check returns no error, and all kernels used on this fs is later than 2011, which should all have the 10G size limit commit. [CAUSE] For a 12 devices btrfs, we could allocate a chunk larger than 10G due to stripe stripe bump up. __btrfs_alloc_chunk() |- max_stripe_size = 1G |- max_chunk_size = 10G |- data_stripe = 11 |- if (1G * 11 > 10G) { stripe_size = 976128930; stripe_size = round_up(976128930, SZ_16M) = 989855744 However the final stripe_size (989855744) * 11 = 10888413184, which is still larger than 10G. [FIX] For the comprehensive check, we need to do the full check at chunk read time, and rely on bg <-> chunk mapping to do the check. We could just skip the length check for now. Fixes: fce466eab7ac ("btrfs: tree-checker: Verify block_group_item") Cc: stable@vger.kernel.org # v4.19+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-04MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310Aaro Koskinen
Since v2.6.22 or so there has been reports [1] about OMAP MMC being broken on OMAP15XX based hardware (OMAP5910 and OMAP310). The breakage seems to have been caused by commit 46a6730e3ff9 ("mmc-omap: Fix omap to use MMC_POWER_ON") that changed clock enabling to be done on MMC_POWER_ON. This can happen multiple times in a row, and on 15XX the hardware doesn't seem to like it and the MMC just stops responding. Fix by memorizing the power mode and do the init only when necessary. Before the patch (on Palm TE): mmc0: new SD card at address b368 mmcblk0: mmc0:b368 SDC 977 MiB mmci-omap mmci-omap.0: command timeout (CMD18) mmci-omap mmci-omap.0: command timeout (CMD13) mmci-omap mmci-omap.0: command timeout (CMD13) mmci-omap mmci-omap.0: command timeout (CMD12) [x 6] mmci-omap mmci-omap.0: command timeout (CMD13) [x 6] mmcblk0: error -110 requesting status mmci-omap mmci-omap.0: command timeout (CMD8) mmci-omap mmci-omap.0: command timeout (CMD18) mmci-omap mmci-omap.0: command timeout (CMD13) mmci-omap mmci-omap.0: command timeout (CMD13) mmci-omap mmci-omap.0: command timeout (CMD12) [x 6] mmci-omap mmci-omap.0: command timeout (CMD13) [x 6] mmcblk0: error -110 requesting status mmcblk0: recovery failed! print_req_error: I/O error, dev mmcblk0, sector 0 Buffer I/O error on dev mmcblk0, logical block 0, async page read mmcblk0: unable to read partition table After the patch: mmc0: new SD card at address b368 mmcblk0: mmc0:b368 SDC 977 MiB mmcblk0: p1 The patch is based on a fix and analysis done by Ladislav Michl. Tested on OMAP15XX/OMAP310 (Palm TE), OMAP1710 (Nokia 770) and OMAP2420 (Nokia N810). [1] https://marc.info/?t=123175197000003&r=1&w=2 Fixes: 46a6730e3ff9 ("mmc-omap: Fix omap to use MMC_POWER_ON") Reported-by: Ladislav Michl <ladis@linux-mips.org> Reported-by: Andrzej Zaborowski <balrogg@gmail.com> Tested-by: Ladislav Michl <ladis@linux-mips.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-12-04drm/fb-helper: Fix typo in parameter descriptionWei Yongjun
Fix typo in parameter description. Fixes: 4be9bd10e22d ("drm/fb_helper: Allow leaking fbdev smem_start") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/1543905135-35293-1-git-send-email-weiyongjun1@huawei.com
2018-12-04Revert "ovl: relax permission checking on underlying layers"Miklos Szeredi
This reverts commit 007ea44892e6fa963a0876a979e34890325c64eb. The commit broke some selinux-testsuite cases, and it looks like there's no straightforward fix keeping the direction of this patch, so revert for now. The original patch was trying to fix the consistency of permission checks, and not an observed bug. So reverting should be safe. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-04mmc: core: use mrq->sbc when sending CMD23 for RPMBWolfram Sang
When sending out CMD23 in the blk preparation, the comment there rightfully says: * However, it is not sufficient to just send CMD23, * and avoid the final CMD12, as on an error condition * CMD12 (stop) needs to be sent anyway. This, coupled * with Auto-CMD23 enhancements provided by some * hosts, means that the complexity of dealing * with this is best left to the host. If CMD23 is * supported by card and host, we'll fill sbc in and let * the host deal with handling it correctly. Let's do this behaviour for RPMB as well, and not send CMD23 independently. Otherwise IP cores (like Renesas SDHI) may timeout because of automatic CMD23/CMD12 handling. Reported-by: Masaharu Hayakawa <masaharu.hayakawa.ry@renesas.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Clément Péron <peron.clem@gmail.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-12-04kprobes/x86: Fix instruction patching corruption when copying more than one ↵Masami Hiramatsu
RIP-relative instruction After copy_optimized_instructions() copies several instructions to the working buffer it tries to fix up the real RIP address, but it adjusts the RIP-relative instruction with an incorrect RIP address for the 2nd and subsequent instructions due to a bug in the logic. This will break the kernel pretty badly (with likely outcomes such as a kernel freeze, a crash, or worse) because probed instructions can refer to the wrong data. For example putting kprobes on cpumask_next() typically hits this bug. cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y (in this case nr_cpumask_bits is an alias of nr_cpu_ids): <cpumask_next>: 48 89 f0 mov %rsi,%rax 8b 35 7b fb e2 00 mov 0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids> 55 push %rbp ... If we put a kprobe on it and it gets jump-optimized, it gets patched by the kprobes code like this: <cpumask_next>: e9 95 7d 07 1e jmpq 0xffffffffa000207a 7b fb jnp 0xffffffff81f8a2e2 <cpumask_next+2> e2 00 loop 0xffffffff81f8a2e9 <cpumask_next+9> 55 push %rbp This shows that the first two MOV instructions were copied to a trampoline buffer at 0xffffffffa000207a. Here is the disassembled result of the trampoline, skipping the optprobe template instructions: # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea: 54 push %rsp ... 48 83 c4 08 add $0x8,%rsp 9d popfq 48 89 f0 mov %rsi,%rax 8b 35 82 7d db e2 mov -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3> This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of the original *nr_cpu_ids. This leads to a kernel freeze because cpumask_next() always returns 0 and for_each_cpu() never ends. Fix this by adding 'len' correctly to the real RIP address while copying. [ mingo: Improved the changelog. ] Reported-by: Michael Rodin <michael@rodin.online> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v4.15+ Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()") Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bitsYishai Hadas
Expose device capabilities for DEVX user context, it includes which caps the device is supported and a matching bit to set as part of user context creation. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Unfold modify RMP functionLeon Romanovsky
There is no need to perform modify_rmp in two separate function, while one of them uses stack as a placeholder for data while other allocates it dynamically. Combine those two functions to one call instead of two. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Unfold create RMP functionLeon Romanovsky
There is no need to perform create_rmp in two separate function, while one of them uses stack as a placeholder for data while other allocates it dynamically. Combine those two functions to one instead of two. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Initialize SRQ tables on mlx5_ibLeon Romanovsky
Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Update SRQ functions signatures to mlx5_ib formatLeon Romanovsky
Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by updating function signatures do not require mlx5_core_dev as an input, because all operations in mlx5_ib are supposed to use mlx5_ib_dev. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Use stages for callback to setup and release DEVXLeon Romanovsky
Reuse existing infrastructure to initialize and release DEVX uid. The DevX interface is intended for user space access, so it is supposed to be initialized before ib_register_device(). Also it isn't supported in switchdev mode and don't need to initialize it in that mode. Fixes: 76dc5a8406bf ("IB/mlx5: Manage device uid for DEVX white list commands") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Remove SRQ signature global flagLeon Romanovsky
SRQ signature is not supported, hence no need for special static global variable to announce it. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Move SRQ functions to RDMA partLeon Romanovsky
There is no need to keep SRQ which is RDMA object in mlx5_core. In this patch, we partially move the execution code, while next patches will move table initialization/release logic too. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Remove references to local mlx5_core functionsLeon Romanovsky
As a preparation to move SRQ functionality to RDMA, drop all references to mlx5_core logic and make SRQ be dependent on shared code only. Most of the time, we are interested to know if events are working/not working and it is possible with previous commit ("net/mlx5: Debug print for forwarded async events"). Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Remove not-used lib/eq.h header fileLeon Romanovsky
lib/eq.h is needed for EQ manipulation which are not performed in SRQ. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Remove dead transobj codeLeon Romanovsky
Delete functions which are not called and not needed. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Align SRQ licenses and copyright informationLeon Romanovsky
Ensure that both RDMA and netdev parts of SRQ implementation has same copyright and license information annotated by SPDX tags. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-03net: marvell: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: qca_spi: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: stmmac: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03nfp: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()Taehee Yoo
basechain->stats is rcu protected data which is updated from nft_chain_stats_replace(). This function is executed from the commit phase which holds the pernet nf_tables commit mutex - not the global nfnetlink subsystem mutex. Test commands to reproduce the problem are: %iptables-nft -I INPUT %iptables-nft -Z %iptables-nft -Z This patch uses RCU calls to handle basechain->stats updates to fix a splat that looks like: [89279.358755] ============================= [89279.363656] WARNING: suspicious RCU usage [89279.368458] 4.20.0-rc2+ #44 Tainted: G W L [89279.374661] ----------------------------- [89279.379542] net/netfilter/nf_tables_api.c:1404 suspicious rcu_dereference_protected() usage! [...] [89279.406556] 1 lock held by iptables-nft/5225: [89279.411728] #0: 00000000bf45a000 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1f/0x70 [nf_tables] [89279.424022] stack backtrace: [89279.429236] CPU: 0 PID: 5225 Comm: iptables-nft Tainted: G W L 4.20.0-rc2+ #44 [89279.430135] Call Trace: [89279.430135] dump_stack+0xc9/0x16b [89279.430135] ? show_regs_print_info+0x5/0x5 [89279.430135] ? lockdep_rcu_suspicious+0x117/0x160 [89279.430135] nft_chain_commit_update+0x4ea/0x640 [nf_tables] [89279.430135] ? sched_clock_local+0xd4/0x140 [89279.430135] ? check_flags.part.35+0x440/0x440 [89279.430135] ? __rhashtable_remove_fast.constprop.67+0xec0/0xec0 [nf_tables] [89279.430135] ? sched_clock_cpu+0x126/0x170 [89279.430135] ? find_held_lock+0x39/0x1c0 [89279.430135] ? hlock_class+0x140/0x140 [89279.430135] ? is_bpf_text_address+0x5/0xf0 [89279.430135] ? check_flags.part.35+0x440/0x440 [89279.430135] ? __lock_is_held+0xb4/0x140 [89279.430135] nf_tables_commit+0x2555/0x39c0 [nf_tables] Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-03Merge branch 'mlx4_core-cleanups'David S. Miller
Tariq Toukan says: ==================== mlx4_core cleanups This patchset by Erez contains cleanups to the mlx4_core driver. Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations. Patch 2 fixes some coding style issues. Series generated against net-next commit: 97e6c858a26e net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/mlx4_core: Fix several coding style errorsErez Alfasi
Fix 3 checkpatch errors within mlx4/main.c: - Unnecessary mlx4_debug_level global variable initialization to 0. - Prohibited space before comma. - Whitespaces instead of tab. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/mlx4_core: Fix return codes of unsupported operationsErez Alfasi
Functions __set_port_type and mlx4_check_port_params returned -EINVAL while the proper return code is -EOPNOTSUPP as a result of an unsupported operation. All drivers should generate this and all users should check for it when detecting an unsupported functionality. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: phy: Also request modules for C45 IDsJose Abreu
Logic of phy_device_create() requests PHY modules according to PHY ID but for C45 PHYs we use different field for the IDs. Let's also request the modules for these IDs. Changes from v1: - Only request C22 modules if C45 are not present (Andrew) Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Joao Pinto <joao.pinto@synopsys.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge branch 'octeontx2-next'David S. Miller
Jerin Jacob says: ==================== octeontx2-af: NIX and NPC enhancements This patchset is a continuation to earlier submitted four patch series to add a new driver for Marvell's OcteonTX2 SOC's Resource virtualization unit (RVU) admin function driver. 1. octeontx2-af: Add RVU Admin Function driver https://www.spinics.net/lists/netdev/msg528272.html 2. octeontx2-af: NPA and NIX blocks initialization https://www.spinics.net/lists/netdev/msg529163.html 3. octeontx2-af: NPC parser and NIX blocks initialization https://www.spinics.net/lists/netdev/msg530252.html 4. octeontx2-af: NPC MCAM support and FLR handling https://www.spinics.net/lists/netdev/msg534392.html This patch series adds support for below NPC block: - Add NPC(mkex) profile support for various Key extraction configurations NIX block: - Enable dynamic RSS flow key algorithm configuration - Enhancements on Rx checksum and error checks - Add support for Tx packet marking support - TL1 schedule queue allocation enhancements - Add LSO format configuration mbox - VLAN TPID configuration - Skip multicast entry init for broadcast tables v2: - Rename FLOW_* to NIX_FLOW_* to avoid serious global namespace collisions, as we have various FLOW_* definitions coming from include/uapi/linux/pkt_cls.h for example.(David Miller) - Pack the arguments of rvu_get_tl1_schqs() function as 80 columns allows.(David Miller) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Enable mkex profileVamsi Attunuru
The following set of NPC registers allow the driver to configure NPC to generate different key value schemes to compare against packet payload in MCAM search. NPC_AF_INTF(0..1)_KEX_CFG NPC_AF_KEX_LDATA(0..1)_FLAGS_CFG NPC_AF_INTF(0..1)_LID(0..7)_LT(0..15)_LD(0..1)_CFG NPC_AF_INTF(0..1)_LDATA(0..1)_FLAGS(0..15)_CFG Currently, the AF driver populates these registers to configure the default values to address the most common use cases such as key generation for channel number + DMAC. The secure firmware stores different configuration value of these registers to enable different NPC use case along with the name for the lookup. Patch loads profile binary from secure firmware over the exiting CGX mailbox interface and apply the profile. AF driver shall fall back to the default configuration in case of any errors. The AF consumer driver can know the selected profile on response to NPC_GET_KEX_CFG mailbox by introducing mkex_pfl_name in the struct npc_get_kex_cfg_rsp. Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Add LSO format configuration mailboxNithin Dabilpuram
NIX_AF_LSO_FORMAT(0..31)_FIELD(0..7) register enables an SW defined means to define LSO packet modification formats. 0..31 works as an index to choose the algorithm, On success, the mailbox returns the index to the client of chosen LSO algorithm selection. This index will be used in configuring the transmit descriptors. Add mailbox interface to dynamically reserve and configure LSO format. This commit also fixes 'sizem1' for NIX_LSOALG_TCP_FLAGS to '1' i.e 2 Bytes. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Add L3 and L4 packet verification mailboxVidhya Raman
Adds mailbox support for L4 checksum verification and L3 and L4 length verification configuration. Signed-off-by: Vidhya Raman <vraman@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Configure VLAN TPIDsNithin Dabilpuram
Setup TPID's for vlan0 and vlan1 for Tx VLAN insertion offloads. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Add support for Tx packet markingKrzysztof Kanas
NIX_AF_MARK_FORMAT(0..127)_CTL register enables an SW defined means to mark/insert various data in the packet based on final packet color from traffic shaping HW. 0..127 works as an index to choose the algorithm. On success, the mailbox returns the index to the client. Add NIX_MARK_FORMAT_CFG mailbox which reserves mark format based on tuple (offset, y_mask, y_val, r_mask, r_val) If the tuple is requested again for mark format that was already reserved, then it will be reused. If not it will reserve a new entry if space is available. Also on AF init commonly used marker format such as VLAN DEI, IPv4 ECN, IPv4 DSCP are reserved for AF consumers. Signed-off-by: Krzysztof Kanas <kkanas@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Enable RSS with promiscuous modeVamsi Attunuru
This patch adds support for enabling RSS in promiscuous mode if RSS is already requested by the AF client. Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Define all NIX_AF_RX_DEF_* registersJerin Jacob
In order to support all NIX specific valid length errors and checksum errors on Rx, Update all NIX_AF_RX_DEF_* registers. Also sorted all registers in HRM definition order. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Enable inner IPv4 checksum and its error codeJerin Jacob
This patch enables the inner IPv4 checksum and defines the error code for Rx inner and outer checksum errors. Setting ERRCODE as 1 so that CQE descriptor can be embedded valid checksum error code and the driver can interpret checksum error as ERRLEV = LID + 1 and ERRCODE = 1. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Allow freeing single TLx Tx schedule queueNithin Dabilpuram
The default behavior was to free all the TLx Tx schedule queues. This patch adds support for freeing a single Tx schedule queue if TXSCHQ_FREE_ALL flag is not set. Signed-off-by: Krzysztof Kanas <kkanas@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Restrict TL1 allocation and configurationNithin Dabilpuram
TL1 is the root node in the scheduling hierarchy and it is a global resource with a limited number. This patch introduces restriction and validation on the allocation of the TL1 nodes for the effective resource sharing across the AF consumers. - Limit TL1 allocation to 2 per lmac. One could be for the normal link and one for IEEE802.3br express link (Express Send DMA). Effectively all the VF's of an RVU PF(lmac) share the two TL1 schqs. - TL1 cannot be freed once allocated. - Allow VF's to only apply default config to TL1 if not already applied. PF's can always overwrite the TL1 config. - Consider NIX_AQ_INSTOP_WRITE while validating txschq when sq.ena is set. Signed-off-by: Krzysztof Kanas <kkanas@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Add support for runtime RSS algo index reservationJerin Jacob
Introduced reserve_flowkey_alg_idx()to reserve RSS algorithm index, it would internally use set_flowkey_fields() to generate fields based on the flow key dynamically. On AF driver init, it would reserve a predefined set RSS algo indexes, which will be available all the time for all the AF driver consumers. The leftover algo indexes can be reserved at runtime through exiting nix_rss_flowkey_cfg mailbox message. The NIX_FLOW_KEY_TYPE_PORT is removed from predefined a set of RSS flow type as it is not used by any consumer. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Add support for dynamic flow cfg to RSS field generationJerin Jacob
Introduce state-based algorithm to convert the flow_key value to RSS algo field used by NIX_AF_RX_FLOW_KEY_ALGX_FIELDX register. The outer `for loop` goes over _all_ protocol field and the following variables depict the state machine forward progress logic. a) keyoff_marker - Enabled when hash byte length needs to be accounted in field->key_offset update. b) field_marker - Enabled when a new field needs to be selected. c) group_member - Enabled when a protocol is part of a group. This would remove the existing hard coding and enable to add new protocol support seamlessly. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Add response for RSS flow key cfg messageJerin Jacob
Added response for nix_rss_flowkey_cfg message to return selected RSS algorithm index. The FLOW_KEY_TYPE* definition is part of the mbox message and it will be used by the other consumers of AF driver hence moving to mbox.h. Also renamed FLOW_* definitions to NIX_FLOW_* to avoid global name space collisions, as we have various coming from include/uapi/linux/pkt_cls.h for example. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03octeontx2-af: Skip NIXLF check for bcast MCE entrySunil Goutham
At the time of initial broadcast packet replication table init, NIXLFs are not yet attached to PF_FUNCs. Hence skipped checking NIXLF while submitting MCE entry init instruction to NIX admin queue. Also did a minor cleanup while installing bcast match entry in packet parser unit i.e NPC. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>