summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-06-02kdb: Remove the misfeature 'KDBFLAGS'Wei Li
Currently, 'KDBFLAGS' is an internal variable of kdb, it is combined by 'KDBDEBUG' and state flags. It will be shown only when 'KDBDEBUG' is set, and the user can define an environment variable named 'KDBFLAGS' too. These are puzzling indeed. After communication with Daniel, it seems that 'KDBFLAGS' is a misfeature. So let's replace 'KDBFLAGS' with 'KDBDEBUG' to just show the value we wrote into. After this modification, we can use `md4c1 kdb_flags` instead, to observe the state flags. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20200521072125.21103-1-liwei391@huawei.com [daniel.thompson@linaro.org: Make kdb_flags unsigned to avoid arithmetic right shift] Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02kdb: Cleanup math with KDB_CMD_HISTORY_COUNTDouglas Anderson
From code inspection the math in handle_ctrl_cmd() looks super sketchy because it subjects -1 from cmdptr and then does a "% KDB_CMD_HISTORY_COUNT". It turns out that this code works because "cmdptr" is unsigned and KDB_CMD_HISTORY_COUNT is a nice power of 2. Let's make this a little less sketchy. This patch should be a no-op. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200507161125.1.I2cce9ac66e141230c3644b8174b6c15d4e769232@changeid Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02serial: amba-pl011: Support kgdboc_earlyconSumit Garg
Implement the read() function in the early console driver. With recently added kgdboc_earlycon feature, this allows you to use kgdb to debug fairly early into the system boot. We only bother implementing this if polling is enabled since kgdb can't be enabled without that. Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200507130644.v4.12.I8ee0811f0e0816dd8bfe7f2f5540b3dba074fae8@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02serial: 8250_early: Support kgdboc_earlyconDouglas Anderson
Implement the read() function in the early console driver. With recent kgdb patches this allows you to use kgdb to debug fairly early into the system boot. We only bother implementing this if polling is enabled since kgdb can't be enabled without that. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.11.I8f668556c244776523320a95b09373a86eda11b7@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02serial: qcom_geni_serial: Support kgdboc_earlyconDouglas Anderson
Implement the read() function in the early console driver. With recent kgdb patches this allows you to use kgdb to debug fairly early into the system boot. We only bother implementing this if polling is enabled since kgdb can't be enabled without that. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200507130644.v4.10.If2deff9679a62c1ce1b8f2558a8635dc837adf8c@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02serial: kgdboc: Allow earlycon initialization to be deferredDaniel Thompson
Currently there is no guarantee that an earlycon will be initialized before kgdboc tries to adopt it. Almost the opposite: on systems with ACPI then if earlycon has no arguments then it is guaranteed that earlycon will not be initialized. This patch mitigates the problem by giving kgdboc_earlycon a second chance during console_init(). This isn't quite as good as stopping during early parameter parsing but it is still early in the kernel boot. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200430161741.1832050-1-daniel.thompson@linaro.org Reviewed-by: Douglas Anderson <dianders@chromium.org>
2020-06-02Documentation: kgdboc: Document new kgdboc_earlycon parameterDouglas Anderson
The recent patch ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles") adds a new kernel command line parameter. Document it. Note that the patch adding the feature does some comparing/contrasting of "kgdboc_earlycon" vs. the existing "ekgdboc". See that patch for more details, but briefly "ekgdboc" can be used _instead_ of "kgdboc" and just makes "kgdboc" do its normal initialization early (only works if your tty driver is already ready). The new "kgdboc_earlycon" works in combination with "kgdboc" and is backed by boot consoles. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.9.I7d5eb42c6180c831d47aef1af44d0b8be3fac559@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02kgdb: Don't call the deinit under spinlockDouglas Anderson
When I combined kgdboc_earlycon with an inflight patch titled ("soc: qcom-geni-se: Add interconnect support to fix earlycon crash") [1] things went boom. Specifically I got a crash during the transition between kgdboc_earlycon and the main kgdboc that looked like this: Call trace: __schedule_bug+0x68/0x6c __schedule+0x75c/0x924 schedule+0x8c/0xbc schedule_timeout+0x9c/0xfc do_wait_for_common+0xd0/0x160 wait_for_completion_timeout+0x54/0x74 rpmh_write_batch+0x1fc/0x23c qcom_icc_bcm_voter_commit+0x1b4/0x388 qcom_icc_set+0x2c/0x3c apply_constraints+0x5c/0x98 icc_set_bw+0x204/0x3bc icc_put+0x30/0xf8 geni_remove_earlycon_icc_vote+0x6c/0x9c qcom_geni_serial_earlycon_exit+0x10/0x1c kgdboc_earlycon_deinit+0x38/0x58 kgdb_register_io_module+0x11c/0x194 configure_kgdboc+0x108/0x174 kgdboc_probe+0x38/0x60 platform_drv_probe+0x90/0xb0 really_probe+0x130/0x2fc ... The problem was that we were holding the "kgdb_registration_lock" while calling into code that didn't expect to be called in spinlock context. Let's slightly defer when we call the deinit code so that it's not done under spinlock. NOTE: this does mean that the "deinit" call of the old kgdb IO module is now made _after_ the init of the new IO module, but presumably that's OK. [1] https://lkml.kernel.org/r/1588919619-21355-3-git-send-email-akashast@codeaurora.org Fixes: 220995622da5 ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles") Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200526142001.1.I523dc33f96589cb9956f5679976d402c8cda36fa@changeid [daniel.thompson@linaro.org: Resolved merge issues by hand] Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02kgdboc: Disable all the early code when kgdboc is a moduleDouglas Anderson
When kgdboc is compiled as a module all of the "ekgdboc" and "kgdb_earlycon" code isn't useful and, in fact, breaks compilation. This is because early_param() isn't defined for modules and that's how this code gets configured. It turns out that this was broken by commit eae3e19ca930 ("kgdboc: Remove useless #ifdef CONFIG_KGDB_SERIAL_CONSOLE in kgdboc") and then made worse by commit 220995622da5 ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles"). I guess the #ifdef wasn't so useless, even if it wasn't obvious why it was useful. When kgdboc was compiled as a module only "CONFIG_KGDB_SERIAL_CONSOLE_MODULE" was defined, not "CONFIG_KGDB_SERIAL_CONSOLE". That meant that the old module. Let's basically do the same thing that the old code (pre-removal of the #ifdef) did but use "IS_BUILTIN(CONFIG_KGDB_SERIAL_CONSOLE)" to make it more obvious what the point of the check is. We'll fix kgdboc_earlycon in a similar way. Fixes: 220995622da5 ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles") Fixes: eae3e19ca930 ("kgdboc: Remove useless #ifdef CONFIG_KGDB_SERIAL_CONSOLE in kgdboc") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200519084345.1.I91670accc8a5ddabab227eb63bb4ad3e2e9d2b58@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02perf tools: Remove some duplicated includesTiezhu Yang
There exists some duplicated includes in tools/perf, remove them. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xuefeng li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1591071304-19338-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-02perf symbols: Fix kernel maps for kcore and eBPFAdrian Hunter
Adjust 'map->pgoff' also when moving a map's start address. Example with v5.4.34 based kernel: Before: $ sudo tools/perf/perf record -a --kcore -e intel_pt//k sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.958 MB perf.data ] $ sudo tools/perf/perf script --itrace=e >/dev/null Warning: 961 instruction trace errors After: $ sudo tools/perf/perf script --itrace=e >/dev/null $ Committer testing: # uname -a Linux seventh 5.6.10-100.fc30.x86_64 #1 SMP Mon May 4 15:36:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # Before: # perf record -a --kcore -e intel_pt//k sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.923 MB perf.data ] # perf script --itrace=e >/dev/null Warning: 295 instruction trace errors # After: # perf record -a --kcore -e intel_pt//k sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.919 MB perf.data ] # perf script --itrace=e >/dev/null # Fixes: fb5a88d4131a ("perf tools: Preserve eBPF maps when loading kcore") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200602112505.1406-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-02tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: 5cde265384ca ("perf/x86/rapl: Add AMD Fam17h RAPL support") Addressing this tools/perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h With this one will be able to use these new AMD MSRs in filters, by name, e.g.: # perf trace -e msr:* --filter="msr==AMD_PKG_ENERGY_STATUS || msr==AMD_RAPL_POWER_UNIT" Just like it is now possible with other MSRs: [root@five ~]# uname -a Linux five 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux [root@five ~]# grep 'model name' -m1 /proc/cpuinfo model name : AMD Ryzen 5 3600X 6-Core Processor [root@five ~]# [root@five ~]# perf trace -e msr:*/max-stack=16/ --filter="msr==AMD_PERF_CTL" --max-events=2 0.000 kworker/1:1-ev/2327824 msr:write_msr(msr: AMD_PERF_CTL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) [0xffffffffc01d71c3] ([acpi_cpufreq]) [0] ([unknown]) __cpufreq_driver_target ([kernel.kallsyms]) od_dbs_update ([kernel.kallsyms]) dbs_work_handler ([kernel.kallsyms]) process_one_work ([kernel.kallsyms]) worker_thread ([kernel.kallsyms]) kthread ([kernel.kallsyms]) ret_from_fork ([kernel.kallsyms]) 8.597 kworker/2:2-ev/2338099 msr:write_msr(msr: AMD_PERF_CTL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) [0] ([unknown]) [0] ([unknown]) __cpufreq_driver_target ([kernel.kallsyms]) od_dbs_update ([kernel.kallsyms]) dbs_work_handler ([kernel.kallsyms]) process_one_work ([kernel.kallsyms]) worker_thread ([kernel.kallsyms]) kthread ([kernel.kallsyms]) ret_from_fork ([kernel.kallsyms]) [root@five ~]# Longer explanation with what happens in the perf build process, automatically after this is made in synch with the kernel sources: $ make -C tools/perf O=/tmp/build/perf install-bin <SNIP> Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h <SNIP> make: Leaving directory '/home/acme/git/perf/tools/perf' $ $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ $ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h --- tools/arch/x86/include/asm/msr-index.h 2020-06-02 10:46:36.217782288 -0300 +++ arch/x86/include/asm/msr-index.h 2020-05-28 10:41:23.313794627 -0300 @@ -301,6 +301,9 @@ #define MSR_PP1_ENERGY_STATUS 0x00000641 #define MSR_PP1_POLICY 0x00000642 +#define MSR_AMD_PKG_ENERGY_STATUS 0xc001029b +#define MSR_AMD_RAPL_POWER_UNIT 0xc0010299 + /* Config TDP MSRs */ #define MSR_CONFIG_TDP_NOMINAL 0x00000648 #define MSR_CONFIG_TDP_LEVEL_1 0x00000649 $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ $ make -C tools/perf O=/tmp/build/perf install-bin <SNIP> CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf <SNIP> make: Leaving directory '/home/acme/git/perf/tools/perf' $ $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-06-02 10:47:08.486334348 -0300 +++ after 2020-06-02 10:47:33.075008948 -0300 @@ -286,6 +286,8 @@ [0xc0010240 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTL", [0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR", [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC", + [0xc0010299 - x86_AMD_V_KVM_MSRs_offset] = "AMD_RAPL_POWER_UNIT", + [0xc001029b - x86_AMD_V_KVM_MSRs_offset] = "AMD_PKG_ENERGY_STATUS", [0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL", [0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-02perf stat: Ensure group is defined on top of the same cpu maskJiri Olsa
Jin Yao reported the issue (and posted first versions of this change) with groups being defined over events with different cpu mask. This causes assert aborts in get_group_fd, like: # perf stat -M "C2_Pkg_Residency" -a -- sleep 1 perf: util/evsel.c:1464: get_group_fd: Assertion `!(fd == -1)' failed. Aborted All the events in the group have to be defined over the same cpus so the group_fd can be found for every leader/member pair. Adding check to ensure this condition is met and removing the group (with warning) if we detect mixed cpus, like: $ sudo perf stat -e '{power/energy-cores/,cycles},{instructions,power/energy-cores/}' WARNING: event cpu maps do not match, disabling group: anon group { power/energy-cores/, cycles } anon group { instructions, power/energy-cores/ } Ian asked also for cpu maps details, it's displayed in verbose mode: $ sudo perf stat -e '{cycles,power/energy-cores/}' -v WARNING: group events cpu maps do not match, disabling group: anon group { power/energy-cores/, cycles } power/energy-cores/: 0 cycles: 0-7 anon group { instructions, power/energy-cores/ } instructions: 0-7 power/energy-cores/: 0 Committer testing: [root@seventh ~]# perf stat -e '{power/energy-cores/,cycles},{instructions,power/energy-cores/}' WARNING: grouped events cpus do not match, disabling group: anon group { power/energy-cores/, cycles } anon group { instructions, power/energy-cores/ } ^C Performance counter stats for 'system wide': 12.62 Joules power/energy-cores/ 106,920,637 cycles 80,228,899 instructions # 0.75 insn per cycle 12.62 Joules power/energy-cores/ 14.514476987 seconds time elapsed [root@seventh ~]# But if we put compatible events in each group it works: [root@seventh ~]# perf stat -e '{power/energy-cores/,power/energy-ram/},{instructions,cycles}' -a sleep 2 Performance counter stats for 'system wide': 1.95 Joules power/energy-cores/ 0.92 Joules power/energy-ram/ 29,305,715 instructions # 1.03 insn per cycle 28,423,338 cycles 2.001438142 seconds time elapsed [root@seventh ~]# This needs improvement tho: [root@seventh ~]# perf stat -e '{power/energy-cores/,power/energy-ram/},{instructions,cycles}' sleep 2 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (power/energy-cores/). /bin/dmesg | grep -i perf may provide additional information. [root@seventh ~]# We need to emit a better message, one stating that the power/ events can't be used for a specific workload, instead it is per-cpu or system wide. Fixes: 6a4bb04caacc8 ("perf tools: Enable grouping logic for parsed events") Co-developed-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602101736.GE1112120@krava Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-02irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK tooIngo Molnar
Some SMP platforms don't have CONFIG_IRQ_WORK defined, resulting in a link error at build time. Define a stub and clean up the prototype definitions. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-01Merge branch 'from-miklos' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted patches from Miklos. An interesting part here is /proc/mounts stuff..." The "/proc/mounts stuff" is using a cursor for keeeping the location data while traversing the mount listing. Also probably worth noting is the addition of faccessat2(), which takes an additional set of flags to specify how the lookup is done (AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH). * 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: add faccessat2 syscall vfs: don't parse "silent" option vfs: don't parse "posixacl" option vfs: don't parse forbidden flags statx: add mount_root statx: add mount ID statx: don't clear STATX_ATIME on SB_RDONLY uapi: deprecate STATX_ALL utimensat: AT_EMPTY_PATH support vfs: split out access_override_creds() proc/mounts: add cursor aio: fix async fsync creds vfs: allow unprivileged whiteout creation
2020-06-01Merge branch 'work.set_fs-exec' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/coredump updates from Al Viro: "set_fs() removal in coredump-related area - mostly Christoph's stuff..." * 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump binfmt_elf: remove the set_fs in fill_siginfo_note signal: refactor copy_siginfo_to_user32 powerpc/spufs: simplify spufs core dumping powerpc/spufs: stop using access_ok powerpc/spufs: fix copy_to_user while atomic
2020-06-01Merge branch 'uaccess.__copy_to_user' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__copy_to_user updates from Al Viro: "Getting rid of __copy_to_user() callers - stuff that doesn't fit into other series" * 'uaccess.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dlmfs: convert dlmfs_file_read() to copy_to_user() esas2r: don't bother with __copy_to_user()
2020-06-01Merge branch 'uaccess.__copy_from_user' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__copy_from_user updates from Al Viro: "Getting rid of __copy_from_user() callers - patches that don't fit into other series" * 'uaccess.__copy_from_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pstore: switch to copy_from_user() firewire: switch ioctl_queue_iso to use of copy_from_user()
2020-06-01Merge branch 'uaccess.__put_user' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__put-user updates from Al Viro: "Removal of __put_user() calls - misc patches that don't fit into any other series" * 'uaccess.__put_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pcm_native: result of put_user() needs to be checked scsi_ioctl.c: switch SCSI_IOCTL_GET_IDLUN to copy_to_user() compat sysinfo(2): don't bother with field-by-field copyout
2020-06-01Merge branch 'uaccess.readdir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/readdir updates from Al Viro: "Finishing the conversion of readdir.c to unsafe_... API. This includes the uaccess_{read,write}_begin series by Christophe Leroy" * 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: readdir.c: get rid of the last __put_user(), drop now-useless access_ok() readdir.c: get compat_filldir() more or less in sync with filldir() switch readdir(2) to unsafe_copy_dirent_name() drm/i915/gem: Replace user_access_begin by user_write_access_begin uaccess: Selectively open read or write user access uaccess: Add user_read_access_begin/end and user_write_access_begin/end
2020-06-01Merge branch 'uaccess.access_ok' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/access_ok updates from Al Viro: "Removals of trivially pointless access_ok() calls. Note: the fiemap stuff was removed from the series, since they are duplicates with part of ext4 series carried in Ted's tree" * 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vmci_host: get rid of pointless access_ok() hfi1: get rid of pointless access_ok() usb: get rid of pointless access_ok() calls lpfc_debugfs: get rid of pointless access_ok() efi_test: get rid of pointless access_ok() drm_read(): get rid of pointless access_ok() via-pmu: don't bother with access_ok() drivers/crypto/ccp/sev-dev.c: get rid of pointless access_ok() omapfb: get rid of pointless access_ok() calls amifb: get rid of pointless access_ok() calls drivers/fpga/dfl-afu-dma-region.c: get rid of pointless access_ok() drivers/fpga/dfl-fme-pr.c: get rid of pointless access_ok() cm4000_cs.c cmm_ioctl(): get rid of pointless access_ok() nvram: drop useless access_ok() n_hdlc_tty_read(): remove pointless access_ok() tomoyo_write_control(): get rid of pointless access_ok() btrfs_ioctl_send(): don't bother with access_ok() fat_dir_ioctl(): hadn't needed that access_ok() for more than a decade... dlmfs_file_write(): get rid of pointless access_ok()
2020-06-01Merge branch 'uaccess.csum' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/csum updates from Al Viro: "Regularize the sitation with uaccess checksum primitives: - fold csum_partial_... into csum_and_copy_..._user() - on x86 collapse several access_ok()/stac()/clac() into user_access_begin()/user_access_end()" * 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: default csum_and_copy_to_user(): don't bother with access_ok() take the dummy csum_and_copy_from_user() into net/checksum.h arm: switch to csum_and_copy_from_user() sh32: convert to csum_and_copy_from_user() m68k: convert to csum_and_copy_from_user() xtensa: switch to providing csum_and_copy_from_user() sparc: switch to providing csum_and_copy_from_user() parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user() alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user() ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user() ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user() x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}() x86: switch both 32bit and 64bit to providing csum_and_copy_from_user() x86_64: csum_..._copy_..._user(): switch to unsafe_..._user() get rid of csum_partial_copy_to_user()
2020-06-01dt-bindings: mailbox: Convert imx mu to json-schemaAnson Huang
Convert the i.MX MU binding to DT schema format using json-schema Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Rob Herring <robh@kernel.org>
2020-06-01dt-bindings: power: Convert imx gpcv2 to json-schemaAnson Huang
Convert the i.MX GPCv2 binding to DT schema format using json-schema Example is updated based on latest DT file and consumer's example is removed since it is NOT that useful. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Rob Herring <robh@kernel.org>
2020-06-01dt-bindings: power: Convert imx gpc to json-schemaAnson Huang
Convert the i.MX GPC binding to DT schema format using json-schema >From latest reference manual, there is actually ONLY 1 interrupt for GPC, so the additional GPC interrupt is removed. Consumer's example is also removed since it is NOT that useful. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Rob Herring <robh@kernel.org>
2020-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-06-01 The following pull-request contains BPF updates for your *net-next* tree. We've added 55 non-merge commits during the last 1 day(s) which contain a total of 91 files changed, 4986 insertions(+), 463 deletions(-). The main changes are: 1) Add rx_queue_mapping to bpf_sock from Amritha. 2) Add BPF ring buffer, from Andrii. 3) Attach and run programs through devmap, from David. 4) Allow SO_BINDTODEVICE opt in bpf_setsockopt, from Ferenc. 5) link based flow_dissector, from Jakub. 6) Use tracing helpers for lsm programs, from Jiri. 7) Several sk_msg fixes and extensions, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()Jules Irenge
Sparse reports a warning at efx_ef10_try_update_nic_stats_vf() warning: context imbalance in efx_ef10_try_update_nic_stats_vf() - unexpected unlock The root cause is the missing annotation at efx_ef10_try_update_nic_stats_vf() Add the missing _must_hold(&efx->stats_lock) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01crypto/chtls: IPv6 support for inline TLSVinay Kumar Yadav
Extends support to IPv6 for Inline TLS server. Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> v1->v2: - cc'd tcp folks. v2->v3: - changed EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL() Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'chelsio-crypto-fixes'David S. Miller
Ayush Sawal says: ==================== Fixing compilation warnings and errors Patch 1: Fixes the warnings seen when compiling using sparse tool. Patch 2: Fixes a cocci check error introduced after commit 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). V1->V2 patch1: Avoid type casting by using get_unaligned_be32() and put_unaligned_be16/32() functions. patch2: Modified subject of the patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Crypto/chcr: Fixes a coccinile check errorAyush Sawal
This fixes an error observed after running coccinile check. drivers/crypto/chelsio/chcr_algo.c:1462:5-8: Unneeded variable: "err". Return "0" on line 1480 This line is missed in the commit 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). Fixes: 567be3a5d227 ("crypto: chelsio - Use multiple txq/rxq per tfm to process the requests"). V1->V2 -Modified subject. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Crypto/chcr: Fixes compilations warningsAyush Sawal
This patch fixes the compilation warnings displayed by sparse tool for chcr driver. V1->V2 Avoid type casting by using get_unaligned_be32() and put_unaligned_be16/32() functions. The key which comes from stack is an u8 byte stream so we store it in an unsigned char array(ablkctx->key). The function get_aes_decrypt_key() is a used to calculate the reverse round key for decryption, for this operation the key has to be divided into 4 bytes, so to extract 4 bytes from an u8 byte stream and store it in an u32 variable, get_aligned_be32() is used. Similarly for copying back the key from u32 variable to the original u8 key stream, put_aligned_be32() is used. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01crypto/chcr: IPV6 code needs to be in CONFIG_IPV6Rohit Maheshwari
Error messages seen while building kernel with CONFIG_IPV6 disabled. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01cxgb4/chcr: Enable ktls settings at run timeRohit Maheshwari
Current design enables ktls setting from start, which is not efficient. Now the feature will be enabled when user demands TLS offload on any interface. v1->v2: - taking ULD module refcount till any single connection exists. - taking rtnl_lock() before clearing tls_devops. v2->v3: - cxgb4 is now registering to tlsdev_ops. - module refcount inc/dec in chcr. - refcount is only for connections. - removed new code from cxgb_set_feature(). v3->v4: - fixed warning message. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01ipv6: fix IPV6_ADDRFORM operation logicHangbin Liu
Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present. Previously the checking logic looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol != IPPROTO_TCP) break; After commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP was blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; break; else break; Then after commit 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") UDP/UDPLITE were blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; if (sk->sk_protocol == IPPROTO_TCP) do_some_check; if (sk->sk_protocol != IPPROTO_TCP) break; Fix it by using Eric's code and simply remove the break in TCP check, which looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; else break; Fixes: 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-06-01Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: - remove a now unnecessary usage of the KERNEL_DS for sys_oabi_epoll_ctl() - update my email address in a number of drivers - decompressor EFI updates from Ard Biesheuvel - module unwind section handling updates - sparsemem Kconfig cleanups - make act_mm macro respect THREAD_SIZE * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8980/1: Allow either FLATMEM or SPARSEMEM on the multiplatform build ARM: 8979/1: Remove redundant ARCH_SPARSEMEM_DEFAULT setting ARM: 8978/1: mm: make act_mm() respect THREAD_SIZE ARM: decompressor: run decompressor in place if loaded via UEFI ARM: decompressor: move GOT into .data for EFI enabled builds ARM: decompressor: defer loading of the contents of the LC0 structure ARM: decompressor: split off _edata and stack base into separate object ARM: decompressor: move headroom variable out of LC0 ARM: 8976/1: module: allow arch overrides for .init section names ARM: 8975/1: module: fix handling of unwind init sections ARM: 8974/1: use SPARSMEM_STATIC when SPARSEMEM is enabled ARM: 8971/1: replace the sole use of a symbol with its definition ARM: 8969/1: decompressor: simplify libfdt builds Update rmk's email address in various drivers ARM: compat: remove KERNEL_DS usage in sys_oabi_epoll_ctl()
2020-06-01tipc: Fix NULL pointer dereference in __tipc_sendstream()YueHaibing
tipc_sendstream() may send zero length packet, then tipc_msg_append() do not alloc skb, skb_peek_tail() will get NULL, msg_set_ack_required will trigger NULL pointer dereference. Reported-by: syzbot+8eac6d030e7807c21d32@syzkaller.appspotmail.com Fixes: 0a3e060f340d ("tipc: add test for Nagle algorithm effectiveness") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01Merge branch 'Link-based-attach-to-netns'Alexei Starovoitov
Jakub Sitnicki says: ==================== One of the pieces of feedback from recent review of BPF hooks for socket lookup [0] was that new program types should use bpf_link-based attachment. This series introduces new bpf_link type for attaching to network namespace. All link operations are supported. Errors returned from ops follow cgroup example. Patch 4 description goes into error semantics. The major change in v2 is a switch away from RCU to mutex-only synchronization. Andrii pointed out that it is not needed, and it makes sense to keep locking straightforward. Also, there were a couple of bugs in update_prog and fill_info initial implementation, one picked up by kbuild. Those are now fixed. Tests have been extended to cover them. Full changelog below. Series is organized as so: Patches 1-3 prepare a space in struct net to keep state for attached BPF programs, and massage the code in flow_dissector to make it attach type agnostic, to finally move it under kernel/bpf/. Patch 4, the most important one, introduces new bpf_link link type for attaching to network namespace. Patch 5 unifies the update error (ENOLINK) between BPF cgroup and netns. Patches 6-8 make libbpf and bpftool aware of the new link type. Patches 9-12 Add and extend tests to check that link low- and high-level API for operating on links to netns works as intended. Thanks to Alexei, Andrii, Lorenz, Marek, and Stanislav for feedback. -jkbs [0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/ Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Lorenz Bauer <lmb@cloudflare.com> Cc: Marek Majkowski <marek@cloudflare.com> Cc: Stanislav Fomichev <sdf@google.com> v1 -> v2: - Switch to mutex-only synchronization. Don't rely on RCU grace period guarantee when accessing struct net from link release / update / fill_info, and when accessing bpf_link from pernet pre_exit callback. (Andrii) - Drop patch 1, no longer needed with mutex-only synchronization. - Don't leak uninitialized variable contents from fill_info callback when link is in defunct state. (kbuild) - Make fill_info treat the link as defunct (i.e. no attached netns) when struct net refcount is 0, but link has not been yet auto-detached. - Add missing BPF_LINK_TYPE define in bpf_types.h for new link type. - Fix link update_prog callback to update the prog that will run, and not just the link itself. - Return EEXIST on prog attach when link already exists, and on link create when prog is already attached directly. (Andrii) - Return EINVAL on prog detach when link is attached. (Andrii) - Fold __netns_bpf_link_attach into its only caller. (Stanislav) - Get rid of a wrapper around container_of() (Andrii) - Use rcu_dereference_protected instead of rcu_access_pointer on update-side. (Stanislav) - Make return-on-success from netns_bpf_link_create less confusing. (Andrii) - Adapt bpf_link for cgroup to return ENOLINK when updating a defunct link. (Andrii, Alexei) - Order new exported symbols in libbpf.map alphabetically (Andrii) - Keep libbpf's "failed to attach link" warning message clear as to what we failed to attach to (cgroup vs netns). (Andrii) - Extract helpers for printing link attach type. (bpftool, Andrii) - Switch flow_dissector tests to BPF skeleton and extend them to exercise link-based flow dissector attachment. (Andrii) - Harden flow dissector attachment tests with prog query checks after prog attach/detach, or link create/update/close. - Extend flow dissector tests to cover fill_info for defunct links. - Rebase onto recent bpf-next ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01selftests/bpf: Extend test_flow_dissector to cover link creationJakub Sitnicki
Extend the existing flow_dissector test case to run tests once using direct prog attachments, and then for the second time using indirect attachment via link. The intention is to exercises the newly added high-level API for attaching programs to network namespace with links (bpf_program__attach_netns). Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-13-jakub@cloudflare.com
2020-06-01selftests/bpf: Convert test_flow_dissector to use BPF skeletonJakub Sitnicki
Switch flow dissector test setup from custom BPF object loader to BPF skeleton to save boilerplate and prepare for testing higher-level API for attaching flow dissector with bpf_link. To avoid depending on program order in the BPF object when populating the flow dissector PROG_ARRAY map, change the program section names to contain the program index into the map. This follows the example set by tailcall tests. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-12-jakub@cloudflare.com
2020-06-01selftests/bpf, flow_dissector: Close TAP device FD after the testJakub Sitnicki
test_flow_dissector leaves a TAP device after it's finished, potentially interfering with other tests that will run after it. Fix it by closing the TAP descriptor on cleanup. Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-11-jakub@cloudflare.com
2020-06-01selftests/bpf: Add tests for attaching bpf_link to netnsJakub Sitnicki
Extend the existing test case for flow dissector attaching to cover: - link creation, - link updates, - link info querying, - mixing links with direct prog attachment. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-10-jakub@cloudflare.com
2020-06-01bpftool: Support link show for netns-attached linksJakub Sitnicki
Make `bpf link show` aware of new link type, that is links attached to netns. When listing netns-attached links, display netns inode number as its identifier and link attach type. Sample session: # readlink /proc/self/ns/net net:[4026532251] # bpftool prog show 357: flow_dissector tag a04f5eef06a7f555 gpl loaded_at 2020-05-30T16:53:51+0200 uid 0 xlated 16B jited 37B memlock 4096B 358: flow_dissector tag a04f5eef06a7f555 gpl loaded_at 2020-05-30T16:53:51+0200 uid 0 xlated 16B jited 37B memlock 4096B # bpftool link show 108: netns prog 357 netns_ino 4026532251 attach_type flow_dissector # bpftool link -jp show [{ "id": 108, "type": "netns", "prog_id": 357, "netns_ino": 4026532251, "attach_type": "flow_dissector" } ] (... after netns is gone ...) # bpftool link show 108: netns prog 357 netns_ino 0 attach_type flow_dissector # bpftool link -jp show [{ "id": 108, "type": "netns", "prog_id": 357, "netns_ino": 0, "attach_type": "flow_dissector" } ] Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-9-jakub@cloudflare.com
2020-06-01bpftool: Extract helpers for showing link attach typeJakub Sitnicki
Code for printing link attach_type is duplicated in a couple of places, and likely will be duplicated for future link types as well. Create helpers to prevent duplication. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-8-jakub@cloudflare.com
2020-06-01libbpf: Add support for bpf_link-based netns attachmentJakub Sitnicki
Add bpf_program__attach_nets(), which uses LINK_CREATE subcommand to create an FD-based kernel bpf_link, for attach types tied to network namespace, that is BPF_FLOW_DISSECTOR for the moment. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-7-jakub@cloudflare.com
2020-06-01bpf, cgroup: Return ENOLINK for auto-detached links on updateJakub Sitnicki
Failure to update a bpf_link because it has been auto-detached by a dying cgroup currently results in EINVAL error, even though the arguments passed to bpf() syscall are not wrong. bpf_links attaching to netns in this case will return ENOLINK, which carries the message that the link is no longer attached to anything. Change cgroup bpf_links to do the same to keep the uAPI errors consistent. Fixes: 0c991ebc8c69 ("bpf: Implement bpf_prog replacement for an active bpf_cgroup_link") Suggested-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-6-jakub@cloudflare.com
2020-06-01bpf: Add link-based BPF program attachment to network namespaceJakub Sitnicki
Extend bpf() syscall subcommands that operate on bpf_link, that is LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to network namespaces (only flow dissector at the moment). Link-based and prog-based attachment can be used interchangeably, but only one can exist at a time. Attempts to attach a link when a prog is already attached directly, and the other way around, will be met with -EEXIST. Attempts to detach a program when link exists result in -EINVAL. Attachment of multiple links of same attach type to one netns is not supported with the intention to lift the restriction when a use-case presents itself. Because of that link create returns -E2BIG when trying to create another netns link, when one already exists. Link-based attachments to netns don't keep a netns alive by holding a ref to it. Instead links get auto-detached from netns when the latter is being destroyed, using a pernet pre_exit callback. When auto-detached, link lives in defunct state as long there are open FDs for it. -ENOLINK is returned if a user tries to update a defunct link. Because bpf_link to netns doesn't hold a ref to struct net, special care is taken when releasing, updating, or filling link info. The netns might be getting torn down when any of these link operations are in progress. That is why auto-detach and update/release/fill_info are synchronized by the same mutex. Also, link ops have to always check if auto-detach has not happened yet and if netns is still alive (refcnt > 0). Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-5-jakub@cloudflare.com
2020-06-01flow_dissector: Move out netns_bpf prog callbacksJakub Sitnicki
Move functions to manage BPF programs attached to netns that are not specific to flow dissector to a dedicated module named bpf/net_namespace.c. The set of functions will grow with the addition of bpf_link support for netns attached programs. This patch prepares ground by creating a place for it. This is a code move with no functional changes intended. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200531082846.2117903-4-jakub@cloudflare.com
2020-06-01net: Introduce netns_bpf for BPF programs attached to netnsJakub Sitnicki
In order to: (1) attach more than one BPF program type to netns, or (2) support attaching BPF programs to netns with bpf_link, or (3) support multi-prog attach points for netns we will need to keep more state per netns than a single pointer like we have now for BPF flow dissector program. Prepare for the above by extracting netns_bpf that is part of struct net, for storing all state related to BPF programs attached to netns. Turn flow dissector callbacks for querying/attaching/detaching a program into generic ones that operate on netns_bpf. Next patch will move the generic callbacks into their own module. This is similar to how it is organized for cgroup with cgroup_bpf. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20200531082846.2117903-3-jakub@cloudflare.com
2020-06-01flow_dissector: Pull locking up from prog attach callbackJakub Sitnicki
Split out the part of attach callback that happens with attach/detach lock acquired. This structures the prog attach callback in a way that opens up doors for moving the locking out of flow_dissector and into generic callbacks for attaching/detaching progs to netns in subsequent patches. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20200531082846.2117903-2-jakub@cloudflare.com