git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message	Author
2020-10-12	thermal: ti-soc-thermal: Enable addition power management	Adam Ford
	The bandgap sensor can be idled when the processor is idle, but this isn't currently being done, so the power consumption of OMAP3 boards can be elevated if the bandgap sensor is enabled. This patch attempts to use some additional power management to idle the clock to the bandgap when not needed. Signed-off-by: Adam Ford <aford173@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Tested-by: Andreas Kemnade <andreas@kemnade.info> # GTA04 Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200911123157.759379-1-aford173@gmail.com
2020-10-12	thermal: sun8i: Add A100's THS controller support	Yangtao Li
	This patch adds thermal sensor controller support for the A100, which is similar to the previous ones. Signed-off-by: Yangtao Li <frank@allwinnertech.com> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/48cc75920b5c69027134626157089d8b94942711.1595572867.git.frank@allwinnertech.com
2020-10-12	thermal: sun8i: add TEMP_CALIB_MASK for calibration data in sun50i_h6_ths_calibrate	Yangtao Li
	For sun50i_h6_ths_calibrate(), the data read from nvmem needs a round of calculation. On the other hand, newer SoCs may store other data in the space beyond the 12-bit sensor data. Add a mask operation to the read data to avoid conversion errors. Signed-off-by: Yangtao Li <frank@allwinnertech.com> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/dcf98648c16aff7649ff82438bfce6caae3e176f.1595572867.git.frank@allwinnertech.com
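The masking step described above can be sketched in plain userspace C. This is a minimal illustration, not the driver code: the mask value is an assumption derived from the "12bit sensor data" wording, and the helper name `ths_calib_value` is hypothetical.

```c
#include <stdint.h>

/* Assumption: the calibration value occupies the low 12 bits of each cell. */
#define TEMP_CALIB_MASK 0x0fffu

/*
 * Hypothetical helper: keep only the calibration bits of a raw nvmem cell,
 * so unrelated data stored in the upper bits cannot skew the conversion.
 */
static inline uint16_t ths_calib_value(uint16_t raw_cell)
{
	return (uint16_t)(raw_cell & TEMP_CALIB_MASK);
}
```

Without the mask, any extra bits a newer SoC stores above the sensor field would be fed straight into the temperature conversion.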
2020-10-12	dt-bindings: thermal: sun8i: Add binding for A100's THS controller	Yangtao Li
	Add a binding for A100's ths controller. Signed-off-by: Yangtao Li <frank@allwinnertech.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/8280af8ad82ed340c0ef1c171684aaad91600679.1595572867.git.frank@allwinnertech.com
2020-10-12	thermal: cooling: Remove unused variable *tz	zhuguangqing
	1. devfreq_cooling.c: The variable tz is not used in devfreq_cooling_get_requested_power(), devfreq_cooling_state2power() and devfreq_cooling_power2state().
	2. cpufreq_cooling.c: After 84fe2cab48590, the variable tz is not used anymore in cpufreq_get_requested_power(), cpufreq_state2power() and cpufreq_power2state().
	Remove the variable *tz. Signed-off-by: zhuguangqing <zhuguangqing@xiaomi.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200914071101.13575-1-zhuguangqing83@gmail.com
2020-10-12	thermal: int340x: Add keep alive response method	Srinivas Pandruvada
	When firmware requests a keep alive response, send an event to user space to confirm by using the imok sysfs entry. Create a new sysfs entry called "imok". User space can write an integer, which results in execution of the IMOK ACPI method of the INT3400 thermal zone device. This results in sending a response to the firmware request for keep alive. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200915223650.406046-4-srinivas.pandruvada@linux.intel.com
2020-10-12	thermal: core: Add new event for sending keep alive notifications	Srinivas Pandruvada
	This event is sent by the platform firmware to confirm that user space thermal solution is alive. The response to this event from the user space thermal solution is platform specific. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200915223650.406046-3-srinivas.pandruvada@linux.intel.com
2020-10-12	thermal: int340x: Provide notification for OEM variable change	Srinivas Pandruvada
	When we receive an ACPI notification for an OEM variable change, pass the notification to the user space handler. This avoids polling for OEM variable changes from user space. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200915223650.406046-2-srinivas.pandruvada@linux.intel.com
2020-10-12	thermal: core: remove unnecessary mutex_init()	Qinglang Miao
	The mutex poweroff_lock is initialized statically. It is unnecessary to initialize by mutex_init(). Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200916062139.191233-1-miaoqinglang@huawei.com
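As a userspace analogy only (the kernel uses its own mutex API, where static initialization is typically written with DEFINE_MUTEX()), the following sketch shows why a statically initialized lock needs no separate init call. The names `counter_lock` and `counter_increment` are hypothetical.

```c
#include <pthread.h>

/* Statically initialized: no pthread_mutex_init() call is needed before first use. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Increment the shared counter under the lock and return the new value. */
static int counter_increment(void)
{
	int value;

	pthread_mutex_lock(&counter_lock);
	value = ++counter;
	pthread_mutex_unlock(&counter_lock);
	return value;
}
```

Calling an init function on such a lock again would be redundant, which is the situation the commit removes.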
2020-10-12	thermal/idle_inject: Fix comment of idle_duration_us and name of latency_ns	zhuguangqing
	The comment of idle_duration_us and the name of latency_ns can be misleading, so fix them. Signed-off-by: zhuguangqing <zhuguangqing@xiaomi.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200917073553.898-1-zhuguangqing83@gmail.com
2020-10-12	thermal: Kconfig: Update description for RCAR_GEN3_THERMAL config	Lad Prabhakar
	The rcar_gen3_thermal driver also supports RZ/G2 SoCs; update the description to reflect this. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Chris Paterson <Chris.Paterson2@renesas.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200917152141.30070-1-prabhakar.mahadev-lad.rj@bp.renesas.com
2020-10-12	thermal: stm32: simplify the return expression of stm_thermal_prepare()	Qinglang Miao
	Simplify the return expression. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200921131107.93273-1-miaoqinglang@huawei.com
2020-10-12	dt-bindings: thermal: rcar-gen3-thermal: Add r8a774e1 support	Lad Prabhakar
	Document RZ/G2H (R8A774E1) SoC bindings. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1594811350-14066-3-git-send-email-prabhakar.mahadev-lad.rj@bp.renesas.com
2020-10-12	thermal: rcar_thermal: Add missing braces to conditional statement	Geert Uytterhoeven
	According to Documentation/process/coding-style.rst, if one branch of a conditional statement needs braces, both branches should use braces. Fixes: bbcf90c0646ac797 ("thermal: Explicitly enable non-changing thermal zone devices") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200819092716.3191-1-geert+renesas@glider.be
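A small sketch of the brace rule cited above, using made-up names (`trip_stats`, `clamp_temp`) rather than the rcar_thermal code: the first branch has two statements and therefore needs braces, so the single-statement else branch gets braces as well.

```c
/* Hypothetical bookkeeping structure for the illustration. */
struct trip_stats {
	int clamped;	/* consecutive readings that had to be clamped */
};

static int clamp_temp(int temp, int max, struct trip_stats *stats)
{
	if (temp > max) {
		/* Two statements: braces are required here ... */
		stats->clamped++;
		temp = max;
	} else {
		/* ... so this single-statement branch uses braces too. */
		stats->clamped = 0;
	}
	return temp;
}
```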
2020-10-12	thermal: Use kobj_to_dev() instead of container_of()	Tian Tao
	Use kobj_to_dev() instead of container_of(). Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1597799671-11530-1-git-send-email-tiantao6@hisilicon.com
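The idea behind the helper can be modelled in userspace. The structures below are simplified stand-ins, not the kernel definitions, and the `container_of` macro is a reduced version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption: real layouts differ). */
struct kobject {
	const char *name;
};

struct device {
	int id;
	struct kobject kobj;	/* embedded member, as in the kernel */
};

/* Reduced container_of(): step back from a member pointer to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Wraps the open-coded container_of() call in one named, typed helper. */
static inline struct device *kobj_to_dev(struct kobject *kobj)
{
	return container_of(kobj, struct device, kobj);
}
```

Call sites then read `kobj_to_dev(kobj)` instead of repeating the type and member name each time.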
2020-10-12	thermal: imx8mm: Use dev_err_probe() to simplify error handling	Anson Huang
	dev_err_probe() can reduce code size, make error handling uniform, and record the reason for a deferred probe; use it to simplify the code. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1597129185-8460-2-git-send-email-Anson.Huang@nxp.com
2020-10-12	thermal: imx: Use dev_err_probe() to simplify error handling	Anson Huang
	dev_err_probe() can reduce code size, make error handling uniform, and record the reason for a deferred probe; use it to simplify the code. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1597129185-8460-1-git-send-email-Anson.Huang@nxp.com
2020-10-12	drivers: thermal: Kconfig: fix spelling mistake "acces" -> "access"	Colin Ian King
	There is a spelling mistake in the Kconfig text, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200810082739.48007-1-colin.king@canonical.com
2020-10-12	MIPS: cpu-probe: remove MIPS_CPU_BP_GHIST option bit	Thomas Bogendoerfer
	MIPS_CPU_BP_GHIST is only set two times and more or less immediately used in cpu-probe.c itself. Remove this option to make room in options word. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	MIPS: cpu-probe: introduce exclusive R3k CPU probe	Thomas Bogendoerfer
	A kernel running on an R3k machine will definitely never see one of the newer CPU cores. And since R3k systems are usually low on memory, we could save quite a few kbytes:
	 text  data  bss    dec   hex  filename
	15070    88   32  15190  3b56  arch/mips/kernel/cpu-probe.o
	  844     4   16    864   360  arch/mips/kernel/cpu-r3k-probe.o
	Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	MIPS: cpu-probe: move fpu probing/handling into its own file	Thomas Bogendoerfer
	cpu-probe.c has grown to support more and more CPUs, and there are use cases, such as running on an R3k system, where probing for all the CPUs isn't useful. The FPU handling is still nearly the same, so put the FPU code into its own file to share it. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	MIPS: replace add_memory_region with memblock	Thomas Bogendoerfer
	add_memory_region was the old interface for registering memory and was already changed to use memblock internally. Replace it by directly calling memblock functions. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	MIPS: Loongson64: Clean up numa.c	Tiezhu Yang
	(1) Replace nid_to_addroffset() with nid_to_addrbase() and then remove the related useless code.
	(2) Since end_pfn = start_pfn + node_psize, use "node_psize" instead of "end_pfn - start_pfn" to avoid the redundant calculation.
	(3) After commit 6fbde6b492df ("MIPS: Loongson64: Move files to the top-level directory"), CONFIG_ZONE_DMA32 is always set for Loongson64 because MACH_LOONGSON64 selects ZONE_DMA32, so the ifdef is no longer needed; just remove it.
	Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	MIPS: Loongson64: Select SMP in Kconfig to avoid build error	Tiezhu Yang
	In the current code, CONFIG_SMP can be set to N by the user on the Loongson platform, which causes the following build error under !CONFIG_SMP:
	  CC      arch/mips/kernel/asm-offsets.s
	In file included from ./include/linux/gfp.h:9:0,
	                 from ./include/linux/xarray.h:14,
	                 from ./include/linux/radix-tree.h:18,
	                 from ./include/linux/fs.h:15,
	                 from ./include/linux/compat.h:17,
	                 from arch/mips/kernel/asm-offsets.c:12:
	./include/linux/topology.h: In function 'numa_node_id':
	./include/linux/topology.h:119:2: error: implicit declaration of function 'cpu_logical_map' [-Werror=implicit-function-declaration]
	  return cpu_to_node(raw_smp_processor_id());
	  ^
	cc1: some warnings being treated as errors
	scripts/Makefile.build:117: recipe for target 'arch/mips/kernel/asm-offsets.s' failed
	make[1]: *** [arch/mips/kernel/asm-offsets.s] Error 1
	Select SMP in Kconfig to avoid the above build error and then remove CONFIG_SMP=y in loongson3_defconfig. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	mips: octeon: Add Ubiquiti E200 and E220 boards	Mikhail Gusarov
	These boards are used in:
	- Ubiquiti EdgeRouter (E200),
	- Ubiquiti EdgeRouter Pro (E200) and
	- Ubiquiti Security Gateway Pro 4 (E220).
	Signed-off-by: Mikhail Gusarov <dottedmag@dottedmag.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-10-12	Merge branch 'edac-drivers' into edac-updates-for-v5.10	Borislav Petkov
	Signed-off-by: Borislav Petkov <bp@suse.de>
2020-10-12	scripts: coccicheck: Change default condition for parallelism	Sumera Priyadarsini
	Currently, Coccinelle uses at most one thread per core by default in machines with more than 2 hyperthreads. However, for systems with only 4 hyperthreads, this does not improve performance. Modify coccicheck to use all available threads in machines with up to 4 hyperthreads. Signed-off-by: Sumera Priyadarsini <sylphrenadin@gmail.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2020-10-12	scripts: coccicheck: Add quotes to improve portability	Sumera Priyadarsini
	While fetching the number of threads per core with lscpu, the [:digit:] set is used for translation of digits from 0-9. However, using [:digit:] instead of "[:digit:]" does not seem to work uniformly for some shell types and configurations (such as zsh). Therefore, modify coccicheck to use double quotes around the [:digit:] set for uniformity and better portability. Signed-off-by: Sumera Priyadarsini <sylphrenadin@gmail.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2020-10-12	fuse: connection remove fix	Miklos Szeredi
	Re-add lost removal of fc from fuse_conn_list and the control filesystem. Reported-by: kernel test robot <rong.a.chen@intel.com> Fixes: fcee216beb9c ("fuse: split fuse_mount off of fuse_conn") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-10-12	can: remove obsolete version strings	Oliver Hartkopp
	As pointed out by Jakub Kicinski here: http://lore.kernel.org/r/20201009175751.5c54097f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com this patch removes the obsolete version information of the different CAN protocols and the AF_CAN core module. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201012074354.25839-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-12	can: isotp: implement cleanups / improvements from review	Oliver Hartkopp
	As pointed out by Jakub Kicinski here: http://lore.kernel.org/r/20201009175751.5c54097f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com this patch addresses the remarked issues:
	- remove empty line in comment
	- remove default=y for CAN_ISOTP in Kconfig
	- make use of pr_notice_once()
	- use GFP_ATOMIC instead of gfp_any() in soft hrtimer context
	The version strings in the CAN subsystem are removed by a separate patch. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201012074354.25839-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-12	net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid	Anant Thazhemadam
	In p9_fd_create_unix, checking is performed to see if the addr (passed as an argument) is NULL or not. However, no check is performed to see if addr is a valid address, i.e., it doesn't entirely consist of only 0's. The initialization of sun_server.sun_path to be equal to this faulty addr value leads to an uninitialized variable, as detected by KMSAN. Checking for this (faulty addr) and returning a negative error number appropriately, resolves this issue. Link: http://lkml.kernel.org/r/20201012042404.2508-1-anant.thazhemadam@gmail.com Reported-by: syzbot+75d51fe5bf4ebe988518@syzkaller.appspotmail.com Tested-by: syzbot+75d51fe5bf4ebe988518@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
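The validation described above can be sketched in userspace C. This is an illustration under stated assumptions, not the 9p transport code: the helper name `fill_unix_addr` is hypothetical, and the specific error codes are this sketch's choice, since the message only says "a negative error number".

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: only copy addr into sun_path when it is a usable,
 * non-empty string that fits, and report a negative errno otherwise.
 */
static int fill_unix_addr(struct sockaddr_un *sun, const char *addr)
{
	/* Reject both a NULL pointer and a string that is empty (all zeros). */
	if (!addr || addr[0] == '\0')
		return -EINVAL;

	/* Leave room for the terminating NUL byte. */
	if (strlen(addr) >= sizeof(sun->sun_path))
		return -ENAMETOOLONG;

	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	strcpy(sun->sun_path, addr);
	return 0;
}
```

Rejecting the empty string up front means sun_path is never left in a state where later code reads bytes that were not meaningfully initialized.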
2020-10-12	ALSA: fireworks: use semicolons rather than commas to separate statements	Julia Lawall
	Replace commas with semicolons. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/):
	// <smpl>
	@@ expression e1,e2; @@
	e1
	-,
	+;
	e2
	... when any
	// </smpl>
	Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/1602407979-29038-5-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-10-12	ALSA: hda: use semicolons rather than commas to separate statements	Julia Lawall
	Replace commas with semicolons. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/):
	// <smpl>
	@@ expression e1,e2; @@
	e1
	-,
	+;
	e2
	... when any
	// </smpl>
	Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/1602407979-29038-3-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Takashi Iwai <tiwai@suse.de>
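The two ALSA commits above are described as plain replacements. The sketch below does not claim the touched code misbehaved; it only illustrates, with hypothetical functions, why a comma between statements is worth cleaning up: the comma operator joins both assignments into one expression statement, so they fall under the same `if`.

```c
/* With a comma, both assignments form one statement controlled by the if. */
static int with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1,
		b = 2;
	return a + b;
}

/* With a semicolon, only the first assignment is conditional. */
static int with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;
	return a + b;
}
```

When `cond` is false the first function returns 0 and the second returns 2, which is easy to miss when the two forms differ by a single character.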
2020-10-12	Merge branch 'for-next' into for-linus	Takashi Iwai

2020-10-11	cifs: compute full_path already in cifs_readdir()	Ronnie Sahlberg
	Cleanup patch for a follow-on change that caches additional information for the root directory when a directory lease is held. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-11	cifs: return cached_fid from open_shroot	Ronnie Sahlberg
	Cleanup patch for a follow-on change that caches additional information for the root directory when a directory lease is held. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-11	update structure definitions from updated protocol documentation	Steve French
	MS-SMB2 was updated recently to include new protocol definitions for the updated compression payload header and new RDMA transform capabilities. Update structure definitions in smb2pdu.h to match. Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-10-11	smb3: add defines for new crypto algorithms	Steve French
	The encryption capabilities negotiate context can now request AES256 GCM or CCM. Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-10-11	Convert trailing spaces and periods in path components	Boris Protopopov
	When converting trailing spaces and periods in paths, do so for every component of the path, not just the last component. If the conversion is not done for every path component, then subsequent operations in directories with trailing spaces or periods (e.g. create(), mkdir()) will fail with ENOENT. This is because on the server, the directory will have a special symbol in its name, and the client needs to provide the same. Signed-off-by: Boris Protopopov <pboris@amazon.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
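A minimal sketch of per-component conversion, assuming '/'-separated byte strings and a path that contains no "." or ".." components. The substitute characters `SUB_SPACE` and `SUB_PERIOD` are hypothetical placeholders for the special symbols the server uses, and `map_trailing_chars` is not a cifs function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the special symbols used on the server side. */
#define SUB_SPACE  '_'
#define SUB_PERIOD '~'

/*
 * Replace trailing spaces and periods in EVERY path component, not just the
 * last one. Walks the string backwards; "trailing" is true while we are still
 * inside the run of spaces/periods at the end of the current component.
 * Assumption: the path has no "." or ".." components.
 */
static void map_trailing_chars(char *path)
{
	size_t i = strlen(path);
	int trailing = 1;

	while (i-- > 0) {
		if (path[i] == '/') {
			trailing = 1;		/* next char ends a component */
		} else if (trailing && path[i] == ' ') {
			path[i] = SUB_SPACE;
		} else if (trailing && path[i] == '.') {
			path[i] = SUB_PERIOD;
		} else {
			trailing = 0;		/* interior characters stay as-is */
		}
	}
}
```

Converting only the final component would leave a parent directory such as `dir .` unconverted, so the name sent to the server would not match the stored one and the lookup would fail.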
2020-10-11	Merge branch 'bpf, sockmap: allow verdict only sk_skb progs'	Alexei Starovoitov
	John Fastabend says: ==================== This allows sockmap sk_skb verdict programs to run without a parser. For some use cases, such as verdict programs that support streaming data or an L3/L4 proxy that does not use data in the packet, loading the nop parser 'return skb->len' is extra, unnecessary complexity. With this series we simply call the verdict program directly from data_ready instead of bouncing through the strparser logic. Patches 1,2 do the lifting on the sockmap side then patches 3,4 add the selftests. This applies on top of the series here, sockmap/sk_skb program memory acct fixes https://patchwork.ozlabs.org/project/netdev/list/?series=206975 It will apply cleanly without the above series, but will have incorrect memory accounting, causing a failure in ./test_sockmap. I could have left it so the series passed without the above series, but it seemed odd to have it out there and then require yet another patch to fix it up here. Thanks. --- ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11	bpf, selftests: Add three new sockmap tests for verdict only programs	John Fastabend
	Here we add three new tests for sockmap to test having a verdict program without setting the parser program. The first test covers the simplest case,
	   sender           proxy_recv proxy_send      recv
	     |                |                         |
	     |              verdict -----+              |
	     |                |          |              |
	     +----------------+          +------------+
	We load the verdict program on the proxy_recv socket without a parser program. It then does a redirect into the send path of the proxy_send socket using sendpage_locked(). Next we test the drop case to ensure if we kfree_skb as a result of the verdict program everything behaves as expected. Next we test the same configuration above, but with ktls and a redirect into socket ingress queue. Shown here
	   tls                                       tls
	   sender           proxy_recv proxy_send      recv
	     |                |                         |
	     |              verdict ------------------+
	     |                |    redirect_ingress
	     +----------------+
	Also to set up ping/pong test Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160239302638.8495.17125996694402793471.stgit@john-Precision-5820-Tower
2020-10-11	bpf, selftests: Add option to test_sockmap to omit adding parser program	John Fastabend
	Add option to allow running without a parser program in place. To test with ping/pong program use,
	 # test_sockmap -t ping --txmsg_omit_skb_parser
	this will send packets between two sockets bouncing through a proxy socket that does not use a parser program.
	   (ping)                                    (pong)
	   sender           proxy_recv proxy_send      recv
	     |                |                         |
	     |              verdict -----+              |
	     |                |          |              |
	     +----------------+          +------------+
	Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160239300387.8495.11908295143121563076.stgit@john-Precision-5820-Tower
2020-10-11	bpf, sockmap: Allow skipping sk_skb parser program	John Fastabend
	Currently, we often run with a nop parser, namely one that just does 'return skb->len'. This happens when either our verdict program can handle streaming data or it is only looking at socket data such as IP addresses and other metadata associated with the flow. The second case is common for an L3/L4 proxy, for instance. So let's allow loading programs without the parser; then we can skip the stream parser logic and avoid having to add a BPF program that is effectively a nop. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower
2020-10-11	bpf, sockmap: Check skb_verdict and skb_parser programs explicitly	John Fastabend
	We are about to allow skb_verdict to run without skb_parser programs. As a first step, change the code to check each program type specifically. This should be a mechanical change without any impact on the actual result. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160239294756.8495.5796595770890272219.stgit@john-Precision-5820-Tower
2020-10-11	Merge branch 'sockmap/sk_skb program memory acct fixes'	Alexei Starovoitov
	John Fastabend says: ==================== Users of sockmap and skmsg trying to build proxies and other tools have pointed out to me that the error handling can be problematic. If the proxy is under-provisioned and/or the BPF admin does not have the ability to update/modify memory provisions on the sockets, it's possible data may be dropped. For some things we have retries so everything works out OK, but for most things this is likely not great. And things go bad. The original design dropped memory accounting on the receive socket as early as possible. We did this early in sk_skb handling and then charged it to the redirect socket immediately after running the BPF program. But, this design caused a fundamental problem. Namely, what should we do if we redirect to a socket that has already reached its socket memory limits? For proxy use cases the network admin can tune memory limits. But, in general we punted on this problem and told folks to simply make your memory limits high enough to handle your workload. This is not a really good answer. When deploying into environments where we expect this to be transparent it's no longer the case because we need to tune params. In fact it's really only viable in cases where we have fine grained control over the application. For example a proxy redirecting from an ingress socket to an egress socket. The result is I get bug reports because it's surprising for one, but more importantly also breaks some use cases. So let's fix it. This series cleans up the different cases so that in many common modes, such as passing packet up to receive socket, we can simply use the underlying assumption that the TCP stack already has done memory accounting. Next instead of trying to do memory accounting against the socket we plan to redirect into we keep memory accounting on the receive socket until the skb can be put on the redirect socket. This means if we do an egress redirect to a socket and sock_writable() returns EAGAIN we can requeue the skb on the workqueue and try again. The same scenario plays out for ingress. If the skb can not be put on the receive queue of the redirect socket then we simply requeue and retry. In both cases memory is still accounted for against the receiving socket. This also handles head of line blocking. With the above scheme the skb is on a queue associated with the socket it will be sent/recv'd on, but the memory accounting is against the received socket. This means the receive socket can advance to the next skb and avoid head of line blocking. At least until its receive memory on the socket runs out. This will put some maximum size on the amount of data any socket can enqueue giving us bounds on the skb lists so they can't grow indefinitely. Overall I think this is a win. Tested with test_sockmap. These are fixes, but I tagged it for bpf-next considering we are at -rc8. v1->v2: Fix uninitialized/unused variables (kernel test robot) v2->v3: fix typo in patch2 err=0 needs to be <0 so use err=-EIO --- ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11	bpf, sockmap: Add memory accounting so skbs on ingress lists are visible	John Fastabend
	Move skb->sk assignment out of sk_psock_bpf_run() and into individual callers. Then we can use proper skb_set_owner_r() call to assign a sk to a skb. This improves things by also charging the truesize against the socket's sk_rmem_alloc counter. With this done we get some accounting in place to ensure the memory associated with skbs on the workqueue is still being accounted for somewhere. Finally, by using skb_set_owner_r the destructor is set up so we can just let the normal skb_kfree logic recover the memory. Combined with previous patch dropping skb_orphan() we now can recover from memory pressure and maintain accounting. Note, we will charge the skbs against their originating socket even if being redirected into another socket. Once the skb completes the redirect op the kfree_skb will give the memory back. This is important because if we charged the socket we are redirecting to (like it was done before this series) the sock_writeable() test could fail because the skb we are trying to send is already charged against the socket. Also TLS case is special. Here we wait until we have decided not to simply PASS the packet up the stack. In the case where we PASS the packet up the stack we already have an skb which is accounted for on the TLS socket context. For the parser case we continue to just set/clear skb->sk; this is because the skb being used here may be combined with other skbs or turned into multiple skbs depending on the parser logic. For example the parser could request a payload length greater than skb->len so that the strparser needs to collect multiple skbs. At any rate the final result will be handled in the strparser recv callback. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226867513.5692.10579573214635925960.stgit@john-Precision-5820-Tower
2020-10-11	bpf, sockmap: Remove skb_orphan and let normal skb_kfree do cleanup	John Fastabend
	Calling skb_orphan() is unnecessary in the strp rcv handler because the skb is from a skb_clone() in __strp_recv. So it never has a destructor or a sk assigned. Plus it's confusing to read because it might hint to the reader that the skb could have an sk assigned which is not true. Even if we did have an sk assigned it would be cleaner to simply wait for the upcoming kfree_skb(). Additionally, move the comment about strparser clone up so it's closer to the logic it is describing and add to it so that it is more complete. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226865548.5692.9098315689984599579.stgit@john-Precision-5820-Tower
2020-10-11	bpf, sockmap: Remove dropped data on errors in redirect case	John Fastabend
	In the sk_skb redirect case we didn't handle the case where we overrun the sk_rmem_alloc entry on ingress redirect or sk_wmem_alloc on egress. Because we didn't have anything implemented we simply dropped the skb. This meant data could be dropped if socket memory accounting was in place. This fixes the above dropped data case by moving the memory checks later in the code where we actually do the send or recv. This pushes those checks into the workqueue and allows us to return an EAGAIN error which in turn allows us to try again later from the workqueue. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226863689.5692.13861422742592309285.stgit@john-Precision-5820-Tower
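The retry behaviour described above can be modelled with a toy backlog. Everything here is a hypothetical model (`sock_model`, `try_deliver`, `run_backlog`), not the sockmap implementation: the memory check happens at delivery time, and an -EAGAIN leaves the item queued so a later pass can retry it instead of dropping it.

```c
#include <errno.h>
#include <stddef.h>

/* Toy destination socket: tracks how much memory is currently charged. */
struct sock_model {
	size_t mem_used;
	size_t mem_limit;
};

/* Memory check done at the point of delivery, not at redirect time. */
static int try_deliver(struct sock_model *dst, size_t len)
{
	if (dst->mem_used + len > dst->mem_limit)
		return -EAGAIN;		/* no room now; caller should retry later */
	dst->mem_used += len;
	return 0;
}

/*
 * One pass of the (modelled) worker: deliver queued items in order and stop
 * at the first -EAGAIN, leaving that item and the rest queued for a retry.
 * *head is the index of the next undelivered item; returns items delivered.
 */
static size_t run_backlog(struct sock_model *dst, const size_t *lens,
			  size_t n, size_t *head)
{
	size_t delivered = 0;

	while (*head < n) {
		if (try_deliver(dst, lens[*head]) == -EAGAIN)
			break;
		(*head)++;
		delivered++;
	}
	return delivered;
}
```

In this model a second call to `run_backlog()` after the destination has drained picks up exactly where the first one stopped, so no queued data is lost.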
2020-10-11	bpf, sockmap: Remove skb_set_owner_w wmem will be taken later from sendpage	John Fastabend
	The skb_set_owner_w is unnecessary here. The sendpage call will create a fresh skb and set the owner correctly from workqueue. It's also not entirely harmless: it consumes cycles, and it also impacts resource accounting by increasing sk_wmem_alloc. This is charging the socket we are going to send to for the skb, but we will put it on the workqueue for some time before this happens so we are artificially inflating sk_wmem_alloc for this period. Further, we don't know how many skbs will be used to send the packet or how it will be broken up when sent over the new socket so charging it with one big sum is also not correct when the workqueue may break it up if facing memory pressure. Seeing we don't know how/when this is going to be sent drop the early accounting. A later patch will do proper accounting charged on receive socket for the case where skbs get enqueued on the workqueue. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226861708.5692.17964237936462425136.stgit@john-Precision-5820-Tower