summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-06-29m68k: mm: fix node memblock initAngelo Dureghello
After pulling 5.7.0 (linux-next merge), mcf5441x mmu boot was hanging silently. memblock_add() seems not appropriate, since using MAX_NUMNODES as node id, while memblock_add_node() sets up memory for node id 0. Signed-off-by: Angelo Dureghello <angelo.dureghello@timesys.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2020-06-29m68k: nommu: register start of the memory with memblockMike Rapoport
The m68k nommu setup code didn't register the beginning of the physical memory with memblock because it was anyway occupied by the kernel. However, commit fa3354e4ea39 ("mm: free_area_init: use maximal zone PFNs rather than zone sizes") changed zones initialization to use memblock.memory to detect the zone extents and this caused inconsistency between zone PFNs and the actual PFNs: BUG: Bad page state in process swapper pfn:20165 page:41fe0ca0 refcount:0 mapcount:1 mapping:00000000 index:0x0 flags: 0x0() raw: 00000000 00000100 00000122 00000000 00000000 00000000 00000000 00000000 page dumped because: nonzero mapcount CPU: 0 PID: 1 Comm: swapper Not tainted 5.8.0-rc1-00001-g3a38f8a60c65-dirty #1 Stack from 404c9ebc: 404c9ebc 4029ab28 4029ab28 40088470 41fe0ca0 40299e21 40299df1 404ba2a4 00020165 00000000 41fd2c10 402c7ba0 41fd2c04 40088504 41fe0ca0 40299e21 00000000 40088a12 41fe0ca0 41fe0ca4 0000020a 00000000 00000001 402ca000 00000000 41fe0ca0 41fd2c10 41fd2c10 00000000 00000000 402b2388 00000001 400a0934 40091056 404c9f44 404c9f44 40088db4 402c7ba0 00000001 41fd2c04 41fe0ca0 41fd2000 41fe0ca0 40089e02 4026ecf4 40089e4e 41fe0ca0 ffffffff Call Trace: [<40088470>] 0x40088470 [<40088504>] 0x40088504 [<40088a12>] 0x40088a12 [<402ca000>] 0x402ca000 [<400a0934>] 0x400a0934 Adjust the memory registration with memblock to include the beginning of the physical memory and make sure that the area occupied by the kernel is marked as reserved. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2020-06-29blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flagsHou Tao
Else there may be magic numbers in /sys/kernel/debug/block/*/state. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-29ACPI: DPTF: Add battery participant for TigerLakeSrinivas Pandruvada
Add DPTF battery participant ACPI ID for platforms based on the Intel TigerLake SoC. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-29Drivers: hv: Change flag to write log level in panic msg to falseJoseph Salisbury
When the kernel panics, one page of kmsg data may be collected and sent to Hyper-V to aid in diagnosing the failure. The collected kmsg data typically contains 50 to 100 lines, each of which has a log level prefix that isn't very useful from a diagnostic standpoint. So tell kmsg_dump_get_buffer() to not include the log level, enabling more information that *is* useful to fit in the page. Requesting in stable kernels, since many kernels running in production are stable releases. Cc: stable@vger.kernel.org Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1593210497-114310-1-git-send-email-joseph.salisbury@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-06-29thermal/drivers/rcar_gen3: Fix undefined temperature if negativeDien Pham
As description for DIV_ROUND_CLOSEST in file include/linux/kernel.h. "Result is undefined for negative divisors if the dividend variable type is unsigned and for negative dividends if the divisor variable type is unsigned." In current code, the FIXPT_DIV uses DIV_ROUND_CLOSEST but has not checked sign of divisor before using. It makes undefined temperature value in case the value is negative. This patch fixes to satisfy DIV_ROUND_CLOSEST description and fix bug too. Note that the variable name "reg" is not good because it should be the same type as rcar_gen3_thermal_read(). However, it's better to rename the "reg" in a further patch as cleanup. Signed-off-by: Van Do <van.do.xw@renesas.com> Signed-off-by: Dien Pham <dien.pham.ry@renesas.com> [shimoda: minor fixes, add Fixes tag] Fixes: 564e73d283af ("thermal: rcar_gen3_thermal: Add R-Car Gen3 thermal driver") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Niklas Soderlund <niklas.soderlund+renesas@ragnatech.se> Tested-by: Niklas Soderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1593085099-2057-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
2020-06-29thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from powerFinley Xiao
The function cpu_power_to_freq is used to find a frequency and set the cooling device to consume at most the power to be converted. For example, if the power to be converted is 80mW, and the em table is as follow. struct em_cap_state table[] = { /* KHz mW */ { 1008000, 36, 0 }, { 1200000, 49, 0 }, { 1296000, 59, 0 }, { 1416000, 72, 0 }, { 1512000, 86, 0 }, }; The target frequency should be 1416000KHz, not 1512000KHz. Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables") Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200619090825.32747-1-finley.xiao@rock-chips.com
2020-06-29thermal/drivers/tsens: Fix compilation warnings by making functions staticAmit Kucheria
After merging tsens-common.c into tsens.c, we can now mark some functions static so they don't need any prototype declarations. This fixes the following issue reported by lkp. >> drivers/thermal/qcom/tsens.c:385:13: warning: no previous prototype for 'tsens_critical_irq_thread' [-Wmissing-prototypes] 385 | irqreturn_t tsens_critical_irq_thread(int irq, void *data) | ^~~~~~~~~~~~~~~~~~~~~~~~~ >> drivers/thermal/qcom/tsens.c:455:13: warning: no previous prototype for 'tsens_irq_thread' [-Wmissing-prototypes] 455 | irqreturn_t tsens_irq_thread(int irq, void *data) | ^~~~~~~~~~~~~~~~ >> drivers/thermal/qcom/tsens.c:523:5: warning: no previous prototype for 'tsens_set_trips' [-Wmissing-prototypes] 523 | int tsens_set_trips(void *_sensor, int low, int high) | ^~~~~~~~~~~~~~~ >> drivers/thermal/qcom/tsens.c:560:5: warning: no previous prototype for 'tsens_enable_irq' [-Wmissing-prototypes] 560 | int tsens_enable_irq(struct tsens_priv *priv) | ^~~~~~~~~~~~~~~~ >> drivers/thermal/qcom/tsens.c:573:6: warning: no previous prototype for 'tsens_disable_irq' [-Wmissing-prototypes] 573 | void tsens_disable_irq(struct tsens_priv *priv) | ^~~~~~~~~~~~~~~~~ Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/6757a26876b29922929abf64b1c11fa3b3033d03.1590579709.git.amit.kucheria@linaro.org
2020-06-29thermal/drivers/sprd: Fix return value of sprd_thm_probe()Tiezhu Yang
When call function devm_platform_ioremap_resource(), we should use IS_ERR() to check the return value and return PTR_ERR() if failed. Fixes: 554fdbaf19b1 ("thermal: sprd: Add Spreadtrum thermal driver support") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Baolin Wang <baolin.wang7@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1590371941-25430-1-git-send-email-yangtiezhu@loongson.cn
2020-06-29thermal/drivers/mediatek: Fix bank number settings on mt8183Michael Kao
MT8183_NUM_ZONES should be set to 1 because MT8183 doesn't have multiple banks. Fixes: a4ffe6b52d27 ("thermal: mediatek: add support for MT8183") Signed-off-by: Michael Kao <michael.kao@mediatek.com> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200323121537.22697-6-michael.kao@mediatek.com
2020-06-29thermal/drivers: imx: Fix missing of_node_put() at probe timeAnson Huang
After finishing using cpu node got from of_get_cpu_node(), of_node_put() needs to be called. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1585232945-23368-1-git-send-email-Anson.Huang@nxp.com
2020-06-29drm/i915: Include asm sources for {ivb, hsw}_clear_kernel.cRodrigo Vivi
Alexandre Oliva has recently removed these files from Linux Libre with concerns that the sources weren't available. The sources are available on IGT repository, and only open source tools are used to generate the {ivb,hsw}_clear_kernel.c files. However, the remaining concern from Alexandre Oliva was around GPL license and the source not been present when distributing the code. So, it looks like 2 alternatives are possible, the use of linux-firmware.git repository to store the blob or making sure that the source is also present in our tree. Since the goal is to limit the i915 firmware to only the micro-controller blobs let's make sure that we do include the asm sources here in our tree. Btw, I tried to have some diligence here and make sure that the asms that these commits are adding are truly the source for the mentioned files: igt$ ./scripts/generate_clear_kernel.sh -g ivb \ -m ~/mesa/build/src/intel/tools/i965_asm Output file not specified - using default file "ivb-cb_assembled" Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c" for i915 driver... igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \ ivb_clear_kernel.c < * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC > * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT 61c61 < }; > }; \ No newline at end of file igt$ ./scripts/generate_clear_kernel.sh -g hsw \ -m ~/mesa/build/src/intel/tools/i965_asm Output file not specified - using default file "hsw-cb_assembled" Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c" for i915 driver... igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \ hsw_clear_kernel.c 5c5 < * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC > * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT 61c61 < }; > }; \ No newline at end of file Used IGT and Mesa master repositories from Fri Jun 5 2020) IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging the connector") Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with RADV_DEBUG=shaderstats") Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \ -D gallium-drivers=iris -D prefix=/usr \ -D libdir=/usr/lib64/ -Dtools=intel \ -Dkulkan-drivers=intel && ninja -C build v2: Header clean-up and include build instructions in a readme (Chris) Modified commit message to respect check-patch Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003374.html Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003375.html Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Cc: <stable@vger.kernel.org> # v5.7+ Cc: Alexandre Oliva <lxoliva@fsfla.org> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200610201807.191440-1-rodrigo.vivi@intel.com (cherry picked from commit 5a7eeb8ba143d860050ecea924a8f074f02d8023) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-06-29exfat: flush dirty metadata in fsyncSungjong Seo
generic_file_fsync() exfat used could not guarantee the consistency of a file because it has flushed not dirty metadata but only dirty data pages for a file. Instead of that, use exfat_file_fsync() for files and directories so that it guarantees to commit both the metadata and data pages for a file. Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29exfat: move setting VOL_DIRTY over exfat_remove_entries()Namjae Jeon
Move setting VOL_DIRTY over exfat_remove_entries() to avoid unneeded leaving VOL_DIRTY on -ENOTEMPTY. Fixes: 5f2aa075070c ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7 Reported-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29exfat: call sync_filesystem for read-only remountHyunchul Lee
We need to commit dirty metadata and pages to disk before remounting exfat as read-only. This fixes a failure in xfstests generic/452 generic/452 does the following: cp something <exfat>/ mount -o remount,ro <exfat> the <exfat>/something is corrupted. because while exfat is remounted as read-only, exfat doesn't have a chance to commit metadata and vfs invalidates page caches in a block device. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29exfat: add missing brelse() calls on error pathsDan Carpenter
If the second exfat_get_dentry() call fails then we need to release "old_bh" before returning. There is a similar bug in exfat_move_file(). Fixes: 5f2aa075070c ("exfat: add inode operations") Reported-by: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29exfat: Set the unused characters of FileName field to the value 0000hHyeongseok.Kim
Some fsck tool complain that padding part of the FileName field is not set to the value 0000h. So let's maintain filesystem cleaner, as exfat's spec. recommendation. Signed-off-by: Hyeongseok.Kim <Hyeongseok@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29Merge tag 'gvt-fixes-2020-06-17' of https://github.com/intel/gvt-linux into ↵Jani Nikula
drm-intel-fixes gvt-fixes-2020-06-17 - Two missed MMIO handler fixes for SKL/CFL (Colin) - Fix mask register bits check (Colin) - Fix one lockdep error for debugfs entry access (Colin) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200617043418.GQ5687@zhen-hp.sh.intel.com
2020-06-29dma-mapping: warn when coherent pool is depletedDavid Rientjes
When a DMA coherent pool is depleted, allocation failures may or may not get reported in the kernel log depending on the allocator. The admin does have a workaround, however, by using coherent_pool= on the kernel command line. Provide some guidance on the failure and a recommended minimum size for the pools (double the size). Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-29x86/fpu: Reset MXCSR to default in kernel_fpu_begin()Petteri Aimonen
Previously, kernel floating point code would run with the MXCSR control register value last set by userland code by the thread that was active on the CPU core just before kernel call. This could affect calculation results if rounding mode was changed, or a crash if a FPU/SIMD exception was unmasked. Restore MXCSR to the kernel's default value. [ bp: Carve out from a bigger patch by Petteri, add feature check, add FNINIT call too (amluto). ] Signed-off-by: Petteri Aimonen <jpa@git.mail.kapsi.fi> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=207979 Link: https://lkml.kernel.org/r/20200624114646.28953-2-bp@alien8.de
2020-06-29powerpc/mm/pkeys: Make pkey access check work on execute_only_keyAneesh Kumar K.V
Jan reported that LTP mmap03 was getting stuck in a page fault loop after commit c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user"), as well as a minimised reproducer: #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> int main(int ac, char **av) { int page_sz = getpagesize(); int fildes; char *addr; fildes = open("tempfile", O_WRONLY | O_CREAT, 0666); write(fildes, &fildes, sizeof(fildes)); close(fildes); fildes = open("tempfile", O_RDONLY); unlink("tempfile"); addr = mmap(0, page_sz, PROT_EXEC, MAP_FILE | MAP_PRIVATE, fildes, 0); printf("%d\n", *addr); return 0; } And noticed that access_pkey_error() in page fault handler now always seem to return false: __do_page_fault access_pkey_error(is_pkey: 1, is_exec: 0, is_write: 0) arch_vma_access_permitted pkey_access_permitted if (!is_pkey_enabled(pkey)) return true return false pkey_access_permitted() should not check if the pkey is available in UAMOR (using is_pkey_enabled()). The kernel needs to do that check only when allocating keys. This also makes sure the execute_only_key which is marked as non-manageable via UAMOR is handled correctly in pkey_access_permitted(), and fixes the bug. Fixes: c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user") Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Include bug report details etc. in the change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200627070147.297535-1-aneesh.kumar@linux.ibm.com
2020-06-28llc: make sure applications use ARPHRD_ETHEREric Dumazet
syzbot was to trigger a bug by tricking AF_LLC with non sensible addr->sllc_arphrd It seems clear LLC requires an Ethernet device. Back in commit abf9d537fea2 ("llc: add support for SO_BINDTODEVICE") Octavian Purdila added possibility for application to use a zero value for sllc_arphrd, convert it to ARPHRD_ETHER to not cause regressions on existing applications. BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:199 [inline] BUG: KASAN: use-after-free in list_empty include/linux/list.h:268 [inline] BUG: KASAN: use-after-free in waitqueue_active include/linux/wait.h:126 [inline] BUG: KASAN: use-after-free in wq_has_sleeper include/linux/wait.h:160 [inline] BUG: KASAN: use-after-free in skwq_has_sleeper include/net/sock.h:2092 [inline] BUG: KASAN: use-after-free in sock_def_write_space+0x642/0x670 net/core/sock.c:2813 Read of size 8 at addr ffff88801e0b4078 by task ksoftirqd/3/27 CPU: 3 PID: 27 Comm: ksoftirqd/3 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135 __read_once_size include/linux/compiler.h:199 [inline] list_empty include/linux/list.h:268 [inline] waitqueue_active include/linux/wait.h:126 [inline] wq_has_sleeper include/linux/wait.h:160 [inline] skwq_has_sleeper include/net/sock.h:2092 [inline] sock_def_write_space+0x642/0x670 net/core/sock.c:2813 sock_wfree+0x1e1/0x260 net/core/sock.c:1958 skb_release_head_state+0xeb/0x260 net/core/skbuff.c:652 skb_release_all+0x16/0x60 net/core/skbuff.c:663 __kfree_skb net/core/skbuff.c:679 [inline] consume_skb net/core/skbuff.c:838 [inline] consume_skb+0xfb/0x410 net/core/skbuff.c:832 __dev_kfree_skb_any+0xa4/0xd0 net/core/dev.c:2967 dev_kfree_skb_any include/linux/netdevice.h:3650 [inline] e1000_unmap_and_free_tx_resource.isra.0+0x21b/0x3a0 drivers/net/ethernet/intel/e1000/e1000_main.c:1963 e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3854 [inline] e1000_clean+0x4cc/0x1d10 drivers/net/ethernet/intel/e1000/e1000_main.c:3796 napi_poll net/core/dev.c:6532 [inline] net_rx_action+0x508/0x1120 net/core/dev.c:6600 __do_softirq+0x262/0x98c kernel/softirq.c:292 run_ksoftirqd kernel/softirq.c:603 [inline] run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595 smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Allocated by task 8247: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521 slab_post_alloc_hook mm/slab.h:584 [inline] slab_alloc mm/slab.c:3320 [inline] kmem_cache_alloc+0x121/0x710 mm/slab.c:3484 sock_alloc_inode+0x1c/0x1d0 net/socket.c:240 alloc_inode+0x68/0x1e0 fs/inode.c:230 new_inode_pseudo+0x19/0xf0 fs/inode.c:919 sock_alloc+0x41/0x270 net/socket.c:560 __sock_create+0xc2/0x730 net/socket.c:1384 sock_create net/socket.c:1471 [inline] __sys_socket+0x103/0x220 net/socket.c:1513 __do_sys_socket net/socket.c:1522 [inline] __se_sys_socket net/socket.c:1520 [inline] __ia32_sys_socket+0x73/0xb0 net/socket.c:1520 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 Freed by task 17: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kmem_cache_free+0x86/0x320 mm/slab.c:3694 sock_free_inode+0x20/0x30 net/socket.c:261 i_callback+0x44/0x80 fs/inode.c:219 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch kernel/rcu/tree.c:2183 [inline] rcu_core+0x570/0x1540 kernel/rcu/tree.c:2408 rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2417 __do_softirq+0x262/0x98c kernel/softirq.c:292 The buggy address belongs to the object at ffff88801e0b4000 which belongs to the cache sock_inode_cache of size 1152 The buggy address is located 120 bytes inside of 1152-byte region [ffff88801e0b4000, ffff88801e0b4480) The buggy address belongs to the page: page:ffffea0000782d00 refcount:1 mapcount:0 mapping:ffff88807aa59c40 index:0xffff88801e0b4ffd raw: 00fffe0000000200 ffffea00008e6c88 ffffea0000782d48 ffff88807aa59c40 raw: ffff88801e0b4ffd ffff88801e0b4000 0000000100000003 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88801e0b3f00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88801e0b3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88801e0b4000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88801e0b4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88801e0b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: abf9d537fea2 ("llc: add support for SO_BINDTODEVICE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28net: explain the lockdep annotations for dev_uc_unsync()Cong Wang
The lockdep annotations for dev_uc_unsync() and dev_mc_unsync() are not easy to understand, so add some comments to explain why they are correct. Similar for the rest netif_addr_lock_bh() cases, they don't need nested version. Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28net: get rid of lockdep_set_class_and_subclass()Cong Wang
lockdep_set_class_and_subclass() is meant to reduce the _nested() annotations by assigning a default subclass. For addr_list_lock, we have to compute the subclass at run-time as the netdevice topology changes after creation. So, we should just get rid of these lockdep_set_class_and_subclass() and stick with our _nested() annotations. Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key") Suggested-by: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28lib: packing: add documentation for pbuflen argumentVladimir Oltean
Fixes sparse warning: Function parameter or member 'pbuflen' not described in 'packing' Fixes: 554aae35007e ("lib: Add support for generic packing operations") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28bridge: mrp: Fix endian conversion and some other warningsHoratiu Vultur
The following sparse warnings are fixed: net/bridge/br_mrp.c:106:18: warning: incorrect type in assignment (different base types) net/bridge/br_mrp.c:106:18: expected unsigned short [usertype] net/bridge/br_mrp.c:106:18: got restricted __be16 [usertype] net/bridge/br_mrp.c:281:23: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:281:23: expected struct list_head *entry net/bridge/br_mrp.c:281:23: got struct list_head [noderef] * net/bridge/br_mrp.c:332:28: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:332:28: expected struct list_head *new net/bridge/br_mrp.c:332:28: got struct list_head [noderef] * net/bridge/br_mrp.c:332:40: warning: incorrect type in argument 2 (different modifiers) net/bridge/br_mrp.c:332:40: expected struct list_head *head net/bridge/br_mrp.c:332:40: got struct list_head [noderef] * net/bridge/br_mrp.c:682:29: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:682:29: expected struct list_head const *head net/bridge/br_mrp.c:682:29: got struct list_head [noderef] * Reported-by: kernel test robot <lkp@intel.com> Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.") Fixes: 4b8d7d4c599182 ("bridge: mrp: Extend bridge interface") Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29drm/exynos: fix ref count leak in mic_pre_enableNavid Emamdoost
in mic_pre_enable, pm_runtime_get_sync is called which increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2020-06-29drm/exynos: Properly propagate return value in drm_iommu_attach_device()Marek Szyprowski
Propagate the proper error codes from the called functions instead of unconditionally returning 0. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Merge conflict so merged it manually. Signed-off-by: Inki Dae <inki.dae@samsung.com>
2020-06-29drm/exynos: Remove dev_err() on platform_get_irq() failureTamseel Shams
platform_get_irq() will call dev_err() itself on failure, so there is no need for the driver to also do this. This is detected by coccinelle. Signed-off-by: Tamseel Shams <m.shams@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2020-06-28Linux 5.8-rc3v5.8-rc3Linus Torvalds
2020-06-28Merge tag 'arm-omap-fixes-5.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM OMAP fixes from Arnd Bergmann: "The OMAP developers are particularly active at hunting down regressions, so this is a separate branch with OMAP specific fixes for v5.8: As Tony explains "The recent display subsystem (DSS) related platform data changes caused display related regressions for suspend and resume. Looks like I only tested suspend and resume before dropping the legacy platform data, and forgot to test it after dropping it. Turns out the main issue was that we no longer have platform code calling pm_runtime_suspend for DSS like we did for the legacy platform data case, and that fix is still being discussed on the dri-devel list and will get merged separately. The DSS related testing exposed a pile other other display related issues that also need fixing though": - Fix ti-sysc optional clock handling and reset status checks for devices that reset automatically in idle like DSS - Ignore ti-sysc clockactivity bit unless separately requested to avoid unexpected performance issues - Init ti-sysc framedonetv_irq to true and disable for am4 - Avoid duplicate DSS reset for legacy mode with dts data - Remove LCD timings for am4 as they cause warnings now that we're using generic panels Other OMAP changes from Tony include: - Fix omap_prm reset deassert as we still have drivers setting the pm_runtime_irq_safe() flag - Flush posted write for ti-sysc enable and disable - Fix droid4 spi related errors with spi flags - Fix am335x USB range and a typo for softreset - Fix dra7 timer nodes for clocks for IPU and DSP - Drop duplicate mailboxes after mismerge for dra7 - Prevent pocketgeagle header line signal from accidentally setting micro-SD write protection signal by removing the default mux - Fix NFSroot flakeyness after resume for duover by switching the smsc911x gpio interrupt to back to level sensitive - Fix regression for omap4 clockevent source after recent system timer changes - Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid" phy-mode - One patch to convert am3/am4 DT files to use the regular sdhci-omap driver instead of the old hsmmc driver, this was meant for the merge window but got lost in the process" * tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits) ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode ARM: dts: Fix omap4 system timer source clocks ARM: dts: Fix duovero smsc interrupt for suspend ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect Revert "bus: ti-sysc: Increase max softreset wait" ARM: dts: am437x-epos-evm: remove lcd timings ARM: dts: am437x-gp-evm: remove lcd timings ARM: dts: am437x-sk-evm: remove lcd timings ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag ARM: dts: Fix am33xx.dtsi USB ranges length bus: ti-sysc: Increase max softreset wait ARM: OMAP2+: Fix legacy mode dss_reset bus: ti-sysc: Fix uninitialized framedonetv_irq bus: ti-sysc: Ignore clockactivity unless specified as a quirk bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit ARM: dts: omap4-droid4: Fix spi configuration and increase rate bus: ti-sysc: Flush posted write on enable and disable soc: ti: omap-prm: use atomic iopoll instead of sleeping one ...
2020-06-28Merge tag 'arm-fixes-5.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Here are a couple of bug fixes, mostly for devicetree files NXP i.MX: - Use correct voltage on some i.MX8M board device trees to avoid hardware damage - Code fixes for a compiler warning and incorrect reference counting, both harmless. - Fix the i.MX8M SoC driver to correctly identify imx8mp - Fix watchdog configuration in imx6ul-kontron device tree. Broadcom: - A small regression fix for the Raspberry-Pi firmware driver - A Kconfig change to use the correct timer driver on Northstar - A DT fix for the Luxul XWC-2000 machine - Two more DT fixes for NSP SoCs STmicroelectronics STI - Revert one broken patch for L2 cache configuration ARM Versatile Express: - Fix a regression by reverting a broken DT cleanup TEE drivers: - MAINTAINERS: change tee mailing list" * tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: Revert "ARM: sti: Implement dummy L2 cache's write_sec" soc: imx8m: fix build warning ARM: imx6: add missing put_device() call in imx6q_suspend_init() ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram() soc: imx8m: Correct i.MX8MP UID fuse offset ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2 arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range ARM: dts: NSP: Correct FA2 mailbox node ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision() MAINTAINERS: change tee mailing list ARM: dts: NSP: Disable PL330 by default, add dma-coherent property ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000 arm: dts: vexpress: Move mcc node back into motherboard node
2020-06-28Merge tag 'timers-urgent-2020-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "A single DocBook fix" * tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix kerneldoc system_device_crosststamp & al
2020-06-28Merge tag 'perf-urgent-2020-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "A single Kbuild dependency fix" * tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/rapl: Fix RAPL config variable bug
2020-06-28Merge tag 'efi-urgent-2020-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: - Fix build regression on v4.8 and older - Robustness fix for TPM log parsing code - kobject refcount fix for the ESRT parsing code - Two efivarfs fixes to make it behave more like an ordinary file system - Style fixup for zero length arrays - Fix a regression in path separator handling in the initrd loader - Fix a missing prototype warning - Add some kerneldoc headers for newly introduced stub routines - Allow support for SSDT overrides via EFI variables to be disabled - Report CPU mode and MMU state upon entry for 32-bit ARM - Use the correct stack pointer alignment when entering from mixed mode * tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: arm: Print CPU boot mode and MMU state at boot efi/libstub: arm: Omit arch specific config table matching array on arm64 efi/x86: Setup stack correctly for efi_pe_entry efi: Make it possible to disable efivar_ssdt entirely efi/libstub: Descriptions for stub helper functions efi/libstub: Fix path separator regression efi/libstub: Fix missing-prototype warning for skip_spaces() efi: Replace zero-length array and use struct_size() helper efivarfs: Don't return -EINTR when rate-limiting reads efivarfs: Update inode modification time for successful writes efi/esrt: Fix reference count leak in esre_create_sysfs_entry. efi/tpm: Verify event log header before parsing efi/x86: Fix build with gcc 4
2020-06-28Merge tag 'sched_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: "The most anticipated fix in this pull request is probably the horrible build fix for the RANDSTRUCT fail that didn't make -rc2. Also included is the cleanup that removes those BUILD_BUG_ON()s and replaces it with ugly unions. Also included is the try_to_wake_up() race fix that was first triggered by Paul's RCU-torture runs, but was independently hit by Dave Chinner's fstest runs as well" * tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/cfs: change initial value of runnable_avg smp, irq_work: Continue smp_call_function*() and irq_work*() integration sched/core: s/WF_ON_RQ/WQ_ON_CPU/ sched/core: Fix ttwu() race sched/core: Fix PI boosting between RT and DEADLINE tasks sched/deadline: Initialize ->dl_boosted sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail
2020-06-28Merge tag 'x86_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - AMD Memory bandwidth counter width fix, by Babu Moger. - Use the proper length type in the 32-bit truncate() syscall variant, by Jiri Slaby. - Reinit IA32_FEAT_CTL during wakeup to fix the case where after resume, VMXON would #GP due to VMX not being properly enabled, by Sean Christopherson. - Fix a static checker warning in the resctrl code, by Dan Carpenter. - Add a CR4 pinning mask for bits which cannot change after boot, by Kees Cook. - Align the start of the loop of __clear_user() to 16 bytes, to improve performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming. * tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/64: Align start of __clear_user() loop to 16-bytes x86/cpu: Use pinning mask for CR4 bits needing to be 0 x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get() x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup syscalls: Fix offset type of ksys_ftruncate() x86/resctrl: Fix memory bandwidth counter width for AMD
2020-06-28Merge tag 'rcu_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU-vs-KCSAN fixes from Borislav Petkov: "A single commit that uses "arch_" atomic operations to avoid the instrumentation that comes with the non-"arch_" versions. In preparation for that commit, it also has another commit that makes these "arch_" atomic operations available to generic code. Without these commits, KCSAN uses can see pointless errors" * tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Fixup noinstr warnings locking/atomics: Provide the arch_atomic_ interface to generic code
2020-06-28Merge tag 'objtool_urgent_for_5.8_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: "Three fixes from Peter Zijlstra suppressing KCOV instrumentation in noinstr sections. Peter Zijlstra says: "Address KCOV vs noinstr. There is no function attribute to selectively suppress KCOV instrumentation, instead teach objtool to NOP out the calls in noinstr functions" This cures a bunch of KCOV crashes (as used by syzcaller)" * tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix noinstr vs KCOV objtool: Provide elf_write_{insn,reloc}() objtool: Clean up elf_write() condition
2020-06-28Merge tag 'x86_entry_for_5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry fixes from Borislav Petkov: "This is the x86/entry urgent pile which has accumulated since the merge window. It is not the smallest but considering the almost complete entry core rewrite, the amount of fixes to follow is somewhat higher than usual, which is to be expected. Peter Zijlstra says: 'These patches address a number of instrumentation issues that were found after the x86/entry overhaul. When combined with rcu/urgent and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy again. Part of making this all work is bumping the minimum GCC version for KASAN builds to gcc-8.3, the reason for this is that the __no_sanitize_address function attribute is broken in GCC releases before that. No known GCC version has a working __no_sanitize_undefined, however because the only noinstr violation that results from this happens when an UB is found, we treat it like WARN. That is, we allow it to violate the noinstr rules in order to get the warning out'" * tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Fix #UD vs WARN more x86/entry: Increase entry_stack size to a full page x86/entry: Fixup bad_iret vs noinstr objtool: Don't consider vmlinux a C-file kasan: Fix required compiler version compiler_attributes.h: Support no_sanitize_undefined check with GCC 4 x86/entry, bug: Comment the instrumentation_begin() usage for WARN() x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*() x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline() compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr kasan: Bump required compiler version x86, kcsan: Add __no_kcsan to noinstr kcsan: Remove __no_kcsan_or_inline x86, kcsan: Remove __no_kcsan_or_inline usage
2020-06-28Merge branch 'fix-sockmap'Alexei Starovoitov
John Fastabend says: ==================== Fix a splat introduced by recent changes to avoid skipping ingress policy when kTLS is enabled. The RCU splat was introduced because in the non-TLS case the caller is wrapped in an rcu_read_lock/unlock. But, in the TLS case we have a reference to the psock and the caller did not wrap its call in rcu_read_lock/unlock. To fix extend the RCU section to include the redirect case which was missed. From v1->v2 I changed the location a bit to simplify the code some. See patch 1. But, then Martin asked why it was not needed in the non-TLS case. The answer for patch 1 was, as stated above, because the caller has the rcu read lock. However, there was still a missing case where a BPF user could in-theory line up a set of parameters to hit a case where the code was entered from strparser side from a different context then the initial caller. To hit this user would need a parser program to return value greater than skb->len then an ENOMEM error could happen in the strparser codepath triggering strparser to retry from a workqueue and without rcu_read_lock original caller used. See patch 2 for details. Finally, we don't actually have any selftests for parser returning a value geater than skb->len so add one in patch 3. This is especially needed because at least I don't have any code that uses the parser to return value greater than skb->len. So I wouldn't have caught any errors here in my own testing. Thanks, John v1->v2: simplify code in patch 1 some and add patches 2 and 3. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-28bpf, sockmap: Add ingres skb tests that utilize merge skbsJohn Fastabend
Add a test to check strparser merging skbs is working. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159312681884.18340.4922800172600252370.stgit@john-XPS-13-9370
2020-06-28bpf, sockmap: RCU dereferenced psock may be used outside RCU blockJohn Fastabend
If an ingress verdict program specifies message sizes greater than skb->len and there is an ENOMEM error due to memory pressure we may call the rcv_msg handler outside the strp_data_ready() caller context. This is because on an ENOMEM error the strparser will retry from a workqueue. The caller currently protects the use of psock by calling the strp_data_ready() inside a rcu_read_lock/unlock block. But, in above workqueue error case the psock is accessed outside the read_lock/unlock block of the caller. So instead of using psock directly we must do a look up against the sk again to ensure the psock is available. There is an an ugly piece here where we must handle the case where we paused the strp and removed the psock. On psock removal we first pause the strparser and then remove the psock. If the strparser is paused while an skb is scheduled on the workqueue the skb will be dropped on the flow and kfree_skb() is called. If the workqueue manages to get called before we pause the strparser but runs the rcvmsg callback after the psock is removed we will hit the unlikely case where we run the sockmap rcvmsg handler but do not have a psock. For now we will follow strparser logic and drop the skb on the floor with skb_kfree(). This is ugly because the data is dropped. To date this has not caused problems in practice because either the application controlling the sockmap is coordinating with the datapath so that skbs are "flushed" before removal or we simply wait for the sock to be closed before removing it. This patch fixes the describe RCU bug and dropping the skb doesn't make things worse. Future patches will improve this by allowing the normal case where skbs are not merged to skip the strparser altogether. In practice many (most?) use cases have no need to merge skbs so its both a code complexity hit as seen above and a performance issue. For example, in the Cilium case we always set the strparser up to return sbks 1:1 without any merging and have avoided above issues. Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
2020-06-28bpf, sockmap: RCU splat with redirect and strparser error or TLSJohn Fastabend
There are two paths to generate the below RCU splat the first and most obvious is the result of the BPF verdict program issuing a redirect on a TLS socket (This is the splat shown below). Unlike the non-TLS case the caller of the *strp_read() hooks does not wrap the call in a rcu_read_lock/unlock. Then if the BPF program issues a redirect action we hit the RCU splat. However, in the non-TLS socket case the splat appears to be relatively rare, because the skmsg caller into the strp_data_ready() is wrapped in a rcu_read_lock/unlock. Shown here, static void sk_psock_strp_data_ready(struct sock *sk) { struct sk_psock *psock; rcu_read_lock(); psock = sk_psock(sk); if (likely(psock)) { if (tls_sw_has_ctx_rx(sk)) { psock->parser.saved_data_ready(sk); } else { write_lock_bh(&sk->sk_callback_lock); strp_data_ready(&psock->parser.strp); write_unlock_bh(&sk->sk_callback_lock); } } rcu_read_unlock(); } If the above was the only way to run the verdict program we would be safe. But, there is a case where the strparser may throw an ENOMEM error while parsing the skb. This is a result of a failed skb_clone, or alloc_skb_for_msg while building a new merged skb when the msg length needed spans multiple skbs. This will in turn put the skb on the strp_wrk workqueue in the strparser code. The skb will later be dequeued and verdict programs run, but now from a different context without the rcu_read_lock()/unlock() critical section in sk_psock_strp_data_ready() shown above. In practice I have not seen this yet, because as far as I know most users of the verdict programs are also only working on single skbs. In this case no merge happens which could trigger the above ENOMEM errors. In addition the system would need to be under memory pressure. For example, we can't hit the above case in selftests because we missed having tests to merge skbs. (Added in later patch) To fix the below splat extend the rcu_read_lock/unnlock block to include the call to sk_psock_tls_verdict_apply(). This will fix both TLS redirect case and non-TLS redirect+error case. Also remove psock from the sk_psock_tls_verdict_apply() function signature its not used there. [ 1095.937597] WARNING: suspicious RCU usage [ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G W [ 1095.944363] ----------------------------- [ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage! [ 1095.950866] [ 1095.950866] other info that might help us debug this: [ 1095.950866] [ 1095.957146] [ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1 [ 1095.961482] 1 lock held by test_sockmap/15970: [ 1095.964501] #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls] [ 1095.968568] [ 1095.968568] stack backtrace: [ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G W 5.7.0-rc7-02911-g463bac5f1ca79 #1 [ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 1095.980519] Call Trace: [ 1095.982191] dump_stack+0x8f/0xd0 [ 1095.984040] sk_psock_skb_redirect+0xa6/0xf0 [ 1095.986073] sk_psock_tls_strp_read+0x1d8/0x250 [ 1095.988095] tls_sw_recvmsg+0x714/0x840 [tls] v2: Improve commit message to identify non-TLS redirect plus error case condition as well as more common TLS case. In the process I decided doing the rcu_read_unlock followed by the lock/unlock inside branches was unnecessarily complex. We can just extend the current rcu block and get the same effeective without the shuffling and branching. Thanks Martin! Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
2020-06-28sched/cfs: change initial value of runnable_avgVincent Guittot
Some performance regression on reaim benchmark have been raised with commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group") The problem comes from the init value of runnable_avg which is initialized with max value. This can be a problem if the newly forked task is finally a short task because the group of CPUs is wrongly set to overloaded and tasks are pulled less agressively. Set initial value of runnable_avg equals to util_avg to reflect that there is no waiting time so far. Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200624154422.29166-1-vincent.guittot@linaro.org
2020-06-28smp, irq_work: Continue smp_call_function*() and irq_work*() integrationPeter Zijlstra
Instead of relying on BUG_ON() to ensure the various data structures line up, use a bunch of horrible unions to make it all automatic. Much of the union magic is to ensure irq_work and smp_call_function do not (yet) see the members of their respective data structures change name. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
2020-06-28sched/core: s/WF_ON_RQ/WQ_ON_CPU/Peter Zijlstra
Use a better name for this poorly named flag, to avoid confusion... Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org
2020-06-28sched/core: Fix ttwu() racePeter Zijlstra
Paul reported rcutorture occasionally hitting a NULL deref: sched_ttwu_pending() ttwu_do_wakeup() check_preempt_curr() := check_preempt_wakeup() find_matching_se() is_same_group() if (se->cfs_rq == pse->cfs_rq) <-- *BOOM* Debugging showed that this only appears to happen when we take the new code-path from commit: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling") and only when @cpu == smp_processor_id(). Something which should not be possible, because p->on_cpu can only be true for remote tasks. Similarly, without the new code-path from commit: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") this would've unconditionally hit: smp_cond_load_acquire(&p->on_cpu, !VAL); and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this would result in an instant live-lock (with IRQs disabled), something that hasn't been reported. The NULL deref can be explained however if the task_cpu(p) load at the beginning of try_to_wake_up() returns an old value, and this old value happens to be smp_processor_id(). Further assume that the p->on_cpu load accurately returns 1, it really is still running, just not here. Then, when we enqueue the task locally, we can crash in exactly the observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from the wrong CPU, therefore we'll iterate into the non-existant parents and NULL deref. The closest semi-plausible scenario I've managed to contrive is somewhat elaborate (then again, actual reproduction takes many CPU hours of rcutorture, so it can't be anything obvious): X->cpu = 1 rq(1)->curr = X CPU0 CPU1 CPU2 // switch away from X LOCK rq(1)->lock smp_mb__after_spinlock dequeue_task(X) X->on_rq = 9 switch_to(Z) X->on_cpu = 0 UNLOCK rq(1)->lock // migrate X to cpu 0 LOCK rq(1)->lock dequeue_task(X) set_task_cpu(X, 0) X->cpu = 0 UNLOCK rq(1)->lock LOCK rq(0)->lock enqueue_task(X) X->on_rq = 1 UNLOCK rq(0)->lock // switch to X LOCK rq(0)->lock smp_mb__after_spinlock switch_to(X) X->on_cpu = 1 UNLOCK rq(0)->lock // X goes sleep X->state = TASK_UNINTERRUPTIBLE smp_mb(); // wake X ttwu() LOCK X->pi_lock smp_mb__after_spinlock if (p->state) cpu = X->cpu; // =? 1 smp_rmb() // X calls schedule() LOCK rq(0)->lock smp_mb__after_spinlock dequeue_task(X) X->on_rq = 0 if (p->on_rq) smp_rmb(); if (p->on_cpu && ttwu_queue_wakelist(..)) [*] smp_cond_load_acquire(&p->on_cpu, !VAL) cpu = select_task_rq(X, X->wake_cpu, ...) if (X->cpu != cpu) switch_to(Y) X->on_cpu = 0 UNLOCK rq(0)->lock However I'm having trouble convincing myself that's actually possible on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu observes ->state != RUNNING, it must also observe ->cpu != 1. (Most of the previous ttwu() races were found on very large PowerPC) Nevertheless, this fully explains the observed failure case. Fix it by ordering the task_cpu(p) load after the p->on_cpu load, which is easy since nothing actually uses @cpu before this. Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
2020-06-28sched/core: Fix PI boosting between RT and DEADLINE tasksJuri Lelli
syzbot reported the following warning: WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628 enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504 At deadline.c:628 we have: 623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se) 624 { 625 struct dl_rq *dl_rq = dl_rq_of_se(dl_se); 626 struct rq *rq = rq_of_dl_rq(dl_rq); 627 628 WARN_ON(dl_se->dl_boosted); 629 WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline)); [...] } Which means that setup_new_dl_entity() has been called on a task currently boosted. This shouldn't happen though, as setup_new_dl_entity() is only called when the 'dynamic' deadline of the new entity is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this condition. Digging through the PI code I noticed that what above might in fact happen if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the first branch of boosting conditions we check only if a pi_task 'dynamic' deadline is earlier than mutex holder's and in this case we set mutex holder to be dl_boosted. However, since RT 'dynamic' deadlines are only initialized if such tasks get boosted at some point (or if they become DEADLINE of course), in general RT 'dynamic' deadlines are usually equal to 0 and this verifies the aforementioned condition. Fix it by checking that the potential donor task is actually (even if temporary because in turn boosted) running at DEADLINE priority before using its 'dynamic' deadline value. Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic") Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Tested-by: Daniel Wagner <dwagner@suse.de> Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
2020-06-28sched/deadline: Initialize ->dl_boostedJuri Lelli
syzbot reported the following warning triggered via SYSC_sched_setattr(): WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 setup_new_dl_entity /kernel/sched/deadline.c:594 [inline] WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_dl_entity /kernel/sched/deadline.c:1370 [inline] WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_task_dl+0x1c17/0x2ba0 /kernel/sched/deadline.c:1441 This happens because the ->dl_boosted flag is currently not initialized by __dl_clear_params() (unlike the other flags) and setup_new_dl_entity() rightfully complains about it. Initialize dl_boosted to 0. Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic") Reported-by: syzbot+5ac8bac25f95e8b221e7@syzkaller.appspotmail.com Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Daniel Wagner <dwagner@suse.de> Link: https://lkml.kernel.org/r/20200617072919.818409-1-juri.lelli@redhat.com