summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2020-06-05ACPICA: iASL: add new OperationRegion subtype keyword PlatformRtMechanismErik Kaneda
ACPICA commit 2c2eefa827bd37297f5f9ca4b263fcba829aaf3f Link: https://github.com/acpica/acpica/commit/2c2eefa8 Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-05mtd: clear cache_state to avoid writing to bad blocks repeatedlyXiaoming Ni
The function call process is as follows: mtd_blktrans_work() while (1) do_blktrans_request() mtdblock_writesect() do_cached_write() write_cached_data() /*if cache_state is STATE_DIRTY*/ erase_write() write_cached_data() returns failure without modifying cache_state and cache_offset. So when do_cached_write() is called again, write_cached_data() will be called again to perform erase_write() on the same cache_offset. But if this cache_offset points to a bad block, erase_write() will always return -EIO. Writing to this mtdblk is equivalent to losing the current data, and repeatedly writing to the bad block. Repeatedly writing a bad block has no real benefits, but brings some negative effects: 1 Lost subsequent data 2 Loss of flash device life 3 erase_write() bad blocks are very time-consuming. For example: the function do_erase_oneblock() in chips/cfi_cmdset_0020.c or chips/cfi_cmdset_0002.c may take more than 20 seconds to return Therefore, when erase_write() returns -EIO in write_cached_data(), clear cache_state to avoid writing to bad blocks repeatedly. Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-06-05mtd: parser: cmdline: Support MTD names containing one or more colonsBoris Brezillon
Looks like some drivers define MTD names with a colon in it, thus making mtdpart= parsing impossible. Let's fix the parser to gracefully handle that case: the last ':' in a partition definition sequence is considered instead of the first one. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Ron Minnich <rminnich@google.com> Tested-by: Ron Minnich <rminnich@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-06-05mtd: physmap_of_gemini: remove defined but not used symbol 'syscon_match'Jason Yan
It's not used by anyone now, remove it. Fix the following gcc warning: drivers/mtd/maps/physmap-gemini.c:49:34: warning: ‘syscon_match’ defined but not used [-Wunused-const-variable=] static const struct of_device_id syscon_match[] = { ^~~~~~~~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-06-04block: remove the error argument to the block_bio_complete tracepointChristoph Hellwig
The status can be trivially derived from the bio itself. That also avoid callers like NVMe to incorrectly pass a blk_status_t instead of the errno, and the overhead of translating the blk_status_t to the errno in the I/O completion fast path when no tracing is enabled. Fixes: 35fe0d12c8a3 ("nvme: trace bio completion") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-04ata/libata: Fix usage of page address by page_address in ↵Ye Bin
ata_scsi_mode_select_xlat function BUG: KASAN: use-after-free in ata_scsi_mode_select_xlat+0x10bd/0x10f0 drivers/ata/libata-scsi.c:4045 Read of size 1 at addr ffff88803b8cd003 by task syz-executor.6/12621 CPU: 1 PID: 12621 Comm: syz-executor.6 Not tainted 4.19.95 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xac/0xee lib/dump_stack.c:118 print_address_description+0x60/0x223 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report mm/kasan/report.c:409 [inline] kasan_report.cold+0xae/0x2d8 mm/kasan/report.c:393 ata_scsi_mode_select_xlat+0x10bd/0x10f0 drivers/ata/libata-scsi.c:4045 ata_scsi_translate+0x2da/0x680 drivers/ata/libata-scsi.c:2035 __ata_scsi_queuecmd drivers/ata/libata-scsi.c:4360 [inline] ata_scsi_queuecmd+0x2e4/0x790 drivers/ata/libata-scsi.c:4409 scsi_dispatch_cmd+0x2ee/0x6c0 drivers/scsi/scsi_lib.c:1867 scsi_queue_rq+0xfd7/0x1990 drivers/scsi/scsi_lib.c:2170 blk_mq_dispatch_rq_list+0x1e1/0x19a0 block/blk-mq.c:1186 blk_mq_do_dispatch_sched+0x147/0x3d0 block/blk-mq-sched.c:108 blk_mq_sched_dispatch_requests+0x427/0x680 block/blk-mq-sched.c:204 __blk_mq_run_hw_queue+0xbc/0x200 block/blk-mq.c:1308 __blk_mq_delay_run_hw_queue+0x3c0/0x460 block/blk-mq.c:1376 blk_mq_run_hw_queue+0x152/0x310 block/blk-mq.c:1413 blk_mq_sched_insert_request+0x337/0x6c0 block/blk-mq-sched.c:397 blk_execute_rq_nowait+0x124/0x320 block/blk-exec.c:64 blk_execute_rq+0xc5/0x112 block/blk-exec.c:101 sg_scsi_ioctl+0x3b0/0x6a0 block/scsi_ioctl.c:507 sg_ioctl+0xd37/0x23f0 drivers/scsi/sg.c:1106 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:501 [inline] do_vfs_ioctl+0xae6/0x1030 fs/ioctl.c:688 ksys_ioctl+0x76/0xa0 fs/ioctl.c:705 __do_sys_ioctl fs/ioctl.c:712 [inline] __se_sys_ioctl fs/ioctl.c:710 [inline] __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:710 do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c479 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb0e9602c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fb0e96036d4 RCX: 000000000045c479 RDX: 0000000020000040 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 000000000076bfc0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000046d R14: 00000000004c6e1a R15: 000000000076bfcc Allocated by task 12577: set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc mm/kasan/kasan.c:553 [inline] kasan_kmalloc+0xbf/0xe0 mm/kasan/kasan.c:531 __kmalloc+0xf3/0x1e0 mm/slub.c:3749 kmalloc include/linux/slab.h:520 [inline] load_elf_phdrs+0x118/0x1b0 fs/binfmt_elf.c:441 load_elf_binary+0x2de/0x4610 fs/binfmt_elf.c:737 search_binary_handler fs/exec.c:1654 [inline] search_binary_handler+0x15c/0x4e0 fs/exec.c:1632 exec_binprm fs/exec.c:1696 [inline] __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820 do_execveat_common fs/exec.c:1866 [inline] do_execve fs/exec.c:1883 [inline] __do_sys_execve fs/exec.c:1964 [inline] __se_sys_execve fs/exec.c:1959 [inline] __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959 do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 12577: set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x129/0x170 mm/kasan/kasan.c:521 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kfree+0x8b/0x1a0 mm/slub.c:3904 load_elf_binary+0x1be7/0x4610 fs/binfmt_elf.c:1118 search_binary_handler fs/exec.c:1654 [inline] search_binary_handler+0x15c/0x4e0 fs/exec.c:1632 exec_binprm fs/exec.c:1696 [inline] __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820 do_execveat_common fs/exec.c:1866 [inline] do_execve fs/exec.c:1883 [inline] __do_sys_execve fs/exec.c:1964 [inline] __se_sys_execve fs/exec.c:1959 [inline] __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959 do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88803b8ccf00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 259 bytes inside of 512-byte region [ffff88803b8ccf00, ffff88803b8cd100) The buggy address belongs to the page: page:ffffea0000ee3300 count:1 mapcount:0 mapping:ffff88806cc03080 index:0xffff88803b8cc780 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 ffffea0001104080 0000000200000002 ffff88806cc03080 raw: ffff88803b8cc780 00000000800c000b 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88803b8ccf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88803b8ccf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88803b8cd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88803b8cd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88803b8cd100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc You can refer to "https://www.lkml.org/lkml/2019/1/17/474" reproduce this error. The exception code is "bd_len = p[3];", "p" value is ffff88803b8cd000 which belongs to the cache kmalloc-512 of size 512. The "page_address(sg_page(scsi_sglist(scmd)))" maybe from sg_scsi_ioctl function "buffer" which allocated by kzalloc, so "buffer" may not page aligned. This also looks completely buggy on highmem systems and really needs to use a kmap_atomic. --Christoph Hellwig To address above bugs, Paolo Bonzini advise to simpler to just make a char array of size CACHE_MPAGE_LEN+8+8+4-2(or just 64 to make it easy), use sg_copy_to_buffer to copy from the sglist into the buffer, and workthere. Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-04sata_rcar: handle pm_runtime_get_sync failure casesNavid Emamdoost
Calling pm_runtime_get_sync increments the counter even in case of failure, causing incorrect ref count. Call pm_runtime_put if pm_runtime_get_sync fails. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-04Merge tag 'riscv-for-linus-5.8-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - The remainder of the code necessary to support the Kendryte K210: * Support for building device trees into the kernel, as the K210 doesn't have a bootloader that provides one * A K210 device tree and the associated defconfig update * Support for skipping PMP initialization on systems that trap on PMP accesses rather than treating them as WARL - Support for KGDB - Improvements to text patching - Some cleanups to the SiFive L2 cache driver * tag 'riscv-for-linus-5.8-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: soc: sifive: l2 cache: Mark l2_get_priv_group as static soc: sifive: l2 cache: Eliminate an unsigned zero compare warning riscv: Add support to determine no. of L2 cache way enabled riscv: cacheinfo: Implement cache_get_priv_group with a generic ops structure riscv: Use text_mutex instead of patch_lock riscv: Use NOKPROBE_SYMBOL() instead of __krpobes annotation riscv: Remove the 'riscv_' prefix of function name riscv: Add SW single-step support for KDB riscv: Use the XML target descriptions to report 3 system registers riscv: Add KGDB support kgdb: Add kgdb_has_hit_break function RISC-V: Skip setting up PMPs on traps riscv: K210: Update defconfig riscv: K210: Add a built-in device tree riscv: Allow device trees to be built into the kernel
2020-06-04loop: Fix wrong masking of status flagsMartijn Coenen
In faf1d25440d6, loop_set_status() now assigns lo_status directly from the passed in lo_flags, but then fixes it up by masking out flags that can't be set by LOOP_SET_STATUS; unfortunately the mask was negated. Re-ran all ltp ioctl_loop tests, and they all passed. Pass run of the previously failing one: tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s tst_device.c:88: INFO: Found free device 0 '/dev/loop0' ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0 ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0 ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file = '/tmp/ZRJ6H4/test.img' ioctl_loop01.c:65: PASS: get expected lo_flag 12 ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1 ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1 ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds Summary: passed 8 failed 0 skipped 0 warnings 0 Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Martijn Coenen <maco@android.com> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-04Merge tag 'devicetree-for-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: - Convert various DT (non-binding) doc files to ReST - Various improvements to device link code - Fix __of_attach_node_sysfs refcounting bug - Add support for 'memory-region-names' with reserved-memory binding - Vendor prefixes for Protonic Holland, BeagleBoard.org, Alps, Check Point, Würth Elektronik, U-Boot, Vaisala, Baikal Electronics, Shanghai Awinic Technology Co., MikroTik, Silex Insight - A bunch more binding conversions to DT schema. Only 3K to go. - Add a minimum version check for schema tools - Treewide dropping of 'allOf' usage with schema references. Not needed in new json-schema spec. - Some formatting clean-ups of schemas * tag 'devicetree-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (194 commits) dt-bindings: clock: Add documentation for X1830 bindings. dt-bindings: mailbox: Convert imx mu to json-schema dt-bindings: power: Convert imx gpcv2 to json-schema dt-bindings: power: Convert imx gpc to json-schema dt-bindings: Merge gpio-usb-b-connector with usb-connector dt-bindings: timer: renesas: cmt: Convert to json-schema dt-bindings: clock: Convert i.MX8QXP LPCG to json-schema dt-bindings: timer: Convert i.MX GPT to json-schema dt-bindings: thermal: rcar-thermal: Add device tree support for r8a7742 dt-bindings: serial: Add binding for UART pin swap dt-bindings: geni-se: Add interconnect binding for GENI QUP dt-bindings: geni-se: Convert QUP geni-se bindings to YAML dt-bindings: vendor-prefixes: Add Silex Insight vendor prefix dt-bindings: input: touchscreen: edt-ft5x06: change reg property dt-bindings: usb: qcom,dwc3: Introduce interconnect properties for Qualcomm DWC3 driver dt-bindings: timer: renesas: mtu2: Convert to json-schema of/fdt: Remove redundant kbasename function call dt-bindings: clock: Convert i.MX1 clock to json-schema dt-bindings: clock: Convert i.MX21 clock to json-schema dt-bindings: clock: Convert i.MX25 clock to json-schema ...
2020-06-04Merge tag 'arm-drivers-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM/SoC driver updates from Arnd Bergmann: "These are updates to SoC specific drivers that did not have another subsystem maintainer tree to go through for some reason: - Some bus and memory drivers for the MIPS P5600 based Baikal-T1 SoC that is getting added through the MIPS tree. - There are new soc_device identification drivers for TI K3, Qualcomm MSM8939 - New reset controller drivers for NXP i.MX8MP, Renesas RZ/G1H, and Hisilicon hi6220 - The SCMI firmware interface can now work across ARM SMC/HVC as a transport. - Mediatek platforms now use a new driver for their "MMSYS" hardware block that controls clocks and some other aspects in behalf of the media and gpu drivers. - Some Tegra processors have improved power management support, including getting woken up by the PMIC and cluster power down during idle. - A new v4l staging driver for Tegra is added. - Cleanups and minor bugfixes for TI, NXP, Hisilicon, Mediatek, and Tegra" * tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (155 commits) clk: sprd: fix compile-testing bus: bt1-axi: Build the driver into the kernel bus: bt1-apb: Build the driver into the kernel bus: bt1-axi: Use sysfs_streq instead of strncmp bus: bt1-axi: Optimize the return points in the driver bus: bt1-apb: Use sysfs_streq instead of strncmp bus: bt1-apb: Use PTR_ERR_OR_ZERO to return from request-regs method bus: bt1-apb: Fix show/store callback identations bus: bt1-apb: Include linux/io.h dt-bindings: memory: Add Baikal-T1 L2-cache Control Block binding memory: Add Baikal-T1 L2-cache Control Block driver bus: Add Baikal-T1 APB-bus driver bus: Add Baikal-T1 AXI-bus driver dt-bindings: bus: Add Baikal-T1 APB-bus binding dt-bindings: bus: Add Baikal-T1 AXI-bus binding staging: tegra-video: fix V4L2 dependency tee: fix crypto select drivers: soc: ti: knav_qmss_queue: Make knav_gp_range_ops static soc: ti: add k3 platforms chipid module driver dt-bindings: soc: ti: add binding for k3 platforms chipid module ...
2020-06-04Merge tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM SoC updates from Arnd Bergmann: "One new platform gets added, the Realtek RTD1195, which is an older Cortex-a7 based relative of the RTD12xx chips that are already supported in arch/arm64. The platform may also be extended to support running 32-bit kernels on those 64-bit chips for memory-constrained machines. In the Renesas shmobile platform, we gain support for "RZ/G1H" or R8A7742, an eight-core chip based on Cortex-A15 and Cortex-A7 cores, originally released in 2016 as one of the last high-end 32-bit designs. There is ongoing cleanup for the integrator, tegra, imx, and omap2 platforms, with integrator getting very close to the goal of having zero code in arch/arm/, and omap2 moving more of the chip specifics from old board code into device tree files. The Versatile Express platform is made more modular, with built-in drivers now becoming loadable modules. This is part of a greater effort for the Android OS to have a common kernel binary for all platforms and any platform specific code in loadable modules. The PXA platform drops support for Compulab's pxa2xx boards that had rather unusual flash and PCI drivers but no known users remaining. All device drivers specific to those boards can now get removed as well. Across platforms, there is ongoing cleanup, with Geert and Rob revisiting some a lot of Kconfig options" * tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (94 commits) ARM: omap2: fix omap5_realtime_timer_init definition ARM: zynq: Don't select CONFIG_ICST ARM: OMAP2+: Fix regression for using local timer on non-SMP SoCs clk: versatile: Fix kconfig dependency on COMMON_CLK_VERSATILE ARM: davinci: fix build failure without I2C power: reset: vexpress: fix build issue power: vexpress: cleanup: use builtin_platform_driver power: vexpress: add suppress_bind_attrs to true Revert "ARM: vexpress: Don't select VEXPRESS_CONFIG" MAINTAINERS: pxa: remove Compulab arm/pxa support ARM: pxa: remove Compulab pxa2xx boards bus: arm-integrator-lm: Fix return value check in integrator_ap_lm_probe() soc: imx: move cpu code to drivers/soc/imx ARM: imx: move cpu definitions into a header ARM: imx: use device_initcall for imx_soc_device_init ARM: imx: pcm037: make pcm970_sja1000_platform_data static bus: ti-sysc: Timers no longer need legacy quirk handling ARM: OMAP2+: Drop old timer code for dmtimer and 32k counter ARM: dts: Configure system timers for omap2 ARM: dts: Configure system timers for ti81xx ...
2020-06-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - More MM work. 100ish more to go. Mike Rapoport's "mm: remove __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue - Various other little subsystems * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) lib/ubsan.c: fix gcc-10 warnings tools/testing/selftests/vm: remove duplicate headers selftests: vm: pkeys: fix multilib builds for x86 selftests: vm: pkeys: use the correct page size on powerpc selftests/vm/pkeys: override access right definitions on powerpc selftests/vm/pkeys: test correct behaviour of pkey-0 selftests/vm/pkeys: introduce a sub-page allocator selftests/vm/pkeys: detect write violation on a mapped access-denied-key page selftests/vm/pkeys: associate key on a mapped page and detect write violation selftests/vm/pkeys: associate key on a mapped page and detect access violation selftests/vm/pkeys: improve checks to determine pkey support selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: fix number of reserved powerpc pkeys selftests/vm/pkeys: introduce powerpc support selftests/vm/pkeys: introduce generic pkey abstractions selftests: vm: pkeys: use the correct huge page size selftests/vm/pkeys: fix alloc_random_pkey() to make it really random selftests/vm/pkeys: fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: fix pkey_disable_clear() selftests: vm: pkeys: add helpers for pkey bits ...
2020-06-04rapidio: convert get_user_pages() --> pin_user_pages()John Hubbard
This code was using get_user_pages_fast(), in a "Case 2" scenario (DMA/RDMA), using the categorization from [1]. That means that it's time to convert the get_user_pages_fast() + put_page() calls to pin_user_pages_fast() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20200517235620.205225-3-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04rapidio: avoid data race between file operation callbacks and mport_cdev_add().Madhuparna Bhowmik
Fields of md(mport_dev) are set after cdev_device_add(). However, the file operation callbacks can be called after cdev_device_add() and therefore accesses to fields of md in the callbacks can race with the rest of the mport_cdev_add() function. One such example is INIT_LIST_HEAD(&md->portwrites) in mport_cdev_add(), the list is initialised after cdev_device_add(). This can race with list_add_tail(&pw_filter->md_node,&md->portwrites) in rio_mport_add_pw_filter() which is called by unlocked_ioctl. To avoid such data races use cdev_device_add() after initializing md. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Alexandre Bounine <alex.bou9@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Pavel Andrianov <andrianov@ispras.ru> Link: http://lkml.kernel.org/r/20200426112950.1803-1-madhuparnabhowmik10@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04zcomp: Use ARRAY_SIZE() for backends listAndy Shevchenko
Instead of keeping NULL terminated array switch to use ARRAY_SIZE() which helps to further clean up. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: http://lkml.kernel.org/r/20200508100758.51644-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04device-dax: add memory via add_memory_driver_managed()David Hildenbrand
Currently, when adding memory, we create entries in /sys/firmware/memmap/ as "System RAM". This will lead to kexec-tools to add that memory to the fixed-up initial memmap for a kexec kernel (loaded via kexec_load()). The memory will be considered initial System RAM by the kexec'd kernel and can no longer be reconfigured. This is not what happens during a real reboot. Let's add our memory via add_memory_driver_managed() now, so we won't create entries in /sys/firmware/memmap/ and indicate the memory as "System RAM (kmem)" in /proc/iomem. This allows everybody (especially kexec-tools) to identify that this memory is special and has to be treated differently than ordinary (hotplugged) System RAM. Before configuring the namespace: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-33fffffff : namespace0.0 3280000000-32ffffffff : PCI Bus 0000:00 After configuring the namespace: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-1481fffff : namespace0.0 148200000-33fffffff : dax0.0 3280000000-32ffffffff : PCI Bus 0000:00 After loading kmem before this change: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-1481fffff : namespace0.0 150000000-33fffffff : dax0.0 150000000-33fffffff : System RAM 3280000000-32ffffffff : PCI Bus 0000:00 After loading kmem after this change: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-1481fffff : namespace0.0 150000000-33fffffff : dax0.0 150000000-33fffffff : System RAM (kmem) 3280000000-32ffffffff : PCI Bus 0000:00 After a proper reboot: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-1481fffff : namespace0.0 148200000-33fffffff : dax0.0 3280000000-32ffffffff : PCI Bus 0000:00 Within the kexec kernel before this change: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-1481fffff : namespace0.0 150000000-33fffffff : System RAM 3280000000-32ffffffff : PCI Bus 0000:00 Within the kexec kernel after this change: [root@localhost ~]# cat /proc/iomem ... 140000000-33fffffff : Persistent Memory 140000000-1481fffff : namespace0.0 148200000-33fffffff : dax0.0 3280000000-32ffffffff : PCI Bus 0000:00 /sys/firmware/memmap/ before this change: 0000000000000000-000000000009fc00 (System RAM) 000000000009fc00-00000000000a0000 (Reserved) 00000000000f0000-0000000000100000 (Reserved) 0000000000100000-00000000bffdf000 (System RAM) 00000000bffdf000-00000000c0000000 (Reserved) 00000000feffc000-00000000ff000000 (Reserved) 00000000fffc0000-0000000100000000 (Reserved) 0000000100000000-0000000140000000 (System RAM) 0000000150000000-0000000340000000 (System RAM) /sys/firmware/memmap/ after a proper reboot: 0000000000000000-000000000009fc00 (System RAM) 000000000009fc00-00000000000a0000 (Reserved) 00000000000f0000-0000000000100000 (Reserved) 0000000000100000-00000000bffdf000 (System RAM) 00000000bffdf000-00000000c0000000 (Reserved) 00000000feffc000-00000000ff000000 (Reserved) 00000000fffc0000-0000000100000000 (Reserved) 0000000100000000-0000000140000000 (System RAM) /sys/firmware/memmap/ after this change: 0000000000000000-000000000009fc00 (System RAM) 000000000009fc00-00000000000a0000 (Reserved) 00000000000f0000-0000000000100000 (Reserved) 0000000000100000-00000000bffdf000 (System RAM) 00000000bffdf000-00000000c0000000 (Reserved) 00000000feffc000-00000000ff000000 (Reserved) 00000000fffc0000-0000000100000000 (Reserved) 0000000100000000-0000000140000000 (System RAM) kexec-tools already seem to basically ignore any System RAM that's not on top level when searching for areas to place kexec images - but also for determining crash areas to dump via kdump. Changing the resource name won't have an impact. Handle unloading of the driver after memory hotremove failed properly, by duplicating the string if necessary. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/20200508084217.9160-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04drm: remove drm specific kmap_atomic codeIra Weiny
kmap_atomic_prot() is now exported by all architectures. Use this function rather than open coding a driver specific kmap_atomic. [arnd@arndb.de: include linux/highmem.h] Link: http://lkml.kernel.org/r/20200508220150.649044-1-arnd@arndb.de Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Zankel <chris@zankel.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200507150004.1423069-12-ira.weiny@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04usb: core: kcov: collect coverage from usb complete callbackAndrey Konovalov
This patch adds kcov_remote_start/stop() callbacks around the urb complete() callback that is executed in softirq context when dummy_hcd is in use. As the result, kcov can be used to collect coverage from those callbacks, which is used to facilitate coverage-guided fuzzing with syzkaller. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Marco Elver <elver@google.com> Link: http://lkml.kernel.org/r/4520671eeb604adbc2432c248b0c07fbaa5519ef.1585233617.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/2821d497ac1cdc0efb5e00df30271e4a67fc8009.1584655448.git.andreyknvl@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04dm bufio: delete unused and inefficient dm_bufio_discard_buffersMikulas Patocka
There is no user for this interface. If in future it is needed it can be reimplemented to walk the rbtree of buffers instead of doing block-by-block lookups. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-04crypto/chtls:Fix compile error when CONFIG_IPV6 is disabledVinay Kumar Yadav
Fix compile errors,warnings when CONFIG_IPV6 is disabled and inconsistent indenting. v1->v2: - Corrected errors/warnings reported when used newer gcc version, unused array. Fixes: 6abde0b24122 ("crypto/chtls: IPv6 support for inline TLS") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04yam: fix possible memory leak in yam_init_driverWang Hai
If register_netdev(dev) fails, free_netdev(dev) needs to be called, otherwise a memory leak will occur. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04lan743x: Use correct MAC_CR configuration for 1 GBit speedRoelof Berg
Corrected the MAC_CR configuration bits for 1 GBit operation. The data sheet allows MAC_CR(2:1) to be 10 and also 11 for 1 GBit/s speed, but only 10 works correctly. Devices tested: Microchip Lan7431, fixed-phy mode Microchip Lan7430, normal phy mode Fixes: 6f197fb63850 ("lan743x: Added fixed link and RGMII support") Signed-off-by: Roelof Berg <rberg@berg-solutions.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: ethernet: freescale: remove unneeded include for ucc_gethValentin Longchamp
net/sch_generic.h does not need to be included, remove it. Signed-off-by: Valentin Longchamp <valentin@longchamp.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04r8169: fix failing WoLHeiner Kallweit
Th referenced change added an extra hw reset to rtl8169_net_suspend() what makes WoL fail on few chip versions. Therefore skip the extra reset if we're going down and WoL is enabled. In rtl_shutdown() rtl8169_hw_reset() is called by rtl8169_net_suspend() already if needed, therefore avoid issues issue by removing the extra call. The fix was tested on a system with RTL8168g. Meanwhile rtl8169_hw_reset() does more than a hw reset and should be renamed. But that's net-next material. Fixes: 8ac8e8c64b53 ("r8169: make rtl8169_down central chip quiesce function") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: ethernet: dwmac: Fix an error code in imx_dwmac_probe()Dan Carpenter
The code is return PTR_ERR(NULL) which is zero or success. We should return -ENOMEM instead. Fixes: 94abdad6974a5 ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: mdiobus: Disable preemption upon u64_stats updateAhmed S. Darwish
The u64_stats mechanism uses sequence counters to protect against 64-bit values tearing on 32-bit architectures. Updating u64_stats is thus a sequence counter write side critical section where preemption must be disabled. For mdiobus_stats_acct(), disable preemption upon the u64_stats update. It is called from process context through mdiobus_read() and mdiobus_write(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: phy: fixed_phy: Remove unused seqcountAhmed S. Darwish
Commit bf7afb29d545 ("phy: improve safety of fixed-phy MII register reading") protected the fixed PHY status with a sequence counter. Two years later, commit d2b977939b18 ("net: phy: fixed-phy: remove fixed_phy_update_state()") removed the sequence counter's write side critical section -- neutralizing its read side retry loop. Remove the unused seqcount. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: dsa: qca8k: Fix "Unexpected gfp" kernel exceptionMichal Vokáč
Commit 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper") replaced the dsa_switch_alloc helper by devm_kzalloc in all DSA drivers. Unfortunately it introduced a typo in qca8k.c driver and wrong argument is passed to the devm_kzalloc function. This fix mitigates the following kernel exception: Unexpected gfp: 0x6 (__GFP_HIGHMEM|GFP_DMA32). Fixing up to gfp: 0x101 (GFP_DMA|__GFP_ZERO). Fix your code! CPU: 1 PID: 44 Comm: kworker/1:1 Not tainted 5.5.9-yocto-ua #1 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Workqueue: events deferred_probe_work_func [<c0014924>] (unwind_backtrace) from [<c00123bc>] (show_stack+0x10/0x14) [<c00123bc>] (show_stack) from [<c04c8fb4>] (dump_stack+0x90/0xa4) [<c04c8fb4>] (dump_stack) from [<c00e1b10>] (new_slab+0x20c/0x214) [<c00e1b10>] (new_slab) from [<c00e1cd0>] (___slab_alloc.constprop.0+0x1b8/0x540) [<c00e1cd0>] (___slab_alloc.constprop.0) from [<c00e2074>] (__slab_alloc.constprop.0+0x1c/0x24) [<c00e2074>] (__slab_alloc.constprop.0) from [<c00e4538>] (__kmalloc_track_caller+0x1b0/0x298) [<c00e4538>] (__kmalloc_track_caller) from [<c02cccac>] (devm_kmalloc+0x24/0x70) [<c02cccac>] (devm_kmalloc) from [<c030d888>] (qca8k_sw_probe+0x94/0x1ac) [<c030d888>] (qca8k_sw_probe) from [<c0304788>] (mdio_probe+0x30/0x54) [<c0304788>] (mdio_probe) from [<c02c93bc>] (really_probe+0x1e0/0x348) [<c02c93bc>] (really_probe) from [<c02c9884>] (driver_probe_device+0x60/0x16c) [<c02c9884>] (driver_probe_device) from [<c02c7fb0>] (bus_for_each_drv+0x70/0x94) [<c02c7fb0>] (bus_for_each_drv) from [<c02c9708>] (__device_attach+0xb4/0x11c) [<c02c9708>] (__device_attach) from [<c02c8148>] (bus_probe_device+0x84/0x8c) [<c02c8148>] (bus_probe_device) from [<c02c8cec>] (deferred_probe_work_func+0x64/0x90) [<c02c8cec>] (deferred_probe_work_func) from [<c0033c14>] (process_one_work+0x1d4/0x41c) [<c0033c14>] (process_one_work) from [<c00340a4>] (worker_thread+0x248/0x528) [<c00340a4>] (worker_thread) from [<c0039148>] (kthread+0x124/0x150) [<c0039148>] (kthread) from [<c00090d8>] (ret_from_fork+0x14/0x3c) Exception stack(0xee1b5fb0 to 0xee1b5ff8) 5fa0: 00000000 00000000 00000000 00000000 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 qca8k 2188000.ethernet-1:0a: Using legacy PHYLIB callbacks. Please migrate to PHYLINK! qca8k 2188000.ethernet-1:0a eth2 (uninitialized): PHY [2188000.ethernet-1:01] driver [Generic PHY] qca8k 2188000.ethernet-1:0a eth1 (uninitialized): PHY [2188000.ethernet-1:02] driver [Generic PHY] Fixes: 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper") Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04geneve: change from tx_error to tx_dropped on missing metadataJiri Benc
If the geneve interface is in collect_md (external) mode, it can't send any packets submitted directly to its net interface, as such packets won't have metadata attached. This is expected. However, the kernel itself sends some packets to the interface, most notably, IPv6 DAD, IPv6 multicast listener reports, etc. This is not wrong, as tunnel metadata can be specified in routing table (although technically, that has never worked for IPv6, but hopefully will be fixed eventually) and then the interface must correctly participate in IPv6 housekeeping. The problem is that any such attempt increases the tx_error counter. Just bringing up a geneve interface with IPv6 enabled is enough to see a number of tx_errors. That causes confusion among users, prompting them to find a network error where there is none. Change the counter used to tx_dropped. That better conveys the meaning (there's nothing wrong going on, just some packets are getting dropped) and hopefully will make admins panic less. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: ena: xdp: update napi budget for DROP and ABORTEDSameeh Jubran
This patch fixes two issues with XDP: 1. If the XDP verdict is XDP_ABORTED we break the loop, which results in us handling one buffer per napi cycle instead of the total budget (usually 64). To overcome this simply change the xdp_verdict check to != XDP_PASS. When the verdict is XDP_PASS, the skb is not expected to be NULL. 2. Update the residual budget for XDP_DROP and XDP_ABORTED, since packets are handled in these cases. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04net: ena: xdp: XDP_TX: fix memory leakSameeh Jubran
When sending very high packet rate, the XDP tx queues can get full and start dropping packets. In this case we don't free the pages which results in ena driver draining the system memory. Fix: Simply free the pages when necessary. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04crypto/chcr: error seen if CONFIG_CHELSIO_TLS_DEVICE isn't setRohit Maheshwari
cxgb4_uld_in_use() is used only by cxgb4_ktls_det_feature() which is under CONFIG_CHELSIO_TLS_DEVICE macro. Fixes: a3ac249a1ab5 ("cxgb4/chcr: Enable ktls settings at run time") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-04virtio-mem: Don't rely on implicit compiler padding for requestsDavid Hildenbrand
The compiler will add padding after the last member, make that explicit. The size of a request is always 24 bytes. The size of a response always 10 bytes. Add compile-time checks. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200515101402.16597-1-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Try to unplug the complete online memory block firstDavid Hildenbrand
Right now, we always try to unplug single subblocks when processing an online memory block. Let's try to unplug the complete online memory block first, in case it is fully plugged and the unplug request is large enough. Fallback to single subblocks in case the memory block cannot get unplugged as a whole. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-16-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Use -ETXTBSY as error code if the device is busyDavid Hildenbrand
Let's be able to distinguish if the device or if memory is busy. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-15-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Unplug subblocks right-to-leftDavid Hildenbrand
We unplug blocks right-to-left, let's also unplug subblocks within a block right-to-left. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-14-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Drop manual check for already present memoryDavid Hildenbrand
Registering our parent resource will fail if any memory is still present (e.g., because somebody unloaded the driver and tries to reload it). No need for the manual check. Move our "unplug all" handling to after registering the resource. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-13-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Add parent resource for all added "System RAM"David Hildenbrand
Let's add a parent resource, named after the virtio device (inspired by drivers/dax/kmem.c). This allows user space to identify which memory belongs to which virtio-mem device. With this change and two virtio-mem devices: :/# cat /proc/iomem 00000000-00000fff : Reserved 00001000-0009fbff : System RAM [...] 140000000-333ffffff : virtio0 140000000-147ffffff : System RAM 148000000-14fffffff : System RAM 150000000-157ffffff : System RAM [...] 334000000-3033ffffff : virtio1 338000000-33fffffff : System RAM 340000000-347ffffff : System RAM 348000000-34fffffff : System RAM [...] Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-12-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
2020-06-04virtio-mem: Better retry handlingDavid Hildenbrand
Let's start with a retry interval of 5 seconds and double the time until we reach 5 minutes, in case we keep getting errors. Reset the retry interval in case we succeeded. The two main reasons for having to retry are - The hypervisor is busy and cannot process our request - We cannot reach the desired requested_size (esp., not enough memory can get unplugged because we can't allocate any subblocks). Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-11-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Offline and remove completely unplugged memory blocksDavid Hildenbrand
Let's offline+remove memory blocks once all subblocks are unplugged. We can use the new Linux MM interface for that. As no memory is in use anymore, this shouldn't take a long time and shouldn't fail. There might be corner cases where the offlining could still fail (especially, if another notifier NACKs the offlining request). Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-10-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Allow to offline partially unplugged memory blocksDavid Hildenbrand
Dropping the reference count of PageOffline() pages during MEM_GOING_ONLINE allows offlining code to skip them. However, we also have to clear PG_reserved, because PG_reserved pages get detected as unmovable right away. Take care of restoring the reference count when offlining is canceled. Clarify why we don't have to perform any action when unloading the driver. Also, let's add a warning if anybody is still holding a reference to unplugged pages when offlining. Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-8-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotunplug part 2David Hildenbrand
We also want to unplug online memory (contained in online memory blocks and, therefore, managed by the buddy), and eventually replug it later. When requested to unplug memory, we use alloc_contig_range() to allocate subblocks in online memory blocks (so we are the owner) and send them to our hypervisor. When requested to plug memory, we can replug such memory using free_contig_range() after asking our hypervisor. We also want to mark all allocated pages PG_offline, so nobody will touch them. To differentiate pages that were never onlined when onlining the memory block from pages allocated via alloc_contig_range(), we use PageDirty(). Based on this flag, virtio_mem_fake_online() can either online the pages for the first time or use free_contig_range(). It is worth noting that there are no guarantees on how much memory can actually get unplugged again. All device memory might completely be fragmented with unmovable data, such that no subblock can get unplugged. We are not touching the ZONE_MOVABLE. If memory is onlined to the ZONE_MOVABLE, it can only get unplugged after that memory was offlined manually by user space. In normal operation, virtio-mem memory is suggested to be onlined to ZONE_NORMAL. In the future, we will try to make unplug more likely to succeed. Add a module parameter to control if online memory shall be touched. As we want to access alloc_contig_range()/free_contig_range() from kernel module context, export the symbols. Note: Whenever virtio-mem uses alloc_contig_range(), all affected pages are on the same node, in the same zone, and contain no holes. Acked-by: Michal Hocko <mhocko@suse.com> # to export contig range allocator API Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-6-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotunplug part 1David Hildenbrand
Unplugging subblocks of memory blocks that are offline is easy. All we have to do is watch out for concurrent onlining activity. Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-5-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Allow to specify an ACPI PXM as nidDavid Hildenbrand
We want to allow to specify (similar as for a DIMM), to which node a virtio-mem device (and, therefore, its memory) belongs. Add a new virtio-mem feature flag and export pxm_to_node, so it can be used in kernel module context. Acked-by: Michal Hocko <mhocko@suse.com> # for the export Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotplugDavid Hildenbrand
Each virtio-mem device owns exactly one memory region. It is responsible for adding/removing memory from that memory region on request. When the device driver starts up, the requested amount of memory is queried and then plugged to Linux. On request, further memory can be plugged or unplugged. This patch only implements the plugging part. On x86-64, memory can currently be plugged in 4MB ("subblock") granularity. When required, a new memory block will be added (e.g., usually 128MB on x86-64) in order to plug more subblocks. Only x86-64 was tested for now. The online_page callback is used to keep unplugged subblocks offline when onlining memory - similar to the Hyper-V balloon driver. Unplugged pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to skip them. User space is usually responsible for onlining the added memory. The memory hotplug notifier is used to synchronize virtio-mem activity against memory onlining/offlining. Each virtio-mem device can belong to a NUMA node, which allows us to easily add/remove small chunks of memory to/from a specific NUMA node by using multiple virtio-mem devices. Something that works even when the guest has no idea about the NUMA topology. One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many "sub-DIMMS". This patch directly introduces the basic infrastructure to implement memory unplug. Especially the memory block states and subblock bitmaps will be heavily used there. Notes: - In case memory is to be onlined by user space, we limit the amount of offline memory blocks, to not run out of memory. This is esp. an issue if memory is added faster than it is getting onlined. - Suspend/Hibernate is not supported due to the way virtio-mem devices behave. Limited support might be possible in the future. - Reloading the device driver is not supported. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vdpasim: Fix some coccinelle warningsSamuel Zou
Fix below warnings reported by coccicheck: drivers/vdpa/vdpa_sim/vdpa_sim.c:104:1-10: WARNING: Assignment of 0/1 to bool variable drivers/vdpa/vdpa_sim/vdpa_sim.c:164:7-11: WARNING: Unsigned expression compared with zero: read <= 0 drivers/vdpa/vdpa_sim/vdpa_sim.c:169:7-12: WARNING: Unsigned expression compared with zero: write <= 0 1. The 'ready' variable in vdpasim_virtqueue struct is bool type. It is better to initialize vq->ready to false 2. Modify 'read' and 'write' variables type from size_t to ssize_t. And preserve the reverse christmas tree ordering of local variables. Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Samuel Zou <zou_wei@huawei.com> Link: https://lore.kernel.org/r/1588990802-28451-1-git-send-email-zou_wei@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04ifcvf: move IRQ request/free to status change handlersZhu Lingshan
This commit move IRQ request and free operations from probe() to VIRTIO status change handler to comply with VIRTIO spec. VIRTIO spec 1.1, section 2.1.2 Device Requirements: Device Status Field The device MUST NOT consume buffers or send any used buffer notifications to the driver before DRIVER_OK. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/1589270444-3669-1-git-send-email-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-06-04vhost: (cosmetic) remove a superfluous variable initialisationGuennadi Liakhovetski
Even the compiler is able to figure out that in this case the initialisation is superfluous. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Link: https://lore.kernel.org/r/20200527180541.5570-3-guennadi.liakhovetski@linux.intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04crypto: virtio: Fix dest length calculation in __virtio_crypto_skcipher_do_req()Longpeng(Mike)
The src/dst length is not aligned with AES_BLOCK_SIZE(which is 16) in some testcases in tcrypto.ko. For example, the src/dst length of one of cts(cbc(aes))'s testcase is 17, the crypto_virtio driver will set @src_data_len=16 but @dst_data_len=17 in this case and get a wrong at then end. SRC: pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp pp (17 bytes) EXP: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc pp (17 bytes) DST: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00 (pollute the last bytes) (pp: plaintext cc:ciphertext) Fix this issue by limit the length of dest buffer. Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver") Cc: Gonglei <arei.gonglei@huawei.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20200602070501.2023-4-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>