summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-11um: vector: Use GFP_ATOMIC under spin lockTiezhu Yang
Use GFP_ATOMIC instead of GFP_KERNEL under spin lock to fix possible sleep-in-atomic-context bugs. Fixes: 9807019a62dc ("um: Loadable BPF "Firmware" for vector drivers") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11um: Fix null pointer dereference in vector_user_bpfGaurav Singh
The bpf_prog is being checked for !NULL after uml_kmalloc but later its used directly for example: bpf_prog->filter = bpf and is also later returned upon success. Fix this, do a NULL check and return right away. Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11Merge tag 'cfi/for-5.10' of ↵Richard Weinberger
gitolite.kernel.org:pub/scm/linux/kernel/git/mtd/linux into mtd/next HyperBus changes * DMA support for TI's AM654 HyperBus controller driver. * HyperBus frontend driver for Renesas RPC-IF driver.
2020-10-11Merge tag 'spi-nor/for-5.10' of ↵Richard Weinberger
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux into mtd/next SPI NOR core changes: - Support for Winbond w25q64jwm flash - Enable 4K sector support for mx25l12805d SPI NOR controller drivers changes: - intel-spi: - Add Alder Lake-S PCI ID
2020-10-11Merge tag 'nand/for-5.10' of ↵Richard Weinberger
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux into mtd/next NAND core changes: * Use the new generic ECC object * Create helpers to set/extract the ECC requirements * Create a helper to extract the ECC configuration * Add a NAND page I/O request type * Introduce the ECC engine framework Raw NAND core changes: * Don't overwrite the error code from nand_set_ecc_soft_ops() * Introduce nand_set_ecc_on_host_ops() * Use the NAND framework user_conf object for ECC flags * Use the ECC framework user input parsing bits * Use the ECC framework nand_ecc_is_strong_enough() helper * Use the ECC framework OOB layouts * Make use of the ECC framework * Use nanddev_get/set_ecc_requirements() when relevant * Use the new ECC engine type enumeration * Separate the ECC engine type and the ECC byte placement * Move the nand_ecc_algo enum to the generic NAND layer * Rename the ECC algorithm enumeration items * Add a kernel doc to the ECC algorithm enumeration * DT bindings: - Document boolean NAND ECC properties - Document nand-ecc-engine - Document nand-ecc-placement Raw NAND drivers changes: * Ams-Delta: Fix non-OF build warning * Atmel: - Check return values for nand_read_data_op - Simplify with dev_err_probe() - Get rid of the legacy interface implementation - Convert the driver to exec_op() - Use nand_prog_page_end_op() - Use nand_{write,read}_data_op() - Drop redundant nand_read_page_op() - Enable the NFC controller at probe time - Disable clk on error handling path in probe * Cadence: remove a redundant dev_err call * Gpmi: - Simplify with dev_err_probe() * Marvell: - Fix and update kerneldoc - Simplify with dev_err_probe() - Fix and update kerneldoc - Simplify with dev_err_probe() - Support panic_write for mtdoops * Onenand: - Simplify the return expression of onenand_transfer_auto_oob - Simplify with dev_err_probe() * Oxnas: cleanup/simplify code * Pasemi: Make pasemi_device_ready() static * Qcom: Simplify with dev_err_probe() * Stm32_fmc2: fix a buffer overflow * Vf610: Remove unused function vf610_nfc_transfer_size() SPI-NAND changes: * Use nanddev_get_ecc_conf() when relevant * Gigadevice: - Add support for GD5F4GQ4xC - Add QE Bit - Use only one dummy byte in QUADIO * Macronix: - Add support for MX31UF1GE4BC - Add support for MX31LF1GE4BC
2020-10-11ubifs: mount_ubifs: Release authentication resource in error handling pathZhihao Cheng
Release the authentication related resource in some error handling branches in mount_ubifs(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11ubifs: Don't parse authentication mount options in remount processZhihao Cheng
There is no need to dump authentication options while remounting, because authentication initialization can only be doing once in the first mount process. Dumping authentication mount options in remount process may cause memory leak if UBIFS has already been mounted with old authentication mount options. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11ubifs: Fix a memleak after dumping authentication mount optionsZhihao Cheng
Fix a memory leak after dumping authentication mount options in error handling branch. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-10-11bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org.Alexei Starovoitov
Move the bpf/bpf-next patch processing queue to patchwork.kernel.org. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201011200149.66537-1-alexei.starovoitov@gmail.com
2020-10-11bpf: Always return target ifindex in bpf_fib_lookupToke Høiland-Jørgensen
The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com
2020-10-11Merge branch 'samples: bpf: Refactor XDP programs with libbpf'Alexei Starovoitov
"Daniel T. Lee" says: ==================== To avoid confusion caused by the increasing fragmentation of the BPF Loader program, this commit would like to convert the previous bpf_load loader with the libbpf loader. Thanks to libbpf's bpf_link interface, managing the tracepoint BPF program is much easier. bpf_program__attach_tracepoint manages the enable of tracepoint event and attach of BPF programs to it with a single interface bpf_link, so there is no need to manage event_fd and prog_fd separately. And due to addition of generic bpf_program__attach() to libbpf, it is now possible to attach BPF programs with __attach() instead of explicitly calling __attach_<type>(). This patchset refactors xdp_monitor with using this libbpf API, and the bpf_load is removed and migrated to libbpf. Also, attach_tracepoint() is replaced with the generic __attach() method in xdp_redirect_cpu. Moreover, maps in kern program have been converted to BTF-defined map. --- Changes in v2: - added cleanup logic for bpf_link and bpf_object in xdp_monitor - program section match with bpf_program__is_<type> instead of strncmp - revert BTF key/val type to default of BPF_MAP_TYPE_PERF_EVENT_ARRAY - split increment into seperate satement - refactor pointer array initialization - error code cleanup ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11samples: bpf: Refactor XDP kern program maps with BTF-defined mapDaniel T. Lee
Most of the samples were converted to use the new BTF-defined MAP as they moved to libbpf, but some of the samples were missing. Instead of using the previous BPF MAP definition, this commit refactors xdp_monitor and xdp_sample_pkts_kern MAP definition with the new BTF-defined MAP format. Also, this commit removes the max_entries attribute at PERF_EVENT_ARRAY map type. The libbpf's bpf_object__create_map() will automatically set max_entries to the maximum configured number of CPUs on the host. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-4-danieltimlee@gmail.com
2020-10-11samples: bpf: Replace attach_tracepoint() to attach() in xdp_redirect_cpuDaniel T. Lee
>From commit d7a18ea7e8b6 ("libbpf: Add generic bpf_program__attach()"), for some BPF programs, it is now possible to attach BPF programs with __attach() instead of explicitly calling __attach_<type>(). This commit refactors the __attach_tracepoint() with libbpf's generic __attach() method. In addition, this refactors the logic of setting the map FD to simplify the code. Also, the missing removal of bpf_load.o in Makefile has been fixed. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-3-danieltimlee@gmail.com
2020-10-11samples: bpf: Refactor xdp_monitor with libbpfDaniel T. Lee
To avoid confusion caused by the increasing fragmentation of the BPF Loader program, this commit would like to change to the libbpf loader instead of using the bpf_load. Thanks to libbpf's bpf_link interface, managing the tracepoint BPF program is much easier. bpf_program__attach_tracepoint manages the enable of tracepoint event and attach of BPF programs to it with a single interface bpf_link, so there is no need to manage event_fd and prog_fd separately. This commit refactors xdp_monitor with using this libbpf API, and the bpf_load is removed and migrated to libbpf. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-2-danieltimlee@gmail.com
2020-10-11Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'Jakub Kicinski
Vladimir Oltean says: ==================== Offload tc-vlan mangle to mscc_ocelot switch This series offloads one more action to the VCAP IS1 ingress TCAM, which is to change the classified VLAN for packets, according to the VCAP IS1 keys (VLAN, source MAC, source IP, EtherType, etc). ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11selftests: net: mscc: ocelot: add test for VLAN modify actionVladimir Oltean
Create a test that changes a VLAN ID from 200 to 300. We also need to modify the preferences of the filters installed for the other rules so that they are unique, because we now install the "tc-vlan modify" filter in VCAP IS1 only temporarily, and we need to perform the deletion by filter preference number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11net: dsa: tag_ocelot: use VLAN information from tagging header when availableVladimir Oltean
When the Extraction Frame Header contains a valid classified VLAN, use that instead of the VLAN header present in the packet. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11net: mscc: ocelot: offload VLAN mangle action to VCAP IS1Vladimir Oltean
The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM, changes the classified VLAN. We are only exposing this ability for switch ports that are under VLAN aware bridges. This is because in standalone ports mode and under a bridge with vlan_filtering=0, the ocelot driver configures the switch to operate as VLAN-unaware, so the classified VLAN is not derived from the 802.1Q header from the packet, but instead is always equal to the port-based VLAN ID of the ingress port. We _can_ still change the classified VLAN for packets when operating in this mode, but the end result will most likely be a drop, since both the ingress and the egress port need to be members of the modified VLAN. And even if we install the new classified VLAN into the VLAN table of the switch, the result would still not be as expected: we wouldn't see, on the output port, the modified VLAN tag, but the original one, even though the classified VLAN was indeed modified. This is because of how the hardware works: on egress, what is pushed to the frame is a "port tag", which gives us the following options: - Tag all frames with port tag (derived from the classified VLAN) - Tag all frames with port tag, except if the classified VLAN is 0 or equal to the native VLAN of the egress port - No port tag Needless to say, in VLAN-unaware mode we are disabling the port tag. Otherwise, the existing VLAN tag would be ignored, and a second VLAN tag (the port tag), holding the classified VLAN, would be pushed (instead of replacing the existing 802.1Q tag). This is definitely not what the user wanted when installing a "vlan modify" action. So it is simply not worth bothering with VLAN modify rules under other configurations except when the ports are fully VLAN-aware. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Five fixes. Subsystems affected by this patch series: MAINTAINERS, mm/pagemap, mm/swap, and mm/hugetlb" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged mm: validate inode in mapping_set_error() mm: mmap: Fix general protection fault in unlink_file_vma() MAINTAINERS: Antoine Tenart's email address MAINTAINERS: change hardening mailing list
2020-10-11Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fix from Al Viro: "Fixes an obvious bug (memory leak introduced in 5.8)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pipe: Fix memory leaks in create_pipe_files()
2020-10-11Merge branch 'enetc-Migrate-to-PHYLINK-and-PCS_LYNX'Jakub Kicinski
Claudiu Manoil says: ==================== enetc: Migrate to PHYLINK and PCS_LYNX Transitioning the enetc driver from phylib to phylink. Offloading the serdes configuration to the PCS_LYNX module is a mandatory part of this transition. Aiming for a cleaner, more maintainable design, and better code reuse. The first 2 patches are clean up prerequisites. Tested on a p1028rdb board. v2: validate() explicitly rejects now all interface modes not supported by the driver instead of relying on the device tree to provide only supported interfaces, and dropped redundant activation of pcs_poll (addressing Ioana's findings) ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11enetc: Migrate to PHYLINK and PCS_LYNXClaudiu Manoil
This is a methodical transition of the driver from phylib to phylink, following the guidelines from sfp-phylink.rst. The MAC register configurations based on interface mode were moved from the probing path to the mac_config() hook. MAC enable and disable commands (enabling Rx and Tx paths at MAC level) were also extracted and assigned to their corresponding phylink hooks. As part of the migration to phylink, the serdes configuration from the driver was offloaded to the PCS_LYNX module, introduced in commit 0da4c3d393e4 ("net: phy: add Lynx PCS module"), the PCS_LYNX module being a mandatory component required to make the enetc driver work with phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11arm64: dts: fsl-ls1028a-rdb: Specify in-band mode for ENETC port 0Claudiu Manoil
As part of the transition of the enetc ethernet driver from phylib to phylink, the in-band operation mode of the SGMII interface from enetc port 0 needs to be specified explicitly for phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11enetc: Clean up serdes configurationClaudiu Manoil
Decouple internal mdio bus creation from serdes configuration, as a prerequisite to offloading serdes configuration to a different module. Group together mdio bus creation routines, cleanup. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11enetc: Clean up MAC and link configurationClaudiu Manoil
Decouple level MAC configuration based on phy interface type from general port configuration. Group together MAC and link configuration code. Decouple external mdio bus creation from interface type parsing. No longer return an (unhandled) error code when phy_node not found, use phy_node to indicate whether the port has a phy or not. No longer fall-through when serdes configuration fails for the link modes that require internal link configuration. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11Merge tag 'x86-urgent-2020-10-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: - Fix a (hopefully final) IRQ state tracking bug vs MCE handling - Fix a documentation link" * tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix incorrect references to zero-page.txt x86/mce: Use idtentry_nmi_enter/exit()
2020-10-11Merge tag 'irqchip-5.10' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: Core changes: - Allow irq retriggering to follow a hierarchy - Allow interrupt hierarchies to be trimmed at allocation time - Allow interrupts to be hidden from /proc/interrupts (IPIs) - Introduce stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER - New per-cpu IPI handling flow Architecture changes: - Move arm/arm64 IPI handling to the core interrupt code, removing the home brewed accounting Driver updates: - New driver for the MStar (and more recently Mediatek) platforms - New driver for the Actions Owl SIRQ controller - New driver for the TI PRUSS infrastructure - Wake-up support for the Qualcomm PDC controller - Primary interrupt controller support for the Designware APB ICTL - Convert the IPI code for GIC, GICv3, hip04, armada-270-xp and bcm2836 to using standard interrupts - Improve GICv3 pseudo-NMI support to deal with both non-secure and secure priorities on arm64 - Convert the GIC/GICv3 drivers to using HW-based irq retrigger - A sprinkling of dev_err_probe() conversion - A set of NVIDIA Tegra fixes for interrupt hierarchy corruption - A reset fix for the Loongson HTVEC driver - A couple of error handling fixes in the TI SCI drivers
2020-10-11Merge tag 'perf-urgent-2020-10-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "Fix an error handling bug that can cause a lockup if a CPU is offline (doh ...)" * tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix task_function_call() error handling
2020-10-11mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected ↵Vijay Balakrishna
by khugepaged When memory is hotplug added or removed the min_free_kbytes should be recalculated based on what is expected by khugepaged. Currently after hotplug, min_free_kbytes will be set to a lower default and higher default set when THP enabled is lost. This change restores min_free_kbytes as expected for THP consumers. [vijayb@linux.microsoft.com: v5] Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com Fixes: f000565adb77 ("thp: set recommended min free kbytes") Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Allen Pais <apais@microsoft.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11mm: validate inode in mapping_set_error()Minchan Kim
The swap address_space doesn't have host. Thus, it makes kernel crash once swap write meets error. Fix it. Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs") Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Andres Freund <andres@anarazel.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201010000650.750063-1-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11mm: mmap: Fix general protection fault in unlink_file_vma()Miaohe Lin
The syzbot reported the below general protection fault: general protection fault, probably for non-canonical address 0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df] CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0 RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164 Call Trace: free_pgtables+0x1b3/0x2f0 mm/memory.c:415 exit_mmap+0x2c0/0x530 mm/mmap.c:3184 __mmput+0x122/0x470 kernel/fork.c:1076 mmput+0x53/0x60 kernel/fork.c:1097 exit_mm kernel/exit.c:483 [inline] do_exit+0xa8b/0x29f0 kernel/exit.c:793 do_group_exit+0x125/0x310 kernel/exit.c:903 get_signal+0x428/0x1f00 kernel/signal.c:2757 arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:136 [inline] exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It's because the ->mmap() callback can change vma->vm_file and fput the original file. But the commit d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible") failed to catch this case and always fput() the original file, hence add an extra fput(). [ Thanks Hillf for pointing this extra fput() out. ] Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible") Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christian König <ckoenig.leichtzumerken@gmail.com> Cc: Hongxiang Lou <louhongxiang@huawei.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: John Hubbard <jhubbard@nvidia.com> Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11MAINTAINERS: Antoine Tenart's email addressAntoine Tenart
Use my kernel.org address instead of my bootlin.com one. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201005164533.16811-1-atenart@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11MAINTAINERS: change hardening mailing listKees Cook
As more email from git history gets aimed at the OpenWall kernel-hardening@ list, there has been a desire to separate "new topics" from "on-going" work. To handle this, the superset of hardening email topics are now to be directed to linux-hardening@vger.kernel.org. Update the MAINTAINERS file and the .mailmap to accomplish this, so that linux-hardening@ can be treated like any other regular upstream kernel development list. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Emese Revfy <re.emese@gmail.com> Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/linux-hardening/202010051443.279CC265D@keescook/ Link: https://lkml.kernel.org/r/20201006000012.2768958-1-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11Merge branch 'Follow-up BPF helper improvements'Alexei Starovoitov
Daniel Borkmann says: ==================== This series addresses most of the feedback [0] that was to be followed up from the last series, that is, UAPI helper comment improvements and getting rid of the ifindex obj file hacks in the selftest by using a BPF map instead. The __sk_buff data/data_end pointer work, I'm planning to do in a later round as well as the mem*() BPF improvements we have in Cilium for libbpf. Next, the series adds two features, i) a helper called redirect_peer() to improve latency on netns switch, and ii) to allow map in map with dynamic inner array map sizes. Selftests for each are added as well. For details, please check individual patches, thanks! [0] https://lore.kernel.org/bpf/cover.1601477936.git.daniel@iogearbox.net/ v5 -> v6: - Going with Andrii's suggestion to make the misconfigured verifier test more robust, and only probe on -EOPNOTSUPP (Andrii) v4 -> v5: - Replace cnt == -EOPNOTSUPP check with cnt < 0; I've used < 0 here as I think it's useful to keep the existing cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) for error detection (Andrii) v3 -> v4: - Rename new array map flag to BPF_F_INNER_MAP (Alexei) v2 -> v3: - Remove tab that slipped into uapi helper desc (Jakub) - Rework map in map for array to error from map_gen_lookup (Andrii) v1 -> v2: - Fixed selftest comment wrt inner1/inner2 value (Yonghong) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11bpf, selftests: Add redirect_peer selftestDaniel Borkmann
Extend the test_tc_redirect test and add a small test that exercises the new redirect_peer() helper for the IPv4 and IPv6 case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-7-daniel@iogearbox.net
2020-10-11bpf, selftests: Make redirect_neigh test more extensibleDaniel Borkmann
Rename into test_tc_redirect.sh and move setup and test code into separate functions so they can be reused for newly added tests in here. Also remove the crude hack to override ifindex inside the object file via xxd and sed and just use a simple map instead. Map given iproute2 does not support BTF fully and therefore neither global data at this point. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-6-daniel@iogearbox.net
2020-10-11bpf, selftests: Add test for different array inner map sizeDaniel Borkmann
Extend the "diff_size" subtest to also include a non-inlined array map variant where dynamic inner #elems are possible. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-5-daniel@iogearbox.net
2020-10-11bpf: Allow for map-in-map with dynamic inner array map entriesDaniel Borkmann
Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf ("bpf: Relax max_entries check for most of the inner map types") added support for dynamic inner max elements for most map-in-map types. Exceptions were maps like array or prog array where the map_gen_lookup() callback uses the maps' max_entries field as a constant when emitting instructions. We recently implemented Maglev consistent hashing into Cilium's load balancer which uses map-in-map with an outer map being hash and inner being array holding the Maglev backend table for each service. This has been designed this way in order to reduce overall memory consumption given the outer hash map allows to avoid preallocating a large, flat memory area for all services. Also, the number of service mappings is not always known a-priori. The use case for dynamic inner array map entries is to further reduce memory overhead, for example, some services might just have a small number of back ends while others could have a large number. Right now the Maglev backend table for small and large number of backends would need to have the same inner array map entries which adds a lot of unneeded overhead. Dynamic inner array map entries can be realized by avoiding the inlined code generation for their lookup. The lookup will still be efficient since it will be calling into array_map_lookup_elem() directly and thus avoiding retpoline. The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips inline code generation and relaxes array_map_meta_equal() check to ignore both maps' max_entries. This also still allows to have faster lookups for map-in-map when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed. Example code generation where inner map is dynamic sized array: # bpftool p d x i 125 int handle__sys_enter(void * ctx): ; int handle__sys_enter(void *ctx) 0: (b4) w1 = 0 ; int key = 0; 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 ; 3: (07) r2 += -4 ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key); 4: (18) r1 = map[id:468] 6: (07) r1 += 272 7: (61) r0 = *(u32 *)(r2 +0) 8: (35) if r0 >= 0x3 goto pc+5 9: (67) r0 <<= 3 10: (0f) r0 += r1 11: (79) r0 = *(u64 *)(r0 +0) 12: (15) if r0 == 0x0 goto pc+1 13: (05) goto pc+1 14: (b7) r0 = 0 15: (b4) w6 = -1 ; if (!inner_map) 16: (15) if r0 == 0x0 goto pc+6 17: (bf) r2 = r10 ; 18: (07) r2 += -4 ; val = bpf_map_lookup_elem(inner_map, &key); 19: (bf) r1 = r0 | No inlining but instead 20: (85) call array_map_lookup_elem#149280 | call to array_map_lookup_elem() ; return val ? *val : -1; | for inner array lookup. 21: (15) if r0 == 0x0 goto pc+1 ; return val ? *val : -1; 22: (61) r6 = *(u32 *)(r0 +0) ; } 23: (bc) w0 = w6 24: (95) exit Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2020-10-11bpf: Add redirect_peer helperDaniel Borkmann
Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
2020-10-11bpf: Improve bpf_redirect_neigh helper descriptionDaniel Borkmann
Follow-up to address David's feedback that we should better describe internals of the bpf_redirect_neigh() helper. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net
2020-10-12scripts: remove namespace.plJacob Keller
namespace.pl is intended to help locate symbols which are defined but are not used externally. The goal is to avoid bloat of the namespace in the resulting kernel image. The script relies on object data, and only finds unused symbols for the configuration used to generate that object data. This results in a lot of false positive warnings such as symbols only used by a single architecture, or symbols which are used externally only under certain configurations. Running namespace.pl using allyesconfig, allmodconfig, and x86_64_defconfig yields the following results: * allmodconfig * 11122 unique symbol names with no external reference * 1194 symbols listed as multiply defined * 214 symbols it can't resolve * allyesconfig * 10997 unique symbol names with no external reference * 1194 symbols listed as multiply defined * 214 symbols it can't resolve * x86_64_defconfig * 5757 unique symbol names with no external reference * 528 symbols listed as multiply defined * 154 symbols it can't resolve The script also has no way to easily limit the scope of the checks to a given subset of the kernel, such as only checking for symbols defined within a module or subsystem. Discussion on public mailing lists seems to indicate that many view the tool output as suspect or not very useful (see discussions at [1] and [2] for further context). As described by Masahiro Yamada at [2], namespace.pl provides 3 types of checks: listing multiply defined symbols, resolving external symbols, and warnings about symbols with no reference. The first category of issues is easily caught by the linker as any set of multiply defined symbols should fail to link. The second category of issues is also caught by linking, as undefined symbols would cause issues. Even with modules, these types of issues where a module relies on an external symbol are caught by modpost. The remaining category of issues reported is the list of symbols with no external reference, and is the primary motivation of this script. However, it ought to be clear from the above examples that the output is difficult to sort through. Even allyesconfig has ~10000 entries. The current submit-checklist indicates that patches ought to go through namespacecheck and fix any new issues arising. But that itself presents problems. As described at [1], many cases of reports are due to configuration where a function is used externally by some configuration settings. Prominent maintainers appear to dislike changes modify code such that symbols become static based on CONFIG_* flags ([3], and [4]) One possible solution is to adjust the advice and indicate that we only care about the output of namespacecheck on allyesconfig or allmodconfig builds... However, given the discussion at [2], I suspect that few people are actively using this tool. It doesn't have a maintainer in the MAINTAINERS flie, and it produces so many warnings for unused symbols that it is difficult to use effectively. Thus, I propose we simply remove it. [1] https://lore.kernel.org/netdev/20200708164812.384ae8ea@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [2] https://lore.kernel.org/lkml/20190129204319.15238-1-jacob.e.keller@intel.com/ [3] https://lore.kernel.org/netdev/20190828.154744.2058157956381129672.davem@davemloft.net/ [4] https://lore.kernel.org/netdev/20190827210928.576c5fef@cakuba.netronome.com/ Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-10Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Some more driver bugfixes for I2C. Including a revert - the updated series for it will come during the next merge window" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: owl: Clear NACK and BUS error bits Revert "i2c: imx: Fix reset of I2SR_IAL flag" i2c: meson: fixup rate calculation with filter delay i2c: meson: keep peripheral clock enabled i2c: meson: fix clock setting overwrite i2c: imx: Fix reset of I2SR_IAL flag
2020-10-10cifs: Fix incomplete memory allocation on setxattr pathVladimir Zapolskiy
On setxattr() syscall path due to an apprent typo the size of a dynamically allocated memory chunk for storing struct smb2_file_full_ea_info object is computed incorrectly, to be more precise the first addend is the size of a pointer instead of the wanted object size. Coincidentally it makes no difference on 64-bit platforms, however on 32-bit targets the following memcpy() writes 4 bytes of data outside of the dynamically allocated memory. ============================================================================= BUG kmalloc-16 (Not tainted): Redzone overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201 INFO: Object 0x6f171df3 @offset=352 fp=0x00000000 Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi Redzone 79e69a6f: 73 68 32 0a sh2. Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 Call Trace: dump_stack+0x54/0x6e print_trailer+0x12c/0x134 check_bytes_and_report.cold+0x3e/0x69 check_object+0x18c/0x250 free_debug_processing+0xfe/0x230 __slab_free+0x1c0/0x300 kfree+0x1d3/0x220 smb2_set_ea+0x27d/0x540 cifs_xattr_set+0x57f/0x620 __vfs_setxattr+0x4e/0x60 __vfs_setxattr_noperm+0x4e/0x100 __vfs_setxattr_locked+0xae/0xd0 vfs_setxattr+0x4e/0xe0 setxattr+0x12c/0x1a0 path_setxattr+0xa4/0xc0 __ia32_sys_lsetxattr+0x1d/0x20 __do_fast_syscall_32+0x40/0x70 do_fast_syscall_32+0x29/0x60 do_SYSENTER_32+0x15/0x20 entry_SYSENTER_32+0x9f/0xf2 Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+") Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-10mm/khugepaged: fix filemap page_to_pgoff(page) != offsetHugh Dickins
There have been elusive reports of filemap_fault() hitting its VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built with CONFIG_READ_ONLY_THP_FOR_FS=y. Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged without NUMA reuses the same huge page after collapse_file() failed (whereas NUMA targets its allocation to the respective node each time). And most of us were usually testing with CONFIG_NUMA=y kernels. collapse_file(old start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=old offset new_page->mapping = mapping xas_store(&xas, new_page) filemap_fault page = find_get_page(mapping, offset) // if offset falls inside hpage then // compound_head(page) == hpage lock_page_maybe_drop_mmap() __lock_page(page) // collapse fails xas_store(&xas, old page) new_page->mapping = NULL unlock_page(new_page) collapse_file(new start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=new offset new_page->mapping = mapping // mapping becomes valid again // since compound_head(page) == hpage // page_to_pgoff(page) got changed VM_BUG_ON_PAGE(page_to_pgoff(page) != offset) An initial patch replaced __SetPageLocked() by lock_page(), which did fix the race which Suren illustrates above. But testing showed that it's not good enough: if the racing task's __lock_page() gets delayed long after its find_get_page(), then it may follow collapse_file(new start)'s successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE. It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a check and retry (as is done for mapping), with similar relaxations in find_lock_entry() and pagecache_get_page(): but it's not obvious what else might get caught out; and khugepaged non-NUMA appears to be unique in exposing a page to page cache, then revoking, without going through a full cycle of freeing before reuse. Instead, non-NUMA khugepaged_prealloc_page() release the old page if anyone else has a reference to it (1% of cases when I tested). Although never reported on huge tmpfs, I believe its find_lock_entry() has been at similar risk; but huge tmpfs does not rely on khugepaged for its normal working nearly so much as READ_ONLY_THP_FOR_FS does. Reported-by: Denis Lisov <dennis.lissov@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569 Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org Reported-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com> Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # v4.9+ Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-10power: supply: ltc2941: Fix ptr to enum castIskren Chernev
clang complains about casting pointers to smaller enum types. Signed-off-by: Iskren Chernev <iskren.chernev@gmail.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-10-10coccinelle: api: kfree_sensitive: print memset positionDenis Efremov
Print memset() call position in addition to the kfree() position to ease issues identification. Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
2020-10-10drivers/net/wan/hdlc_fr: Move the skb_headroom check out of fr_hard_headerXie He
Move the skb_headroom check out of fr_hard_header and into pvc_xmit. This has two benefits: 1. Originally we only do this check for skbs sent by users on Ethernet- emulating PVC devices. After the change we do this check for skbs sent on normal PVC devices, too. (Also add a comment to make it clear that this is only a protection against upper layers that don't take dev->needed_headroom into account. Such upper layers should be rare and I believe they should be fixed.) 2. After the change we can simplify the parameter list of fr_hard_header. We no longer need to use a pointer to pointers (skb_p) because we no longer need to replace the skb inside fr_hard_header. Cc: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10io_uring: keep a pointer ref_node in file_dataPavel Begunkov
->cur_refs of struct fixed_file_data always points to percpu_ref embedded into struct fixed_file_ref_node. Don't overuse container_of() and offsetting, and point directly to fixed_file_ref_node. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: refactor *files_register()'s error pathsPavel Begunkov
Don't keep repeating cleaning sequences in error paths, write it once in the and use labels. It's less error prone and looks cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: clean file_data access in files_registerPavel Begunkov
Keep file_data in a local var and replace with it complex references such as ctx->file_data. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>