Age | Commit message (Collapse) | Author |
|
Check that PML is actually enabled before setting the mask to force a
SPTE to be write-protected. The bits used for the !AD_ENABLED case are
in the upper half of the SPTE. With 64-bit paging and EPT, these bits
are ignored, but with 32-bit PAE paging they are reserved. Setting them
for L2 SPTEs without checking PML breaks NPT on 32-bit KVM.
Fixes: 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration with PML")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reported by syzkaller:
KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147]
CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0
RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline]
RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline]
RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498
Call Trace:
kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223
kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
Hyper-V context is lazily allocated until Hyper-V specific MSRs are accessed
or SynIC is enabled. However, the syzkaller testcase sets irq routing table
directly w/o enabling SynIC. This results in null-ptr-deref when accessing
SynIC Hyper-V context. This patch fixes it.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000
Reported-by: syzbot+6987f3b2dbd9eda95f12@syzkaller.appspotmail.com
Fixes: 8f014550dfb1 ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614326399-5762-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The 'mmu_page_hash' is used as hash table while 'active_mmu_pages' is a
list. Remove the misplaced comment as it's mostly stating the obvious
anyways.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210226061945.1222-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
parisc uses -fpatchable-function-entry with dynamic ftrace, which means we
don't need recordmcount. Select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
to tell that to the build system.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 3b15cdc15956 ("tracing: move function tracer options to Kconfig")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210224225706.2726050-1-samitolvanen@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull more MIPS updates from Thomas Bogendoerfer:
- added n64 block driver
- fix for ubsan warnings
- fix for bcm63xx platform
- update of linux-mips mailinglist
* tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
arch: mips: update references to current linux-mips list
mips: bmips: init clocks earlier
vmlinux.lds.h: catch even more instrumentation symbols into .data
n64: store dev instance into disk private data
n64: cleanup n64cart_probe()
n64: cosmetics changes
n64: remove curly brackets
n64: use sector SECTOR_SHIFT instead 512
n64: use enums for reg
n64: move module param at the top
n64: move module info at the end
n64: use pr_fmt to avoid duplicate string
block: Add n64 cart driver
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Fix false-positive build warnings for ARCH=ia64 builds
- Optimize dictionary size for module compression with xz
- Check the compiler and linker versions in Kconfig
- Fix misuse of extra-y
- Support DWARF v5 debug info
- Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
exceeded the limit
- Add generic syscall{tbl,hdr}.sh for cleanups across arches
- Minor cleanups of genksyms
- Minor cleanups of Kconfig
* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
kbuild: remove deprecated 'always' and 'hostprogs-y/m'
kbuild: parse C= and M= before changing the working directory
kbuild: reuse this-makefile to define abs_srctree
kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
kconfig: omit --oldaskconfig option for 'make config'
kconfig: fix 'invalid option' for help option
kconfig: remove dead code in conf_askvalue()
kconfig: clean up nested if-conditionals in check_conf()
kconfig: Remove duplicate call to sym_get_string_value()
Makefile: Remove # characters from compiler string
Makefile: reuse CC_VERSION_TEXT
kbuild: check the minimum linker version in Kconfig
kbuild: remove ld-version macro
scripts: add generic syscallhdr.sh
scripts: add generic syscalltbl.sh
arch: syscalls: remove $(srctree)/ prefix from syscall tables
arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
gen_compile_commands: prune some directories
kbuild: simplify access to the kernel's version
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Remove unnecessary locking around _OSC (Bjorn Helgaas)
- Clarify message about _OSC failure (Bjorn Helgaas)
- Remove notification of PCIe bandwidth changes (Bjorn Helgaas)
- Tidy checking of syscall user config accessors (Heiner Kallweit)
Resource management:
- Decline to resize resources if boot config must be preserved (Ard
Biesheuvel)
- Fix pci_register_io_range() memory leak (Geert Uytterhoeven)
Error handling (Keith Busch):
- Clear error status from the correct device
- Retain error recovery status so drivers can use it after reset
- Log the type of Port (Root or Switch Downstream) that we reset
- Always request a reset for Downstream Ports in frozen state
Endpoint framework and NTB (Kishon Vijay Abraham I):
- Make *_get_first_free_bar() take into account 64 bit BAR
- Add helper API to get the 'next' unreserved BAR
- Make *_free_bar() return error codes on failure
- Remove unused pci_epf_match_device()
- Add support to associate secondary EPC with EPF
- Add support in configfs to associate two EPCs with EPF
- Add pci_epc_ops to map MSI IRQ
- Add pci_epf_ops to expose function-specific attrs
- Allow user to create sub-directory of 'EPF Device' directory
- Implement ->msi_map_irq() ops for cadence
- Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence
- Add EP function driver to provide NTB functionality
- Add support for EPF PCI Non-Transparent Bridge
- Add specification for PCI NTB function device
- Add PCI endpoint NTB function user guide
- Add configfs binding documentation for pci-ntb endpoint function
Broadcom STB PCIe controller driver:
- Add support for BCM4908 and external PERST# signal controller
(Rafał Miłecki)
Cadence PCIe controller driver:
- Retrain Link to work around Gen2 training defect (Nadeem Athani)
- Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof
Wilczyński)
Freescale Layerscape PCIe controller driver:
- Add LX2160A rev2 EP mode support (Hou Zhiqiang)
- Convert to builtin_platform_driver() (Michael Walle)
MediaTek PCIe controller driver:
- Fix OF node reference leak (Krzysztof Wilczyński)
Microchip PolarFlare PCIe controller driver:
- Add Microchip PolarFire PCIe controller driver (Daire McNamara)
Qualcomm PCIe controller driver:
- Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith)
- Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov)
Renesas R-Car PCIe controller driver:
- Drop PCIE_RCAR config option (Lad Prabhakar)
- Always allocate MSI addresses in 32bit space (Marek Vasut)
Rockchip PCIe controller driver:
- Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai)
- Make 'ep-gpios' DT property optional (Chen-Yu Tsai)
Synopsys DesignWare PCIe controller driver:
- Work around ECRC configuration hardware defect (Vidya Sagar)
- Drop support for config space in DT 'ranges' (Rob Herring)
- Change size to u64 for EP outbound iATU (Shradha Todi)
- Add upper limit address for outbound iATU (Shradha Todi)
- Make dw_pcie ops optional (Jisheng Zhang)
- Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang)
Xilinx Versal CPM PCIe controller driver:
- Fix OF node reference leak (Pan Bian)
Miscellaneous:
- Remove tango host controller driver (Arnd Bergmann)
- Remove IRQ handler & data together (altera-msi, brcmstb, dwc)
(Martin Kaiser)
- Fix xgene-msi race in installing chained IRQ handler (Martin
Kaiser)
- Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He)
- Fix pci-bridge-emul array overruns (Russell King)
- Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej
Siewior)"
* tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits)
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: qcom: Add support for ddrss_sf_tbu clock
dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250
PCI: al: Remove useless dw_pcie_ops
PCI: dwc: Don't assume the ops in dw_pcie always exist
PCI: dwc: Add upper limit address for outbound iATU
PCI: dwc: Change size to u64 for EP outbound iATU
PCI: dwc: Drop support for config space in 'ranges'
PCI: layerscape: Convert to builtin_platform_driver()
PCI: layerscape: Add LX2160A rev2 EP mode support
dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings
PCI: dwc: Work around ECRC configuration issue
PCI/portdrv: Report reset for frozen channel
PCI/AER: Specify the type of Port that was reset
PCI/ERR: Retain status from error notification
PCI/AER: Clear AER status from Root Port when resetting Downstream Port
PCI/ERR: Clear status of the reporting device
dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B
PCI: rockchip: Make 'ep-gpios' DT property optional
Documentation: PCI: Add PCI endpoint NTB function user guide
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
"Code clean-up and refinement"
* tag 'nds32-for-linux-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: Fix bogus reference to <asm/procinfo.h>
nds32: use get_kernel_nofault in dump_mem
nds32: remove dump_instr
nds32: configs: Cleanup CONFIG_CROSS_COMPILE
nds32: Replace <linux/clk-provider.h> by <linux/of_clk.h>
|
|
Currently the arm64 unwinder code returns -EINVAL whenever it can't find
the next stack frame, not distinguishing between cases where the stack has
been corrupted or is otherwise in a state it shouldn't be and cases
where we have reached the end of the stack. At the minute none of the
callers care what error code is returned but this will be important for
reliable stack trace which needs to be sure that the stack is intact.
Change to return -ENOENT in the case where we reach the bottom of the
stack. The error codes from this function are only used in kernel, this
particular code is chosen as we are indicating that we know there is no
frame there.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224165037.24138-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since commit f086f67485c5 ("arm64: ptrace: add support for syscall
emulation"), if system call number -1 is called and the process is being
traced with PTRACE_SYSCALL, for example by strace, the seccomp check is
skipped and -ENOSYS is returned unconditionally (unless altered by the
tracer) rather than carrying out action specified in the seccomp filter.
The consequence of this is that it is not possible to reliably strace
a seccomp based implementation of a foreign system call interface in
which r7/x8 is permitted to be -1 on entry to a system call.
Also trace_sys_enter and audit_syscall_entry are skipped if a system
call is skipped.
Fix by removing the in_syscall(regs) check restoring the previous
behaviour which is like AArch32, x86 (which uses generic code) and
everything else.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Catalin Marinas<catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: f086f67485c5 ("arm64: ptrace: add support for syscall emulation")
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Timothy E Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Link: https://lore.kernel.org/r/90edd33b-6353-1228-791f-0336d94d5f8c@majoroak.me.uk
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fix the interpreation of nested_svm_vmexit()'s return value when
synthesizing a nested VM-Exit after intercepting an SVM instruction while
L2 was running. The helper returns '0' on success, whereas a return
value of '0' in the exit handler path means "exit to userspace". The
incorrect return value causes KVM to exit to userspace without filling
the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".
Fixes: 14c2bf81fcd2 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210224005627.657028-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Andestech(nds32) never had <asm/procinfo.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
|
|
Use the proper get_kernel_nofault helper to access an unsafe kernel
pointer without faulting instead of playing with set_fs and get_user.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Nick Hu <nickhu@andestech.com>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
|
|
dump_inst has a return before actually doing anything, so just drop the
whole thing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Nick Hu <nickhu@andestech.com>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
|
|
CONFIG_CROSS_COMPILE is gone since commit f1089c92da79 ("kbuild: remove
CONFIG_CROSS_COMPILE support").
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
|
|
The Andes platform code is not a clock provider, and just needs to call
of_clk_init().
Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Greentime Hu <green.hu@gmail.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq entry updates from Thomas Gleixner:
"The irq stack switching was moved out of the ASM entry code in course
of the entry code consolidation. It ended up being suboptimal in
various ways.
This reworks the X86 irq stack handling:
- Make the stack switching inline so the stackpointer manipulation is
not longer at an easy to find place.
- Get rid of the unnecessary indirect call.
- Avoid the double stack switching in interrupt return and reuse the
interrupt stack for softirq handling.
- A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
confused about the stack pointer manipulation"
* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix stack-swizzle for FRAME_POINTER=y
um: Enforce the usage of asm-generic/softirq_stack.h
x86/softirq/64: Inline do_softirq_own_stack()
softirq: Move do_softirq_own_stack() to generic asm header
softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
x86/softirq: Remove indirection in do_softirq_own_stack()
x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
x86/entry: Convert device interrupts to inline stack switching
x86/entry: Convert system vectors to irq stack macro
x86/irq: Provide macro for inlining irq stack switching
x86/apic: Split out spurious handling code
x86/irq/64: Adjust the per CPU irq stack pointer by 8
x86/irq: Sanitize irq stack tracking
x86/entry: Fix instrumentation annotation
|
|
Merge misc updates from Andrew Morton:
"A few small subsystems and some of MM.
172 patches.
Subsystems affected by this patch series: hexagon, scripts, ntfs,
ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap,
memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan,
pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
mempolicy, oom-kill, hugetlbfs, and migration)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits)
mm/migrate: remove unneeded semicolons
hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
hugetlbfs: fix some comment typos
hugetlbfs: correct some obsolete comments about inode i_mutex
hugetlbfs: make hugepage size conversion more readable
hugetlbfs: remove meaningless variable avoid_reserve
hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
hugetlbfs: remove special hugetlbfs_set_page_dirty()
mm/hugetlb: change hugetlb_reserve_pages() to type bool
mm, oom: fix a comment in dump_task()
mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()
numa balancing: migrate on fault among multiple bound nodes
mm, compaction: make fast_isolate_freepages() stay within zone
mm/compaction: fix misbehaviors of fast_find_migrateblock()
mm/compaction: correct deferral logic for proactive compaction
mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked
mm/compaction: remove rcu_read_lock during page compaction
z3fold: simplify the zhdr initialization code in init_z3fold_page()
...
|
|
Function set_pmd_at is to set pmd entry, if tlb entry need to be flushed,
there exists pmdp_huge_clear_flush alike function before set_pmd_at is
called. So it is not necessary to call flush_tlb_all in this function.
In these scenarios, tlb for the pmd range needs to be flushed:
- privilege degrade such as wrprotect is set on the pmd entry
- pmd entry is cleared
- there is exception if set_pmd_at is issued by dup_mmap, since
flush_tlb_mm is called for parent process, it is not necessary to
flush tlb in function copy_huge_pmd.
Link: http://lkml.kernel.org/r/1592990792-1923-3-git-send-email-maobibo@loongson.cn
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Daniel Silsby <dansilsby@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As David suggested, simply passing 'struct zone *zone' is enough. We can
get all needed information from 'struct zone*' easily.
Link: https://lkml.kernel.org/r/20210122135956.5946-4-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The current memmap_init_zone() only handles memory region inside one zone,
actually memmap_init() does the memmap init of one zone. So rename both
of them accordingly.
Link: https://lkml.kernel.org/r/20210122135956.5946-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm: clean up names and parameters of memmap_init_xxxx functions", v5.
This patchset corrects inappropriate function names of memmap_init_xxx,
and simplify parameters of functions in the code flow. And also fix a
prototype warning reported by lkp.
This patch (of 5);
Kernel test robot calling make with 'W=1' is triggering warning like
below for memmap_init_zone() function.
mm/page_alloc.c:6259:23: warning: no previous prototype for 'memmap_init_zone' [-Wmissing-prototypes]
6259 | void __meminit __weak memmap_init_zone(unsigned long size, int nid,
| ^~~~~~~~~~~~~~~~
Fix it by adding the function declaration in include/linux/mm.h. Since
memmap_init_zone() has a generic version with '__weak', the declaratoin in
ia64 header file can be simply removed.
Link: https://lkml.kernel.org/r/20210122135956.5946-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20210122135956.5946-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On a high level, this patch allows running KUnit KASAN tests with the
hardware tag-based KASAN mode.
Internally, this change reenables tag checking at the end of each KASAN
test that triggers a tag fault and leads to tag checking being disabled.
Also simplify is_write calculation in report_tag_fault.
With this patch KASAN tests are still failing for the hardware tag-based
mode; fixes come in the next few patches.
[andreyknvl@google.com: export HW_TAGS symbols for KUnit tests]
Link: https://lkml.kernel.org/r/e7eeb252da408b08f0c81b950a55fb852f92000b.1613155970.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Id94dc9eccd33b23cda4950be408c27f879e474c8
Link: https://lkml.kernel.org/r/51b23112cf3fd62b8f8e9df81026fa2b15870501.1610733117.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since CONFIG_EXPERIMENTAL was removed in 2013, go ahead and drop it
from any defconfig files.
Link: https://lkml.kernel.org/r/20210115010011.29483-1-rdunlap@infradead.org
Fixes: 3d374d09f16f ("final removal of CONFIG_EXPERIMENTAL")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Brian Cain <bcain@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki:
"Drop support for depercated platforms using SFI, drop the entire
support for SFI that has been long deprecated too and make some
janitorial changes on top of that (Andy Shevchenko)"
* tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/platform/intel-mid: Update Copyright year and drop file names
x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h
x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co.
x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h
x86/PCI: Describe @reg for type1_access_ok()
x86/PCI: Get rid of custom x86 model comparison
sfi: Remove framework for deprecated firmware
cpufreq: sfi-cpufreq: Remove driver for deprecated firmware
media: atomisp: Remove unused header
mfd: intel_msic: Remove driver for deprecated platform
x86/apb_timer: Remove driver for deprecated platform
x86/platform/intel-mid: Remove unused leftovers (vRTC)
x86/platform/intel-mid: Remove unused leftovers (msic)
x86/platform/intel-mid: Remove unused leftovers (msic_thermal)
x86/platform/intel-mid: Remove unused leftovers (msic_power_btn)
x86/platform/intel-mid: Remove unused leftovers (msic_gpio)
x86/platform/intel-mid: Remove unused leftovers (msic_battery)
x86/platform/intel-mid: Remove unused leftovers (msic_ocd)
x86/platform/intel-mid: Remove unused leftovers (msic_audio)
platform/x86: intel_scu_wdt: Drop mistakenly added const
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the large set of char/misc/whatever driver subsystem updates
for 5.12-rc1. Over time it seems like this tree is collecting more and
more tiny driver subsystems in one place, making it easier for those
maintainers, which is why this is getting larger.
Included in here are:
- coresight driver updates
- habannalabs driver updates
- virtual acrn driver addition (proper acks from the x86 maintainers)
- broadcom misc driver addition
- speakup driver updates
- soundwire driver updates
- fpga driver updates
- amba driver updates
- mei driver updates
- vfio driver updates
- greybus driver updates
- nvmeem driver updates
- phy driver updates
- mhi driver updates
- interconnect driver udpates
- fsl-mc bus driver updates
- random driver fix
- some small misc driver updates (rtsx, pvpanic, etc.)
All of these have been in linux-next for a while, with the only
reported issue being a merge conflict due to the dfl_device_id
addition from the fpga subsystem in here"
* tag 'char-misc-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
spmi: spmi-pmic-arb: Fix hw_irq overflow
Documentation: coresight: Add PID tracing description
coresight: etm-perf: Support PID tracing for kernel at EL2
coresight: etm-perf: Clarify comment on perf options
ACRN: update MAINTAINERS: mailing list is subscribers-only
regmap: sdw-mbq: use MODULE_LICENSE("GPL")
regmap: sdw: use no_pm routines for SoundWire 1.2 MBQ
regmap: sdw: use _no_pm functions in regmap_read/write
soundwire: intel: fix possible crash when no device is detected
MAINTAINERS: replace my with email with replacements
mhi: Fix double dma free
uapi: map_to_7segment: Update example in documentation
uio: uio_pci_generic: don't fail probe if pdev->irq equals to IRQ_NOTCONNECTED
drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
firewire: replace tricky statement by two simple ones
vme: make remove callback return void
firmware: google: make coreboot driver's remove callback return void
firmware: xilinx: Use explicit values for all enum values
sample/acrn: Introduce a sample of HSM ioctl interface usage
virt: acrn: Introduce an interface for Service VM to control vCPU
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Pull arch/nios2 updates from Ley Foon Tan:
- don't use _end for calculating min_low_pfn
- fix broken sys_clone syscall
- take mmap lock in cacheflush syscall
* tag 'nios2-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: Don't use _end for calculating min_low_pfn
nios2: fixed broken sys_clone syscall
Take mmap lock in cacheflush syscall
|
|
Although there has been a bit of back and forth on the subject, it
appears that invalidating TLBs requires an ISB instruction after the
TLBI/DSB sequence when FEAT_ETS is not implemented by the CPU.
From the bible:
| In an implementation that does not implement FEAT_ETS, a TLB
| maintenance instruction executed by a PE, PEx, can complete at any
| time after it is issued, but is only guaranteed to be finished for a
| PE, PEx, after the execution of DSB by the PEx followed by a Context
| synchronization event
Add the missing ISB in enter_vhe(), just in case.
Fixes: f359182291c7 ("arm64: Provide an 'upgrade to VHE' stub hypercall")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224093738.3629662-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Although there has been a bit of back and forth on the subject, it
appears that invalidating TLBs requires an ISB instruction when FEAT_ETS
is not implemented by the CPU.
From the bible:
| In an implementation that does not implement FEAT_ETS, a TLB
| maintenance instruction executed by a PE, PEx, can complete at any
| time after it is issued, but is only guaranteed to be finished for a
| PE, PEx, after the execution of DSB by the PEx followed by a Context
| synchronization event
Add the missing ISB in __primary_switch, just in case.
Fixes: 3c5e9f238bc4 ("arm64: head.S: move KASLR processing out of __enable_mmu()")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224093738.3629662-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Enabling the MMU requires the write to SCTLR_ELx (and the ISB
that follows) to live in some identity-mapped memory. Otherwise,
the translation will result in something totally unexpected
(either fetching the wrong instruction stream, or taking a
fault of some sort).
This is exactly what happens in mutate_to_vhe(), as this code
lives in the .hyp.text section, which isn't identity-mapped.
With the right configuration, this explodes badly.
Extract the MMU-enabling part of mutate_to_vhe(), and move
it to its own function that lives in the idmap. This ensures
nothing bad happens.
Fixes: f359182291c7 ("arm64: Provide an 'upgrade to VHE' stub hypercall")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210224093738.3629662-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Make the hyp vector table entries local functions so they
are not accidentally referred to outside of this file.
Using SYM_CODE_START_LOCAL matches the other vector tables (in hyp-stub.S,
hibernate-asm.S and entry.S)
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210222164956.43514-1-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In the arch addition of PF_IO_WORKER, I missed parisc and powerpc for
some reason. Fix that up, ensuring they handle PF_IO_WORKER like they do
PF_KTHREAD in copy_thread().
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes: 4727dc20e042 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support to the CPU Measurement counter facility device driver
to extract complete counter sets per CPU and per counter set from user
space. This includes a new device named /dev/hwctr and support
for the device driver functions open, close and ioctl. Other
functions are not supported.
The ioctl command supports 3 subcommands:
S390_HWCTR_START: enables counter sets on a list of CPUs.
S390_HWCTR_STOP: disables counter sets on a list of CPUs.
S390_HWCTR_READ: reads counter sets on a list of CPUs.
The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which
returns data. It requires member data_bytes to be positive and
indicates the maximum amount of data available to store counter set
data. The other ioctl() subcommands do not use this member and it
should be set to zero.
The S390_HWCTR_READ subcommand returns the following data:
The cpuset data is flattened using the following scheme, stored in member
data:
0x0 0x8 0xc 0x10 0x10 0x18 0x20 0x28 0xU-1
+---------+-----+---------+-----+---------+-----+-----+------+------+
| no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
+---------+-----+---------+-----+---------+-----+-----+------+------+
0xU 0xU+4 0xU+8 0xU+10 0xV-1
+-----+---------+-----+-----+------+------+
| set | no_cnts | cv1 | cv2 | .... | cv_n |
+-----+---------+-----+-----+------+------+
0xV 0xV+4 0xV+8 0xV+c
+-----+---------+-----+---------+-----+-----+------+------+
| cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
+-----+---------+-----+---------+-----+-----+------+------+
U and V denote arbitrary hexadezimal addresses.
The first integer represents the number of CPUs data was extracted
from. This is followed by CPU number and number of counter sets extracted.
Both are two integer values. This is followed by the set identifer
and number of counters extracted. Both are two integer values. This is
followed by the counter values, each element is eight bytes in size.
The S390_HWCTR_READ ioctl subcommand is also limited to one call per
minute. This ensures that an application does not read out the
counter sets too often and reduces the overall CPU performance.
The complete counter set extraction is an expensive operation.
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The immediate need to have this is to have bpf_send_signal() send the
signal ASAP instead of during the next hrtimer interrupt. However, it
should also improve irq_work_queue() latencies in general, as well as
get s390 out of the lame architectures list [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c?h=v5.11#n45
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make cpumasks static variables to avoid potential large stack
frames. There shouldn't be any concurrent callers since all current
callers are serialized with the cpu hotplug lock.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make "cpumask_t cpumask" a static variable to avoid a potential large
stack frame. Also protect against potential concurrent callers by
introducing a local lock.
Note: smp_emergency_stop() gets only called with irqs and machine
checks disabled, therefore a cpu local deadlock is not possible. For
concurrent callers the first cpu which enters the critical section
wins and will stop all other cpus.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Avoid a potentially large stack frame and overflow by making
"cpumask_t avail" a static variable. There is no concurrent
access due to the existing locking.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Move locking to __smp_rescan() instead of duplicating it to all call sites.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Due to historical reasons vmem_*() functions misuse
or ignore the notion of physical vs virtual addresses
difference.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The physical address of page tables is passed around and
used as virtual address in various locations.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
There is little sense in applying __pa() to a physical
address, but that what pfn_pXd() macros do.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This update fixes semantics of pXd_deref macros which
are expected to return a CPU-addressable pointer.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Provide correct mnemonic for selfhr.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more clang LTO updates from Kees Cook:
"Clang LTO x86 enablement.
Full disclosure: while this has _not_ been in linux-next (since it
initially looked like the objtool dependencies weren't going to make
v5.12), it has been under daily build and runtime testing by Sami for
quite some time. These x86 portions have been discussed on lkml, with
Peter, Josh, and others helping nail things down.
The bulk of the changes are to get objtool working happily. The rest
of the x86 enablement is very small.
Summary:
- Generate __mcount_loc in objtool (Peter Zijlstra)
- Support running objtool against vmlinux.o (Sami Tolvanen)
- Clang LTO enablement for x86 (Sami Tolvanen)"
Link: https://lore.kernel.org/lkml/20201013003203.4168817-26-samitolvanen@google.com/
Link: https://lore.kernel.org/lkml/cover.1611263461.git.jpoimboe@redhat.com/
* tag 'clang-lto-v5.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: lto: force rebuilds when switching CONFIG_LTO
x86, build: allow LTO to be selected
x86, cpu: disable LTO for cpu.c
x86, vdso: disable LTO only for vDSO
kbuild: lto: postpone objtool
objtool: Split noinstr validation from --vmlinux
x86, build: use objtool mcount
tracing: add support for objtool mcount
objtool: Don't autodetect vmlinux.o
objtool: Fix __mcount_loc generation with Clang's assembler
objtool: Add a pass for generating __mcount_loc
|
|
Pull sparc updates from David Miller:
"A host of mall cleanups and adjustments that have accumulated while I
was away, nothing major"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: (26 commits)
sparc: make xchg() into a statement expression
sparc64: Use arch_validate_flags() to validate ADI flag
sparc32: Fix comparing pointer to 0 coccicheck warning
sparc: fix led.c driver when PROC_FS is not enabled
sparc: Fix handling of page table constructor failure
sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
tty: hvcs: Drop unnecessary if block
tty: vcc: Drop unnecessary if block
tty: vcc: Drop impossible to hit WARN_ON
sparc: sparc64_defconfig: add necessary configs for qemu
sparc64: switch defconfig from the legacy ide driver to libata
sparc32: Preserve clone syscall flags argument for restarts due to signals
sparc32: Limit memblock allocation to low memory
sparc: Replace test_ti_thread_flag() with test_tsk_thread_flag()
sbus: char: Remove meaningless jump label out_free
sparc32: signal: Fix stack trampoline for RT signals
sparc: remove SA_STATIC_ALLOC macro definition
sparc: use for_each_child_of_node() macro
sparc: Use fallthrough pseudo-keyword
sparc32: srmmu: improve type safety of __nocache_fix()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"Mostly existing driver fixes plus a new driver for game controllers
directly connected to Nintendo 64, and an enhancement for keyboards
driven by Chrome OS EC to communicate layout of the top row to
userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (47 commits)
Input: st1232 - fix NORMAL vs. IDLE state handling
Input: aiptek - convert sysfs sprintf/snprintf family to sysfs_emit
Input: alps - fix spelling of "positive"
ARM: dts: cros-ec-keyboard: Use keymap macros
dt-bindings: input: Fix the keymap for LOCK key
dt-bindings: input: Create macros for cros-ec keymap
Input: cros-ec-keyb - expose function row physical map to userspace
dt-bindings: input: cros-ec-keyb: Add a new property describing top row
Input: applespi - fix occasional crc errors under load.
Input: applespi - don't wait for responses to commands indefinitely.
Input: st1232 - add IDLE state as ready condition
Input: zinitix - fix return type of zinitix_init_touch()
Input: i8042 - add ASUS Zenbook Flip to noselftest list
Input: add missing dependencies on CONFIG_HAS_IOMEM
Input: joydev - prevent potential read overflow in ioctl
Input: elo - fix an error code in elo_connect()
Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
Input: sur40 - fix an error code in sur40_probe()
Input: elants_i2c - detect enum overflow
Input: zinitix - remove unneeded semicolon
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- support for "Unified Battery" feature on Logitech devices from Filipe
Laíns
- power management improvements for intel-ish driver from Zhang Lixu
- support for Goodix devices from Douglas Anderson
- improved handling of generic HID keyboard in order to make it easier
for userspace to figure out the details of the device, from Dmitry
Torokhov
- Playstation DualSense support from Roderick Colenbrander
- other assorted small fixes and device ID additions.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (49 commits)
HID: playstation: add DualSense player LED support.
HID: playstation: add microphone mute support for DualSense.
HID: playstation: add initial DualSense lightbar support.
HID: wacom: Ignore attempts to overwrite the touch_max value from HID
HID: playstation: fix array size comparison (off-by-one)
HID: playstation: fix unused variable in ps_battery_get_property.
HID: playstation: report DualSense hardware and firmware version.
HID: playstation: add DualSense classic rumble support.
HID: playstation: add DualSense Bluetooth support.
HID: playstation: track devices in list.
HID: playstation: add DualSense accelerometer and gyroscope support.
HID: playstation: add DualSense touchpad support.
HID: playstation: add DualSense battery support.
HID: playstation: use DualSense MAC address as unique identifier.
HID: playstation: initial DualSense USB support.
HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch 10E
HID: Ignore battery for Elan touchscreen on HP Spectre X360 15-df0xxx
HID: logitech-dj: add support for the new lightspeed connection iteration
HID: intel-ish-hid: ipc: Add Tiger Lake H PCI device ID
HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapping mounts handle a wide range of long standing use-cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs and without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystem that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.
- They allow users to efficiently changing ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown (2) changing ownership of large sets of files is
instantenous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single syscall
mount_setattr syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is commong with the aforementioned portable home
directory and container and vm scenario.
- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD:To share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime is already up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
https://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystem are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstestsuite can be used to verify
that port has been done correctly.
The mount_setattr() syscall is motivated independent of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that allows to create idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks:
Here's an example to a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
|
|
Pass code model and stack alignment to the linker as these are not
stored in LLVM bitcode, and allow CONFIG_LTO_CLANG* to be enabled.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Clang incorrectly inlines functions with differing stack protector
attributes, which breaks __restore_processor_state() that relies on
stack protector being disabled. This change disables LTO for cpu.c
to work aroung the bug.
Link: https://bugs.llvm.org/show_bug.cgi?id=47479
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
|