Age | Commit message (Collapse) | Author |
|
Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now,
jbd2_journal_inode_add_[write|wait] are not used any more, remove them.
Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Acked-by: Changwei Ge <chge@linux.alibaba.com>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
6ba0e7dc64a5 ("jbd2: introduce jbd2_inode dirty range scoping") allow us
scoping each of the inode dirty ranges associated with a given
transaction, and ext4 already does this way.
Now let's also use the newly introduced jbd2_inode dirty range scoping to
prevent us from waiting forever when trying to complete a journal
transaction in ocfs2.
Link: http://lkml.kernel.org/r/1562977611-8412-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Reviewed-by: Changwei Ge <chge@linux.alibaba.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig")
"make clean" leaves behind compressed initramfs images. Example:
$ make defconfig
$ sed -i 's|CONFIG_INITRAMFS_SOURCE=""|CONFIG_INITRAMFS_SOURCE="/tmp/ir.cpio"|' .config
$ make olddefconfig
$ make -s
$ make -s clean
$ git clean -ndxf | grep initramfs
Would remove usr/initramfs_data.cpio.gz
clean rules do not have CONFIG_* context so they do not know which
compression format was used. Thus they don't know which files to delete.
Tell clean to delete all possible compression formats.
Once patched usr/initramfs_data.cpio.gz and friends are deleted by
"make clean".
Link: http://lkml.kernel.org/r/20190722063251.55541-1-gthelen@google.com
Fixes: 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig")
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
z3fold_page_reclaim()'s retry mechanism is broken: on a second iteration
it will have zhdr from the first one so that zhdr is no longer in line
with struct page. That leads to crashes when the system is stressed.
Fix that by moving zhdr assignment up.
While at it, protect against using already freed handles by using own
local slots structure in z3fold_page_reclaim().
Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Reported-by: Chris Murphy <bugzilla@colorremedies.com>
Reported-by: Agustin Dall'Alba <agustin@dallalba.com.ar>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On kernels without CONFIG_MMU, we get a link error for the siw driver:
drivers/infiniband/sw/siw/siw_mem.o: In function `siw_umem_get':
siw_mem.c:(.text+0x4c8): undefined reference to `can_do_mlock'
This is probably not the only driver that needs the function and could
otherwise build correctly without CONFIG_MMU, so add a dummy variant that
always returns false.
Link: http://lkml.kernel.org/r/20190909204201.931830-1-arnd@arndb.de
Fixes: 2251334dcac9 ("rdma/siw: application buffer management")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With the original commit applied, z3fold_zpool_destroy() may get blocked
on wait_event() for indefinite time. Revert this commit for the time
being to get rid of this problem since the issue the original commit
addresses is less severe.
Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction")
Reported-by: Agustín Dall'Alba <agustin@dallalba.com.ar>
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If userspace reads the buffer via blockdev while mounting,
sb_getblk()+modify can race with buffer read via blockdev.
For example,
FS userspace
bh = sb_getblk()
modify bh->b_data
read
ll_rw_block(bh)
fill bh->b_data by on-disk data
/* lost modified data by FS */
set_buffer_uptodate(bh)
set_buffer_uptodate(bh)
Userspace should not use the blockdev while mounting though, the udev
seems to be already doing this. Although I think the udev should try to
avoid this, workaround the race by small overhead.
Link: http://lkml.kernel.org/r/87pnk7l3sw.fsf_-_@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Right now we force an unbind of SCM memory at drcindex on H_OVERLAP error.
This really slows down operations like kexec where we get the H_OVERLAP
error because we don't go through a full hypervisor re init.
H_OVERLAP error for a H_SCM_BIND_MEM hcall indicates that SCM memory at
drc index is already bound. Since we don't specify a logical memory
address for bind hcall, we can use the H_SCM_QUERY hcall to query
the already bound logical address.
Boot time difference with and without patch is:
[ 5.583617] IOMMU table initialized, virtual merging enabled
[ 5.603041] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Retrying bind after unbinding
[ 301.514221] papr_scm ibm,persistent-memory:ibm,pmemory@44108001: Retrying bind after unbinding
[ 340.057238] hv-24x7: read 1530 catalog entries, created 537 event attrs (0 failures), 275 descs
after fix
[ 5.101572] IOMMU table initialized, virtual merging enabled
[ 5.116984] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Querying SCM details
[ 5.117223] papr_scm ibm,persistent-memory:ibm,pmemory@44108001: Querying SCM details
[ 5.120530] hv-24x7: read 1530 catalog entries, created 537 event attrs (0 failures), 275 descs
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903123452.28620-2-aneesh.kumar@linux.ibm.com
|
|
This simplifies the error handling and also enable us to switch to
H_SCM_QUERY hcall in a later patch on H_OVERLAP error.
We also do some kernel print formatting fixup in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903123452.28620-1-aneesh.kumar@linux.ibm.com
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Some minor fixes to make it build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-4-aneesh.kumar@linux.ibm.com
|
|
Add the flag to the filelayout driver to add LAYOUTGET to
the OPEN compound.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In the years since the max readahead size was fixed in NFS, a number of
things have happened:
- Users can now set the value directly using /sys/class/bdi
- NFS max supported block sizes have increased by several orders of
magnitude from 64K to 1MB.
- Disk access latencies are orders of magnitude faster due to SSD + NVME.
In particular note that if the server is advertising 1MB as the optimal
read size, as that will set the readahead size to 15MB.
Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES.
However let's inform the VM about our preferred block size so that it
can choose to round up in cases where that makes sense.
Reported-by: Alkis Georgopoulos <alkisg@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull Microblaze updates from Michal Simek:
- clean up reset gpio handler
- defconfig updates
- add support for 8 byte get_user()
- switch to generic dma code
* tag 'microblaze-v5.4-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Switch to standard restart handler
microblaze: defconfig synchronization
microblaze: Enable Xilinx AXI emac driver by default
arch/microblaze: support get_user() of size 8 bytes
microblaze: remove ioremap_fullcache
microblaze: use the generic dma coherent remap allocator
microblaze/nommu: use the generic uncached segment support
|
|
git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform-drivers fixes from Andy Shevchenko:
- Fix compilation error of ASUS WMI driver when CONFIG_ACPI_BATTERY=n
- Fix I²C multi-instantiate driver to work with several USB PD devices
- Fix boot issue on Siemens SIMATIC IPC277E when PMC critical clock is
being disabled
- Plenty of fixes to Intel Speed-Select Technology tools
* tag 'platform-drivers-x86-v5.4-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: i2c-multi-instantiate: Derive the device name from parent
platform/x86: pmc_atom: Add Siemens SIMATIC IPC277E to critclk_systems DMI table
tools/power/x86/intel-speed-select: Fix perf-profile command output
tools/power/x86/intel-speed-select: Extend core-power command set
tools/power/x86/intel-speed-select: Fix some debug prints
tools/power/x86/intel-speed-select: Format get-assoc information
tools/power/x86/intel-speed-select: Allow online/offline based on tdp
tools/power/x86/intel-speed-select: Fix high priority core mask over count
platform/x86: asus-wmi: Make it depend on ACPI battery API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Sasha Levin:
- first round of vmbus hibernation support (Dexuan Cui)
- remove dependencies on PAGE_SIZE (Maya Nakamura)
- move the hyper-v tools/ code into the tools build system (Andy
Shevchenko)
- hyper-v balloon cleanups (Dexuan Cui)
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Resume after fixing up old primary channels
Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels
Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend
Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation
Drivers: hv: vmbus: Add a helper function is_sub_channel()
Drivers: hv: vmbus: Suspend/resume the synic for hibernation
Drivers: hv: vmbus: Break out synic enable and disable operations
HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
Tools: hv: move to tools buildsystem
hv_balloon: Reorganize the probe function
hv_balloon: Use a static page for the balloon_up send buffer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more mount API conversions from Al Viro:
"Assorted conversions of options parsing to new API.
gfs2 is probably the most serious one here; the rest is trivial stuff.
Other things in what used to be #work.mount are going to wait for the
next cycle (and preferably go via git trees of the filesystems
involved)"
* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gfs2: Convert gfs2 to fs_context
vfs: Convert spufs to use the new mount API
vfs: Convert hypfs to use the new mount API
hypfs: Fix error number left in struct pointer member
vfs: Convert functionfs to use the new mount API
vfs: Convert bpf to use the new mount API
|
|
Fix
arch/ia64/kernel/irq_ia64.c:586:1: warning: no return statement in function returning non-void [-Wreturn-type]
arch/ia64/mm/contig.c:111:6: warning: unused variable 'rc' [-Wunused-variable]
arch/ia64/mm/discontig.c:189:39: warning: unused variable 'rc' [-Wunused-variable]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
load different cp firmware according to the DID and RID
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's apparently needed in some configurations.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
to_mp() was first introduced with the following commit:
'commit 801cc4e17a34c ("xfs: debug mode forced buffered write failure")'
But the user of to_mp() was removed by below commit:
'commit f8c47250ba46e ("xfs: convert drop_writes to use the errortag
mechanism")'
So kernel build with clang throws below warning message:
fs/xfs/xfs_sysfs.c:72:1: warning: unused function 'to_mp' [-Wunused-function]
to_mp(struct kobject *kobject)
Hence to_mp() might be removed safely to get rid of warning message.
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
xfs_trans_log_buf takes first byte, last byte as args. In this
case, it should be from 0 to sizeof() - 1.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Running old skge driver on PowerPC causes checksum errors
because hardware reported 1's complement checksum is in little-endian
byte order.
Reported-by: Benoit <benoit.sansoni@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct archdr is only big enough to hold the header of various types of
arcnet packets. So to provide enough space to hold the data read from
hardware provide a buffer large enough to hold a packet with maximal
size.
The problem was noticed by the stack protector which makes the kernel
oops.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The intention was to have the GEO_TX_POWER_LIMIT command in FW version
36 as well, but not all 8000 family got this feature enabled. The
8000 family is the only one using version 36, so skip this version
entirely. If we try to send this command to the firmwares that do not
support it, we get a BAD_COMMAND response from the firmware.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=204151.
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
mt7615 patch/n9/cr4 firmwares are available in mediatek folder in
linux-firmware repository. Because of this mt7615 won't work on regular
distributions like Ubuntu. Fix path definitions. Moreover remove useless
firmware name pointers and use definitions directly
Fixes: 04b8e65922f6 ("mt76: add mac80211 driver for MT7615 PCIe-based chipsets")
Cc: stable@vger.kernel.org
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Greg Kroah-Hartman says:
====================
Raw socket cleanups
Ori Nimron pointed out that there are a number of places in the kernel
where you can create a raw socket, without having to have the
CAP_NET_RAW permission.
To resolve this, here's a short patch series to test these odd and old
protocols for this permission before allowing the creation to succeed
All patches are currently against the net tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a raw AF_NFC socket, CAP_NET_RAW needs to be checked
first.
Signed-off-by: Ori Nimron <orinimron123@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a raw AF_IEEE802154 socket, CAP_NET_RAW needs to be
checked first.
Signed-off-by: Ori Nimron <orinimron123@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a raw AF_AX25 socket, CAP_NET_RAW needs to be checked
first.
Signed-off-by: Ori Nimron <orinimron123@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a raw AF_APPLETALK socket, CAP_NET_RAW needs to be checked
first.
Signed-off-by: Ori Nimron <orinimron123@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a raw AF_ISDN socket, CAP_NET_RAW needs to be checked
first.
Signed-off-by: Ori Nimron <orinimron123@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the allocation done in tcf_exts_init() failed,
we end up with a NULL pointer in exts->actions.
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8198 Comm: syz-executor.3 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcf_action_destroy+0x71/0x160 net/sched/act_api.c:705
Code: c3 08 44 89 ee e8 4f cb bb fb 41 83 fd 20 0f 84 c9 00 00 00 e8 c0 c9 bb fb 48 89 d8 48 b9 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 08 00 0f 85 c0 00 00 00 4c 8b 33 4d 85 f6 0f 84 9d 00 00 00
RSP: 0018:ffff888096e16ff0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000040000 RSI: ffffffff85b6ab30 RDI: 0000000000000000
RBP: ffff888096e17020 R08: ffff8880993f6140 R09: fffffbfff11cae67
R10: fffffbfff11cae66 R11: ffffffff88e57333 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888096e177a0 R15: 0000000000000001
FS: 00007f62bc84a700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000758040 CR3: 0000000088b64000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_exts_destroy+0x38/0xb0 net/sched/cls_api.c:3030
tcindex_set_parms+0xf7f/0x1e50 net/sched/cls_tcindex.c:488
tcindex_change+0x230/0x318 net/sched/cls_tcindex.c:519
tc_new_tfilter+0xa4b/0x1c70 net/sched/cls_api.c:2152
rtnetlink_rcv_msg+0x838/0xb00 net/core/rtnetlink.c:5214
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
__sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allowing an unlimited number of MSRs to be specified via the VMX
load/store MSR lists (e.g., vm-entry MSR load list) is bad for two
reasons. First, a guest can specify an unreasonable number of MSRs,
forcing KVM to process all of them in software. Second, the SDM bounds
the number of MSRs allowed to be packed into the atomic switch MSR lists.
Quoting the "Miscellaneous Data" section in the "VMX Capability
Reporting Facility" appendix:
"Bits 27:25 is used to compute the recommended maximum number of MSRs
that should appear in the VM-exit MSR-store list, the VM-exit MSR-load
list, or the VM-entry MSR-load list. Specifically, if the value bits
27:25 of IA32_VMX_MISC is N, then 512 * (N + 1) is the recommended
maximum number of MSRs to be included in each list. If the limit is
exceeded, undefined processor behavior may result (including a machine
check during the VMX transition)."
Because KVM needs to protect itself and can't model "undefined processor
behavior", arbitrarily force a VM-entry to fail due to MSR loading when
the MSR load list is too large. Similarly, trigger an abort during a VM
exit that encounters an MSR load list or MSR store list that is too large.
The MSR list size is intentionally not pre-checked so as to maintain
compatibility with hardware inasmuch as possible.
Test these new checks with the kvm-unit-test "x86: nvmx: test max atomic
switch MSRs".
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The RDPRU instruction gives the guest read access to the IA32_APERF
MSR and the IA32_MPERF MSR. According to volume 3 of the APM, "When
virtualization is enabled, this instruction can be intercepted by the
Hypervisor. The intercept bit is at VMCB byte offset 10h, bit 14."
Since we don't enumerate the instruction in KVM_SUPPORTED_CPUID,
intercept it and synthesize #UD.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Drew Schmitt <dasch@google.com>
Reviewed-by: Jacob Xu <jacobhxu@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
According to the Intel SDM, volume 2, "CPUID," the index is
significant (or partially significant) for CPUID leaves 0FH, 10H, 12H,
17H, 18H, and 1FH.
Add the corresponding flag to these CPUID leaves in do_host_cpuid().
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support")
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix sparse warning:
fs/fuse/dev.c:468:6: warning: symbol 'fuse_args_to_req' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 68583165f962 ("fuse: add pages to fuse_args")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If cuse_send_init fails, need to fuse_conn_put cc->fc.
cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
->fuse_dev_alloc->fuse_conn_get
->fuse_dev_free->fuse_conn_put
Fixes: cc080e9e9be1 ("fuse: introduce per-instance fuse_dev structure")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
With DEBUG_PAGEALLOC on, the following triggers.
BUG: unable to handle page fault for address: ffff88859367c000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3001067 P4D 3001067 PUD 406d3a8067 PMD 406d30c067 PTE 800ffffa6c983060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 38 PID: 3110657 Comm: python2.7
RIP: 0010:fuse_readdir+0x88f/0xe7a [fuse]
Code: 49 8b 4d 08 49 39 4e 60 0f 84 44 04 00 00 48 8b 43 08 43 8d 1c 3c 4d 01 7e 68 49 89 dc 48 03 5c 24 38 49 89 46 60 8b 44 24 30 <8b> 4b 10 44 29 e0 48 89 ca 48 83 c1 1f 48 83 e1 f8 83 f8 17 49 89
RSP: 0018:ffffc90035edbde0 EFLAGS: 00010286
RAX: 0000000000001000 RBX: ffff88859367bff0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88859367bfed RDI: 0000000000920907
RBP: ffffc90035edbe90 R08: 000000000000014b R09: 0000000000000004
R10: ffff88859367b000 R11: 0000000000000000 R12: 0000000000000ff0
R13: ffffc90035edbee0 R14: ffff889fb8546180 R15: 0000000000000020
FS: 00007f80b5f4a740(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88859367c000 CR3: 0000001c170c2001 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
iterate_dir+0x122/0x180
__x64_sys_getdents+0xa6/0x140
do_syscall_64+0x42/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It's in fuse_parse_cache(). %rbx (ffff88859367bff0) is fuse_dirent
pointer - addr + offset. FUSE_DIRENT_SIZE() is trying to dereference
namelen off of it but that derefs into the next page which is disabled
by pagealloc debug causing a PF.
This is caused by dirent->namelen being accessed before ensuring that
there's enough bytes in the page for the dirent. Fix it by pushing
down reclen calculation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This function has been made static, which now causes a compile-time
warning:
WARNING: "fuse_put_request" [vmlinux] is a static EXPORT_SYMBOL_GPL
Remove the unneeded export.
Fixes: 66abc3599c3c ("fuse: unexport request ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
account per-file, dentry, and inode data
blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Implements the optimization noted in commit f75fdf22b0a8 ("fuse: don't
use ->d_time"), as the additional memory can be significant. (In
particular, on SLAB configurations this 8-byte alloc becomes 32 bytes).
Per-dentry, this can consume significant memory.
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
unlock_page() was missing in case of an already in-flight write against the
same page.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes: ff17be086477 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
After 75b28af("io_uring: allocate the two rings together"), we compare
sq.head with cached_cq_tail to determine does there any cq invalid.
Actually, we should use cq.head.
Fixes: 75b28affdd6a ("io_uring: allocate the two rings together")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Do not skip invalid shadow pages when zapping obsolete pages if the
pages' root_count has reached zero, in which case the page can be
immediately zapped and freed.
Update the comment accordingly.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Toggle mmu_valid_gen between '0' and '1' instead of blindly incrementing
the generation. Because slots_lock is held for the entire duration of
zapping obsolete pages, it's impossible for there to be multiple invalid
generations associated with shadow pages at any given time.
Toggling between the two generations (valid vs. invalid) allows changing
mmu_valid_gen from an unsigned long to a u8, which reduces the size of
struct kvm_mmu_page from 160 to 152 bytes on 64-bit KVM, i.e. reduces
KVM's memory footprint by 8 bytes per shadow page.
Set sp->mmu_valid_gen before it is added to active_mmu_pages.
Functionally this has no effect as kvm_mmu_alloc_page() has a single
caller that sets sp->mmu_valid_gen soon thereafter, but visually it is
jarring to see a shadow page being added to the list without its
mmu_valid_gen first being set.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.
Paraphrasing the original changelog (commit 5ff0568374ed2 was itself a
partial revert):
Don't force reloading the remote mmu when zapping an obsolete page, as
a MMU_RELOAD request has already been issued by kvm_mmu_zap_all_fast()
immediately after incrementing mmu_valid_gen, i.e. after marking pages
obsolete.
This reverts commit 5ff0568374ed2e585376a3832857ade5daccd381.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.
Paraphrashing the original changelog:
Introduce a per-VM list to track obsolete shadow pages, i.e. pages
which have been deleted from the mmu cache but haven't yet been freed.
When page reclaiming is needed, zap/free the deleted pages first.
This reverts commit 52d5dedc79bdcbac2976159a172069618cf31be5.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
pages""
Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.
Paraphrashing the original changelog:
Reload the mmu on all vCPUs after updating the generation number so
that obsolete pages are not used by any vCPUs. This allows collapsing
all TLB flushes during obsolete page zapping into a single flush, as
there is no need to flush when dropping mmu_lock (to reschedule).
Note: a remote TLB flush is still needed before freeing the pages as
other vCPUs may be doing a lockless shadow page walk.
Opportunstically improve the comments restored by the revert (the
code itself is a true revert).
This reverts commit f34d251d66ba263c077ed9d2bbd1874339a4c887.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.
Paraphrashing the original changelog:
Zap at least 10 shadow pages before releasing mmu_lock to reduce the
overhead associated with re-acquiring the lock.
Note: "10" is an arbitrary number, speculated to be high enough so
that a vCPU isn't stuck zapping obsolete pages for an extended period,
but small enough so that other vCPUs aren't starved waiting for
mmu_lock.
This reverts commit 43d2b14b105fb00b8864c7b0ee7043cc1cc4a969.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_mmu_invalidate_all_pages""
Now that the fast invalidate mechanism has been reintroduced, restore
the tracepoint associated with said mechanism.
Note, the name of the tracepoint deviates from the original tracepoint
so as to match KVM's current nomenclature.
This reverts commit 42560fb1f3c6c7f730897b7fa7a478bc37e0be50.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|