Age | Commit message (Collapse) | Author |
|
The first parameter of module_param is @name, but @value is used
in description. Fix it.
Fixes: 546970bc6afc ("param: add kerneldoc to moduleparam.h")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
|
Pull in dependencies for the new zoned open/close/finish support.
* for-5.5/block: (32 commits)
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
block: Warn if elevator= parameter is used
block: avoid blk_bio_segment_split for small I/O operations
blk-mq: make sure that line break can be printed
block: sed-opal: Introduce Opal Datastore UID
block: sed-opal: Add support to read/write opal tables generically
block: sed-opal: Generalizing write data to any opal table
bdev: Refresh bdev size for disks without partitioning
bdev: Factor out bdev revalidation into a common helper
blk-mq: avoid sysfs buffer overflow with too many CPU cores
blk-mq: Make blk_mq_run_hw_queue() return void
fcntl: fix typo in RWH_WRITE_LIFE_NOT_SET r/w hint name
blk-mq: fill header with kernel-doc
blk-mq: remove needless goto from blk_mq_get_driver_tag
block: reorder bio::__bi_remaining for better packing
block: Reduce the amount of memory used for tag sets
block: Reduce the amount of memory required per request queue
...
|
|
Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and
BLKFINISHZONE to allow applications to control the condition of zones
on a zoned block device through the execution of the REQ_OP_ZONE_OPEN,
REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations.
Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Zoned block devices (ZBC and ZAC devices) allow an explicit control
over the condition (state) of zones. The operations allowed are:
* Open a zone: Transition to open condition to indicate that a zone will
actively be written
* Close a zone: Transition to closed condition to release the drive
resources used for writing to a zone
* Finish a zone: Transition an open or closed zone to the full
condition to prevent write operations
To enable this control for in-kernel zoned block device users, define
the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
and REQ_OP_ZONE_FINISH as well as the generic function
blkdev_zone_mgmt() for submitting these operations on a range of zones.
This results in blkdev_reset_zones() removal and replacement with this
new zone magement function. Users of blkdev_reset_zones() (f2fs and
dm-zoned) are updated accordingly.
Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v5.5
Some big changes in the core but more about cleanps and refactorings
than new features, plus a collection of new drivers and lots of small
fixes and improvements to existing ones.
- Lots more cleanups from Morimoto-san. Now that everything is a
component this is mostly about refactorings to clarify and simplify
the core, a combination of things that are no longer required due to
refactorings and spotting similarities.
- Many fixes to the Sound Open Firmware code.
- Wake on voice support for Chromebooks.
- SPI support for RT5677.
- New drivers for Analog Devices ADAU7118, Intel Cannonlake systems
with RT1011 and RT5682, Texas Instruments TAS2562 and TAS2770.
|
|
Those regulators are not actually supported by the AB8500 regulator
driver. There is no ab8500_regulator_info for them and no entry in
ab8505_regulator_match.
As such, they cannot be registered successfully, and looking them
up in ab8505_regulator_match causes an out-of-bounds array read.
Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505")
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20191106173125.14496-2-stephan@gerhold.net
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The USB regulator was removed for AB8500 in
commit 41a06aa738ad ("regulator: ab8500: Remove USB regulator").
It was then added for AB8505 in
commit 547f384f33db ("regulator: ab8500: add support for ab8505").
However, there was never an entry added for it in
ab8505_regulator_match. This causes all regulators after it
to be initialized with the wrong device tree data, eventually
leading to an out-of-bounds array read.
Given that it is not used anywhere in the kernel, it seems
likely that similar arguments against supporting it exist for
AB8505 (it is controlled by hardware).
Therefore, simply remove it like for AB8500 instead of adding
an entry in ab8505_regulator_match.
Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505")
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20191106173125.14496-1-stephan@gerhold.net
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The 'IOMMU_QCOM_SYS_CACHE' IOMMU protection flag is exposed to all
users of the IOMMU API. Despite its name, the idea behind it isn't
especially tied to Qualcomm implementations and could conceivably be
used by other systems.
Rename it to 'IOMMU_SYS_CACHE_ONLY' and update the comment to describe
a bit better the idea behind it.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Isaac J. Manjarres" <isaacm@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into for-next/core
FTRACE_WITH_REGS support for arm64.
* 'arm64/ftrace-with-regs' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux:
arm64: ftrace: minimize ifdeffery
arm64: implement ftrace with regs
arm64: asm-offsets: add S_FP
arm64: insn: add encoder for MOV (register)
arm64: module/ftrace: intialize PLT at load time
arm64: module: rework special section handling
module/ftrace: handle patchable-function-entry
ftrace: add ftrace_init_nop()
|
|
Invoke the EFI_RNG_PROTOCOL protocol in the context of the x86 EFI stub,
same as is done on arm/arm64 since commit 568bc4e87033 ("efi/arm*/libstub:
Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table"). Within the stub,
a Linux-specific RNG seed UEFI config table will be seeded. The EFI routines
in the core kernel will pick that up later, yet still early during boot,
to seed the kernel entropy pool. If CONFIG_RANDOM_TRUST_BOOTLOADER, entropy
is credited for this seed.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into char-misc-next
Kishon writes:
phy: for 5.5
*) Add a new PHY driver for USB3 PHY on Allwinner H6 SoC
*) Add a new PHY driver for Innosilicon Video Combo PHY(MIPI/LVDS/TTL)
*) Add support in xusb-tegra210 PHY driver to get USB device mode functional
in Tegra 210
*) Add support for SM8150 QMP UFS PHY in phy-qcom-qmp PHY driver
*) Fix smatch warning (array off by one) in phy-rcar-gen2 PHY driver
*) Enable mac tx internal delay for rgmii-rxid in phy-gmii-sel driver
*) Fix phy-qcom-usb-hs from registering multiple extcon notifiers during PHY
power cycle
*) Use devm_platform_ioremap_resource() in phy-mvebu-a3700-utmi,
phy-hisi-inno-usb2, phy-histb-combphy and regulator_bulk_set_supply_names()
in xusb to simplify code
*) Remove unused variable in xusb-tegra210 and phy-dm816x-usb
*) Fix sparse warnings in phy-brcm-usb-init
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
* tag 'phy-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy: (28 commits)
phy: phy-rockchip-inno-usb2: add phy description for px30
phy: qcom-usb-hs: Fix extcon double register after power cycle
phy: renesas: phy-rcar-gen2: Fix the array off by one warning
phy: lantiq: vrx200-pcie: fix error return code in ltq_vrx200_pcie_phy_power_on()
dt-bindings: phy: add yaml binding for rockchip,px30-dsi-dphy
phy/rockchip: Add support for Innosilicon MIPI/LVDS/TTL PHY
phy: add PHY_MODE_LVDS
phy: allwinner: add phy driver for USB3 PHY on Allwinner H6 SoC
dt-bindings: Add bindings for USB3 phy on Allwinner H6
phy: qcom-qmp: Add SM8150 QMP UFS PHY support
dt-bindings: phy-qcom-qmp: Add sm8150 UFS phy compatible string
phy: ti: gmii-sel: fix mac tx internal delay for rgmii-rxid
phy: tegra: use regulator_bulk_set_supply_names()
phy: ti: dm816x: remove set but not used variable 'phy_data'
phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
phy: tegra: xusb: Add vbus override support on Tegra186
phy: tegra: xusb: Add vbus override support on Tegra210
phy: tegra: xusb: Add usb3 port fake support on Tegra210
phy: tegra: xusb: Add XUSB dual mode support on Tegra210
dt-bindings: rcar-gen3-phy-usb3: Add r8a774b1 support
...
|
|
At least for me it is difficult to remember the meaning of GPIO
direction values. Define GPIO_LINE_DIRECTION_IN and
GPIO_LINE_DIRECTION_OUT so that occasional GPIO contributors would
not need to always check the meaning of hard coded values 1 and 0.
Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The structs representing capacity states and performance domains of an
Energy Model are currently only defined for CONFIG_ENERGY_MODEL=y. That
makes it hard for code outside PM_EM to manipulate those structures
without a lot of ifdefery or stubbed accessors.
So, move the declaration of the two structs outside of the
CONFIG_ENERGY_MODEL ifdef. The client code (e.g. EAS or thermal) always
checks the return of em_cpu_get() before using it, so the exising code
is still safe to use as-is.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20191030151451.7961-3-qperret@google.com
|
|
As the conditions are simplified and unified, it is useless to have
different blocks of definitions under the same compiler condition,
let's merge the blocks.
There is no functional change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20191030091038.678-2-daniel.lezcano@linaro.org
|
|
The option CONFIG_CPU_THERMAL depends on CONFIG_OF in the Kconfig.
It it pointless to check if CONFIG_OF is set in the header file as
this is always true if CONFIG_CPU_THERMAL is true. Remove it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20191030091038.678-1-daniel.lezcano@linaro.org
|
|
There are no users of netlink messages for thermal inside the kernel.
Remove the code and adjust the documentation.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/8ff02cf62186c7a54fff325fad40a2e9ca3affa6.1571656014.git.amit.kucheria@linaro.org
|
|
syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.
Also add the corresponding WRITE_ONCE() when timer->state is set.
In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.
KCSAN reported these cases:
BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check
write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check
write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xbb/0xe0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
__tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
__sys_sendto+0x21f/0x320 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto net/socket.c:1960 [inline]
__x64_sys_sendto+0x89/0xb0 net/socket.c:1960
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ tglx: Added comments ]
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers
Qualcomm ARM Based Driver Updates for v5.5
* Add Bjorn as QCOM co-maintainer
* Add LLLC yaml bindings and SC7180 support
* Fixups/Cleanup for LLLC
* Add SMD-RPM MSM8976 compatible and interconnect device
* Add missing RPMD SMD perf level
* tag 'qcom-drivers-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
MAINTAINERS: Add myself as co-maintainer for QCOM
dt-bindings: msm: Add LLCC for SC7180
dt-bindings: msm: Convert LLCC bindings to YAML
soc: qcom: llcc: Add configuration data for SC7180
soc: qcom: llcc: Move regmap config to local variable
soc: qcom: llcc: Name regmaps to avoid collisions
soc: qcom: Fix llcc-qcom definitions to include
soc: qcom: rpmpd: Add rpm power domains for msm8976
dt-bindings: power: Add missing rpmpd smd performance level
soc: qcom: smd-rpm: Add MSM8976 compatible
soc: qcom: socinfo: add sdm845 and sda845 soc ids
soc: qcom: smd-rpm: Create RPM interconnect proxy child device
soc: qcom: Make llcc-qcom a generic driver
soc: qcom: Rename llcc-slice to llcc-qcom
soc: qcom: llcc cleanup to get rid of sdm845 specific driver file
Link: https://lore.kernel.org/r/1573068840-13098-4-git-send-email-agross@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
cgroup->bstat_pending is used to determine the base stat delta to
propagate to the parent. While correct, this is different from how
percpu delta is determined for no good reason and the inconsistency
makes the code more difficult to understand.
This patch makes parent propagation delta calculation use the same
method as percpu to global propagation.
* cgroup_base_stat_accumulate() is renamed to cgroup_base_stat_add()
and cgroup_base_stat_sub() is added.
* percpu propagation calculation is updated to use the above helpers.
* cgroup->bstat_pending is replaced with cgroup->last_bstat and
updated to use the same calculation as percpu propagation.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Inline encryption hardware compliant with the UFS v2.1 standard or with
the upcoming version of the eMMC standard has the following properties:
(1) Per I/O request, the encryption key is specified by a previously
loaded keyslot. There might be only a small number of keyslots.
(2) Per I/O request, the starting IV is specified by a 64-bit "data unit
number" (DUN). IV bits 64-127 are assumed to be 0. The hardware
automatically increments the DUN for each "data unit" of
configurable size in the request, e.g. for each filesystem block.
Property (1) makes it inefficient to use the traditional fscrypt
per-file keys. Property (2) precludes the use of the existing
DIRECT_KEY fscrypt policy flag, which needs at least 192 IV bits.
Therefore, add a new fscrypt policy flag IV_INO_LBLK_64 which causes the
encryption to modified as follows:
- The encryption keys are derived from the master key, encryption mode
number, and filesystem UUID.
- The IVs are chosen as (inode_number << 32) | file_logical_block_num.
For filenames encryption, file_logical_block_num is 0.
Since the file nonces aren't used in the key derivation, many files may
share the same encryption key. This is much more efficient on the
target hardware. Including the inode number in the IVs and mixing the
filesystem UUID into the keys ensures that data in different files is
nevertheless still encrypted differently.
Additionally, limiting the inode and block numbers to 32 bits and
placing the block number in the low bits maintains compatibility with
the 64-bit DUN convention (property (2) above).
Since this scheme assumes that inode numbers are stable (which may
preclude filesystem shrinking) and that inode and file logical block
numbers are at most 32-bit, IV_INO_LBLK_64 will only be allowed on
filesystems that meet these constraints. These are acceptable
limitations for the cases where this format would actually be used.
Note that IV_INO_LBLK_64 is an on-disk format, not an implementation.
This patch just adds support for it using the existing filesystem layer
encryption. A later patch will add support for inline encryption.
Reviewed-by: Paul Crowley <paulcrowley@google.com>
Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
We have a usecase to use tmpfs as QEMU memory backend and we would like
to take the advantage of THP as well. But, our test shows the EPT is
not PMD mapped even though the underlying THP are PMD mapped on host.
The number showed by /sys/kernel/debug/kvm/largepage is much less than
the number of PMD mapped shmem pages as the below:
7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 579584 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
12
And some benchmarks do worse than with anonymous THPs.
By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks if
there is a single PTE mapping on the page for anonymous THP when setting
up EPT map. But the _mapcount < 0 check doesn't work for page cache THP
since every subpage of page cache THP would get _mapcount inc'ed once it
is PMD mapped, so PageTransCompoundMap() always returns false for page
cache THP. This would prevent KVM from setting up PMD mapped EPT entry.
So we need handle page cache THP correctly. However, when page cache
THP's PMD gets split, kernel just remove the map instead of setting up
PTE map like what anonymous THP does. Before KVM calls get_user_pages()
the subpages may get PTE mapped even though it is still a THP since the
page cache THP may be mapped by other processes at the mean time.
Checking its _mapcount and whether the THP has PTE mapped or not.
Although this may report some false negative cases (PTE mapped by other
processes), it looks not trivial to make this accurate.
With this fix /sys/kernel/debug/kvm/largepage would show reasonable
pages are PMD mapped by EPT as the below:
7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 557056 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
271
And the benchmarks are as same as anonymous THPs.
[yang.shi@linux.alibaba.com: v4]
Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
Tested-by: Gang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/drivers
Samsung soc drivers changes for v5.5
1. Minor fixes to Exynos Chipid driver.
2. Add Exynos Adaptive Supply Voltage driver allowing to adjust voltages
used during CPU frequency scaling based on revision of SoC. This
also pulls dependency from PM/OPP tree - driver uses newly added
dev_pm_opp_adjust_voltage() function.
* tag 'samsung-drivers-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
soc: samsung: exynos-asv: Potential NULL dereference in exynos_asv_update_opps()
soc: samsung: chipid: Drop "syscon" compatible requirement
soc: samsung: Add Exynos Adaptive Supply Voltage driver
PM / OPP: Support adjusting OPP voltages at runtime
soc: samsung: chipid: Make exynos_chipid_early_init() static
Link: https://lore.kernel.org/r/20191104175902.12224-1-krzk@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
When using patchable-function-entry, the compiler will record the
callsites into a section named "__patchable_function_entries" rather
than "__mcount_loc". Let's abstract this difference behind a new
FTRACE_CALLSITE_SECTION, so that architectures don't have to handle this
explicitly (e.g. with custom module linker scripts).
As parisc currently handles this explicitly, it is fixed up accordingly,
with its custom linker script removed. Since FTRACE_CALLSITE_SECTION is
only defined when DYNAMIC_FTRACE is selected, the parisc module loading
code is updated to only use the definition in that case. When
DYNAMIC_FTRACE is not selected, modules shouldn't have this section, so
this removes some redundant work in that case.
To make sure that this is keep up-to-date for modules and the main
kernel, a comment is added to vmlinux.lds.h, with the existing ifdeffery
simplified for legibility.
I built parisc generic-{32,64}bit_defconfig with DYNAMIC_FTRACE enabled,
and verified that the section made it into the .ko files for modules.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: linux-parisc@vger.kernel.org
|
|
Architectures may need to perform special initialization of ftrace
callsites, and today they do so by special-casing ftrace_make_nop() when
the expected branch address is MCOUNT_ADDR. In some cases (e.g. for
patchable-function-entry), we don't have an mcount-like symbol and don't
want a synthetic MCOUNT_ADDR, but we may need to perform some
initialization of callsites.
To make it possible to separate initialization from runtime
modification, and to handle cases without an mcount-like symbol, this
patch adds an optional ftrace_init_nop() function that architectures can
implement, which does not pass a branch address.
Where an architecture does not provide ftrace_init_nop(), we will fall
back to the existing behaviour of calling ftrace_make_nop() with
MCOUNT_ADDR.
At the same time, ftrace_code_disable() is renamed to
ftrace_nop_initialize() to make it clearer that it is intended to
intialize a callsite into a disabled state, and is not for disabling a
callsite that has been runtime enabled. The kerneldoc description of rec
arguments is updated to cover non-mcount callsites.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
|
|
We shouldn't insert things into the NFSPROC4_CLNT enums, since that
causes the nfsstat array to be reordered.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
|
|
There are two reasons why CPU idle states may be disabled: either
because the driver has disabled them or because they have been
disabled by user space via sysfs.
In the former case, the state's "disabled" flag is set once during
the initialization of the driver and it is never cleared later (it
is read-only effectively). In the latter case, the "disable" field
of the given state's cpuidle_state_usage struct is set and it may be
changed via sysfs. Thus checking whether or not an idle state has
been disabled involves reading these two flags every time.
In order to avoid the additional check of the state's "disabled" flag
(which is effectively read-only anyway), use the value of it at the
init time to set a (new) flag in the "disable" field of that state's
cpuidle_state_usage structure and use the sysfs interface to
manipulate another (new) flag in it. This way the state is disabled
whenever the "disable" field of its cpuidle_state_usage structure is
nonzero, whatever the reason, and it is the only place to look into
to check whether or not the state has been disabled.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
Add two utilities to 1) write-protect and 2) clean all ptes pointing into
a range of an address space.
The utilities are intended to aid in tracking dirty pages (either
driver-allocated system memory or pci device memory).
The write-protect utility should be used in conjunction with
page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
accesses. Typically one would want to use this on sparse accesses into
large memory regions. The clean utility should be used to utilize
hardware dirtying functionality and avoid the overhead of page-faults,
typically on large accesses into small memory regions.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For users that want to travers all page table entries pointing into a
region of a struct address_space mapping, introduce a walk_page_mapping()
function.
The walk_page_mapping() function will be initially be used for dirty-
tracking in virtual graphics drivers.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The caller needs to make sure that the vma is not torn down during the
lock operation and can also use the i_mmap_rwsem for file-backed vmas.
Remove the BUG_ON. We could, as an alternative, add a test that either
vma->vm_mm->mmap_sem or vma->vm_file->f_mapping->i_mmap_rwsem are held.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
|
|
|
|
sk_msg_trim() tries to only update curr pointer if it falls into
the trimmed region. The logic, however, does not take into the
account pointer wrapping that sk_msg_iter_var_prev() does nor
(as John points out) the fact that msg->sg is a ring buffer.
This means that when the message was trimmed completely, the new
curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
neither smaller than any other value, nor would it actually be
correct.
Special case the trimming to 0 length a little bit and rework
the comparison between curr and end to take into account wrapping.
This bug caused the TLS code to not copy all of the message, if
zero copy filled in fewer sg entries than memcopy would need.
Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
v2:
- take into account that msg->sg is a ring buffer (John).
Link: https://lore.kernel.org/netdev/20191030160542.30295-1-jakub.kicinski@netronome.com/ (v1)
Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com
Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-11-02
The following pull-request contains BPF updates for your *net* tree.
We've added 6 non-merge commits during the last 6 day(s) which contain
a total of 8 files changed, 35 insertions(+), 9 deletions(-).
The main changes are:
1) Fix ppc BPF JIT's tail call implementation by performing a second pass
to gather a stable JIT context before opcode emission, from Eric Dumazet.
2) Fix build of BPF samples sys_perf_event_open() usage to compiled out
unavailable test_attr__{enabled,open} checks. Also fix potential overflows
in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel.
3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian
archs like s390x and also improve the test coverage, from Ilya Leoshkevich.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can unify string properties initializer macros with integer
initializers.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Instead of explicitly setting values of integer types when copying
property entries lets just copy entire value union when processing
non-array values.
For value arrays we no longer use union of pointers, but rather a single
void pointer, which allows us to remove property_set_pointer().
In property_get_pointer() we do not need to handle each data type
separately, we can simply return either the pointer or pointer to values
union.
We are not losing anything from removing typed pointer union because the
upper layers do their accesses through void pointers anyway, and we
trust the "type" of the property when interpret the data. We rely on
users of property entries on using PROPERTY_ENTRY_XXX() macros to
properly initialize entries instead of poking in the instances directly.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Let's mark PROPERTY_ENTRY_* macros that are internal with double leading
underscores so users are not tempted to use them.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Sometimes we want to initialize property entry array from a regular
pointer, when we can't determine length automatically via ARRAY_SIZE.
Let's introduce PROPERTY_ENTRY_XXX_ARRAY_LEN macros that take explicit
"len" argument.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This definition is not used anywhere, let's remove it.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add two helper functions, one for IPv4 and one for IPv6, to recognize
the ICMP packets which are error responses.
This packets are special because they have as payload the original
header of the packet which generated it (RFC 792 says at least 8 bytes,
but Linux actually includes much more than that).
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
The credit counter now contains both buffer and revoke descriptor block
credits. Rename to counter to h_total_credits to reflect that. No
functional change.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Extend functions for starting, extending, and restarting transaction
handles to take number of revoke records handle must be able to
accommodate. These functions then make sure transaction has enough
credits to be able to store resulting revoke descriptor blocks. Also
revoke code tracks number of revoke records created by a handle to catch
situation where some place didn't reserve enough space for revoke
records. Similarly to standard transaction credits, space for unused
reserved revoke records is released when the handle is stopped.
On the ext4 side we currently take a simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows amount of
credits reserved for each handle only by a few and is enough for any
normal workload so that we don't hit warnings in jbd2. We will refine
the logic in following commits.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The function is now just a trivial wrapper returning
journal->j_max_transaction_buffers. Drop it.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently, journal descriptor blocks were not accounted in
transaction->t_outstanding_credits and we were just leaving some slack
space in the journal for them (in jbd2_log_space_left() and
jbd2_space_needed()). This is making proper accounting (and reservation
we want to add) of descriptor blocks difficult so switch to accounting
descriptor blocks in transaction->t_outstanding_credits and just reserve
the same amount of credits in t_outstanding credits for journal
descriptor blocks when creating transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Provide accessor function to get number of credits available in a handle
and use it from ext4. Later, computation of available credits won't be
so straightforward.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct stripe_c {
...
struct stripe stripe[0];
};
In this case alloc_context() and dm_array_too_big() are removed and
replaced by the direct use of the struct_size() helper in kmalloc().
Notice that open-coded form is prone to type mistakes.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
nvmem_cell_write's buf argument uses different types based on
the configuration of CONFIG_NVMEM. The function prototype for
enabled NVMEM uses 'void *' type, but the static dummy function
for disabled NVMEM uses 'const char *' instead. Fix the different
behaviour by always expecting a 'void *' typed buf argument.
Fixes: 7a78a7f7695b ("power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write interface")
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Han Nandor <nandor.han@vaisala.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-By: Han Nandor <nandor.han@vaisala.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20191029114240.14905-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Provide a variant of devm_platform_ioremap_resource() that allows to
lookup resources from platform devices by name rather than by index.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20191022084318.22256-7-brgl@bgdev.pl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Provide a write-combined variant of devm_platform_ioremap_resource().
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20191022084318.22256-5-brgl@bgdev.pl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|