Age | Commit message (Collapse) | Author |
|
The HST path selector needs this information to perform path
prediction. For request-based mpath, struct request's io_start_time_ns
is used, while for bio-based, use the start_time stored in dm_io.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Add functions dm_bufio_issue_discard and dm_bufio_discard_buffers.
dm_bufio_issue_discard sends discard request to the underlying device.
dm_bufio_discard_buffers frees buffers in the range and then calls
dm_bufio_issue_discard.
Also, factor out block_to_sector for reuse in dm_bufio_issue_discard.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
Jonathan writes:
Second set of new device support, cleanups and features for IIO in the 5.8 cycle
Usual mixed back but with a few subsystem wide or device type
wide cleanups.
New device support
* adis16475
- New driver supporting adis16470, adis16475, adis16477, adis16465,
adis16467, adis16500, adis16505 and adis16507.
Includes some rework of the adis library to simplify using it
for this new driver.
* ak8974
- Add support for Alps hscdt008a. ID only. Related patches add support
for scale.
* atlas-sensor
- Add support for RTD-SM OEM temperature sensor.
* cm32181
- Add support for CM3218 including support for SMBUS alert via
ACPI resources.
* ltc2632
- Add support for ltc2634-12/10/8 DACS including handling per
device type numbers of channels.
Major Features
* cm32181
- ACPI bindings including parsing CPM0 and CPM1 custom ACPI tables.
Includes minor tidy ups and fixes.
* vcnl4000
- Add event support
- Add buffered data capture support
- Add control of sampling frequency
Cleanups and minor fixes.
* core
- Trivial rework of iio_device_alloc to use an early return and
improve readability.
- Precursors to addition of multiple buffer support. So far
minor refactoring.
* subsystem wide
- Use get_unaligned_be24 slightly improve readability over open
coding it.
* adis drivers
- Use iio_get_debugfs_dentry access function.
* bh1780, cm32181, cm3232, gp2ap02a00f, opt3001, st_uvis25, vl6180,
dmard06, kxsd9
- Drop use of of_match_ptr to allow ACPI based probing via PRP0001.
Part of clear out of this to avoid cut and paste into new drivers.
* ad5592r, ad5593r
- Fix typos
* ad5933
- Use managed interfaces to automate error handling and remove.
* ak8974
- Fix wrong number of 'real bits' for buffered data.
- Refactor to pull measurement code out as separate function.
bmp280
- Fix lack of clamp on range during data capture.
* at91-sama5d2_adc
- Handle unfinished conversions correctly.
- Allow use of triggers other than it's own.
- Reorganize buffer setup and tear down as part of long running
subsystem wide rework.
* ccs811
- Add DT binding docs and match table.
- Support external reset and wakeup pins.
* hid-sensors
- Reorganize buffer setup and tear down as part of long running
subsystem wide rework.
* ltr501
- Constify some structs.
* vcnl4000
- Fix an endian issue by using explicit byte swapped i2c accessors.
* tag 'iio-for-5.8b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (74 commits)
iio: light: ltr501: Constify structs
staging: iio: ad5933: attach life-cycle of kfifo buffer to parent device and use managed calls throughout
iio: bmp280: fix compensation of humidity
iio: light: cm32181: Fix integartion time typo
iio: light: cm32181: Add support for parsing CPM0 and CPM1 ACPI tables
iio: light: cm32181: Make lux_per_bit and lux_per_bit_base_it runtime settings
iio: light: cm32181: Use units of 1/100000th for calibscale and lux_per_bit
iio: light: cm32181: Change reg_init to use a bitmap of which registers to init
iio: light: cm32181: Handle CM3218 ACPI devices with 2 I2C resources
iio: light: cm32181: Clean up the probe function a bit
iio: light: cm32181: Add support for the CM3218
iio: light: cm32181: Add some extra register defines
iio: light: cm32181: Add support for ACPI enumeration
iio: light: cm32181: Switch to new style i2c-driver probe function
iio: hid-sensors: move triggered buffer setup into hid_sensor_setup_trigger
iio: vcnl4000: Add buffer support for VCNL4010/20.
iio: vcnl4000: Add sampling frequency support for VCNL4010/20.
iio: vcnl4000: Add event support for VCNL4010/20.
iio: vcnl4000: Factorize data reading and writing.
iio: vcnl4000: Fix i2c swapped word reading.
...
|
|
usb_hcd_pci_probe expects users to call this with driver_data set as
hc_driver, that limits the possibility of using the driver_data for
driver data.
Add hc_driver as argument to usb_hcd_pci_probe and modify the callers
ehci/ohci/xhci/uhci to pass hc_driver as argument and freeup the
driver_data used
Tested xhci driver on Dragon-board RB3, compile tested ehci, ohci and
uhci.
[For all but the xHCI parts]
[For the xhci part]
Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20200514122039.300417-2-vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
All the users threat them as immutable - annotate them as such.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Link: https://lore.kernel.org/r/20200513214351.2138580-3-emil.l.velikov@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The user is not supposed to thinker with the underlying sysrq_key_op.
Make that explicit by adding a handful of const notations.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Link: https://lore.kernel.org/r/20200513214351.2138580-2-emil.l.velikov@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Export a pointer to the sysrq_get_key_op(). This way we can cleanly
unregister it, instead of the current solutions of modifuing it inplace.
Since __sysrq_get_key_op() is no longer used externally, let's make it
a static function.
This patch will allow us to limit access to each and every sysrq op and
constify the sysrq handling.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-kernel@vger.kernel.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Link: https://lore.kernel.org/r/20200513214351.2138580-1-emil.l.velikov@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We're about to amend uart_get_rs485_mode() to support a GPIO pin for
rs485 bus termination. Retrieving the GPIO descriptor may fail, so
allow uart_get_rs485_mode() to return an errno and change all callers
to check for failure.
The GPIO descriptor is going to be stored in struct uart_port. Pass
that struct to uart_get_rs485_mode() in lieu of a struct device and
struct serial_rs485, both of which are directly accessible from struct
uart_port.
A few drivers call uart_get_rs485_mode() before setting the struct
device pointer in struct uart_port. Shuffle those calls around where
necessary.
[Heiko Stuebner did the ar933x_uart.c portion, hence his Signed-off-by.]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/271e814af4b0db3bffbbb74abf2b46b75add4516.1589285873.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
After moving iommu_group setup to iommu core code [1][2] and removing
private domain support in vt-d [3], there are no users for functions such
as iommu_request_dm_for_dev(), iommu_request_dma_domain_for_dev() and
request_default_domain_for_dev(). So, remove these functions.
[1] commit dce8d6964ebd ("iommu/amd: Convert to probe/release_device()
call-backs")
[2] commit e5d1841f18b2 ("iommu/vt-d: Convert to probe/release_device()
call-backs")
[3] commit 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA
domain")
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200513224721.20504-1-sai.praneeth.prakhya@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
... or the odyssey of trying to disable the stack protector for the
function which generates the stack canary value.
The whole story started with Sergei reporting a boot crash with a kernel
built with gcc-10:
Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013
Call Trace:
dump_stack
panic
? start_secondary
__stack_chk_fail
start_secondary
secondary_startup_64
-—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary
This happens because gcc-10 tail-call optimizes the last function call
in start_secondary() - cpu_startup_entry() - and thus emits a stack
canary check which fails because the canary value changes after the
boot_init_stack_canary() call.
To fix that, the initial attempt was to mark the one function which
generates the stack canary with:
__attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)
however, using the optimize attribute doesn't work cumulatively
as the attribute does not add to but rather replaces previously
supplied optimization options - roughly all -fxxx options.
The key one among them being -fno-omit-frame-pointer and thus leading to
not present frame pointer - frame pointer which the kernel needs.
The next attempt to prevent compilers from tail-call optimizing
the last function call cpu_startup_entry(), shy of carving out
start_secondary() into a separate compilation unit and building it with
-fno-stack-protector, was to add an empty asm("").
This current solution was short and sweet, and reportedly, is supported
by both compilers but we didn't get very far this time: future (LTO?)
optimization passes could potentially eliminate this, which leads us
to the third attempt: having an actual memory barrier there which the
compiler cannot ignore or move around etc.
That should hold for a long time, but hey we said that about the other
two solutions too so...
Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
This is an immutable branch shared with the OPP tree. It contains also
the patches to convert the interconnect framework from tristate to bool
after Greg agreed with that. This will make the integration between
the OPP layer and interconnect much easier.
* icc-get-by-index:
interconnect: Add of_icc_get_by_index() helper function
interconnect: Disallow interconnect core to be built as a module
interconnect: Remove unused module exit code from core
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-14
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.
2) support for narrow loads in bpf_sock_addr progs and additional
helpers in cg-skb progs, from Andrey.
3) bpf benchmark runner, from Andrii.
4) arm and riscv JIT optimizations, from Luke.
5) bpf iterator infrastructure, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set the correct bit when checking for PHY_BRCM_DIS_TXCRXC_NOENRGY on the
BCM54810 PHY.
Fixes: 0ececcfc9267 ("net: phy: broadcom: Allow BCM54810 to use bcm54xx_adjust_rxrefclk()")
Signed-off-by: Kevin Lo <kevlo@kevlo.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously we used pcie_find_root_port() to find a Root Port from a PCIe
device and pci_find_pcie_root_port() to find a Root Port from a
Conventional PCI device.
Unify the two functions and use pcie_find_root_port() to find a Root Port
from either a Conventional PCI device or a PCIe device. Then there is no
need to distinguish the type of the device.
Link: https://lore.kernel.org/r/1589019568-5216-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # thunderbolt
|
|
On some adjacent code, fix bad code formatting
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On different hardware events we have to respond differently,
on some of hardware indications hw attention (error condition)
should be cleared by the driver to continue normal functioning.
Here we introduce attention clear flags, and put them on some
important events (in aeu_descs).
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Here we introduce qed device error tracking flags and error types.
qed_hw_err_notify is an entrace point to report errors.
It'll notify higher level drivers (qede/qedr/etc) to handle and recover
the error.
List of posible errors comes from hardware interfaces, but could be
extended in future.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
security_secid_to_secctx is called by the bpf_lsm hook and a successful
return value (i.e 0) implies that the parameter will be consumed by the
LSM framework. The current behaviour return success when the pointer
isn't initialized when CONFIG_BPF_LSM is enabled, with the default
return from kernel/bpf/bpf_lsm.c.
This is the internal error:
[ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
[ 1229.374977][ T2659] ------------[ cut here ]------------
[ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
[ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1229.380348][ T2659] Modules linked in:
[ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G B W 5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty #13
[ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
[ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
[ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
[ 1229.392225][ T2659] sp : ffff000064247450
[ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
[ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
[ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
[ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
[ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
[ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
[ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
[ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
[ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
[ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
[ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
[ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
[ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
[ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
[ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
[ 1229.422526][ T2659] Call trace:
[ 1229.423631][ T2659] usercopy_abort+0xc8/0xcc
[ 1229.425091][ T2659] __check_object_size+0xdc/0x7d4
[ 1229.426729][ T2659] put_cmsg+0xa30/0xa90
[ 1229.428132][ T2659] unix_dgram_recvmsg+0x80c/0x930
[ 1229.429731][ T2659] sock_recvmsg+0x9c/0xc0
[ 1229.431123][ T2659] ____sys_recvmsg+0x1cc/0x5f8
[ 1229.432663][ T2659] ___sys_recvmsg+0x100/0x160
[ 1229.434151][ T2659] __sys_recvmsg+0x110/0x1a8
[ 1229.435623][ T2659] __arm64_sys_recvmsg+0x58/0x70
[ 1229.437218][ T2659] el0_svc_common.constprop.1+0x29c/0x340
[ 1229.438994][ T2659] do_el0_svc+0xe8/0x108
[ 1229.440587][ T2659] el0_svc+0x74/0x88
[ 1229.441917][ T2659] el0_sync_handler+0xe4/0x8b4
[ 1229.443464][ T2659] el0_sync+0x17c/0x180
[ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
[ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
[ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
[ 1229.450692][ T2659] Kernel Offset: disabled
[ 1229.452061][ T2659] CPU features: 0x240002,20002004
[ 1229.453647][ T2659] Memory Limit: none
[ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---
Rework the so the default return value is -EOPNOTSUPP.
There are likely other callbacks such as security_inode_getsecctx() that
may have the same problem, and that someone that understand the code
better needs to audit them.
Thank you Arnd for helping me figure out what went wrong.
Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/bpf/20200512174607.9630-1-anders.roxell@linaro.org
|
|
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: add missing functions declarations to kasan.h
kasan: consistently disable debugging features
ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
userfaultfd: fix remap event with MREMAP_DONTUNMAP
mm/gup: fix fixup_user_fault() on multiple retries
epoll: call final ep_events_available() check under the lock
mm, memcg: fix inconsistent oom event behavior
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing fixes from Steven Rostedt:
"Various tracing fixes:
- Fix a crash when having function tracing and function stack tracing
on the command line.
The ftrace trampolines are created as executable and read only. But
the stack tracer tries to modify them with text_poke() which
expects all kernel text to still be writable at boot. Keep the
trampolines writable at boot, and convert them to read-only with
the rest of the kernel.
- A selftest was triggering in the ring buffer iterator code, that is
no longer valid with the update of keeping the ring buffer writable
while a iterator is reading.
Just bail after three failed attempts to get an event and remove
the warning and disabling of the ring buffer.
- While modifying the ring buffer code, decided to remove all the
unnecessary BUG() calls"
* tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Remove all BUG() calls
ring-buffer: Don't deactivate the ring buffer on failed iterator reads
x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
|
|
A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
memory.events") changed the behavior of memcg events, which will now
consider subtrees in memory.events.
But oom_kill event is a special one as it is used in both cgroup1 and
cgroup2. In cgroup1, it is displayed in memory.oom_control. The file
memory.oom_control is in both root memcg and non root memcg, that is
different with memory.event as it only in non-root memcg. That commit
is okay for cgroup2, but it is not okay for cgroup1 as it will cause
inconsistent behavior between root memcg and non-root memcg.
Here's an example on why this behavior is inconsistent in cgroup1.
root memcg
/
memcg foo
/
memcg bar
Suppose there's an oom_kill in memcg bar, then the oon_kill will be
root memcg : memory.oom_control(oom_kill) 0
/
memcg foo : memory.oom_control(oom_kill) 1
/
memcg bar : memory.oom_control(oom_kill) 1
For the non-root memcg, its memory.oom_control(oom_kill) includes its
descendants' oom_kill, but for root memcg, it doesn't include its
descendants' oom_kill. That means, memory.oom_control(oom_kill) has
different meanings in different memcgs. That is inconsistent. Then the
user has to know whether the memcg is root or not.
If we can't fully support it in cgroup1, for example by adding
memory.events.local into cgroup1 as well, then let's don't touch its
original behavior.
Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200502141055.7378-1-laoar.shao@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Blk-crypto delegates crypto operations to inline encryption hardware
when available. The separately configurable blk-crypto-fallback contains
a software fallback to the kernel crypto API - when enabled, blk-crypto
will use this fallback for en/decryption when inline encryption hardware
is not available.
This lets upper layers not have to worry about whether or not the
underlying device has support for inline encryption before deciding to
specify an encryption context for a bio. It also allows for testing
without actual inline encryption hardware - in particular, it makes it
possible to test the inline encryption code in ext4 and f2fs simply by
running xfstests with the inlinecrypt mount option, which in turn allows
for things like the regular upstream regression testing of ext4 to cover
the inline encryption code paths.
For more details, refer to Documentation/block/inline-encryption.rst.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Whenever a device supports blk-integrity, make the kernel pretend that
the device doesn't support inline encryption (essentially by setting the
keyslot manager in the request queue to NULL).
There's no hardware currently that supports both integrity and inline
encryption. However, it seems possible that there will be such hardware
in the near future (like the NVMe key per I/O support that might support
both inline encryption and PI).
But properly integrating both features is not trivial, and without
real hardware that implements both, it is difficult to tell if it will
be done correctly by the majority of hardware that support both.
So it seems best not to support both features together right now, and
to decide what to do at probe time.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We must have some way of letting a storage device driver know what
encryption context it should use for en/decrypting a request. However,
it's the upper layers (like the filesystem/fscrypt) that know about and
manages encryption contexts. As such, when the upper layer submits a bio
to the block layer, and this bio eventually reaches a device driver with
support for inline encryption, the device driver will need to have been
told the encryption context for that bio.
We want to communicate the encryption context from the upper layer to the
storage device along with the bio, when the bio is submitted to the block
layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
represent an encryption context (note that we can't use the bi_private
field in struct bio to do this because that field does not function to pass
information across layers in the storage stack). We also introduce various
functions to manipulate the bio_crypt_ctx and make the bio/request merging
logic aware of the bio_crypt_ctx.
We also make changes to blk-mq to make it handle bios with encryption
contexts. blk-mq can merge many bios into the same request. These bios need
to have contiguous data unit numbers (the necessary changes to blk-merge
are also made to ensure this) - as such, it suffices to keep the data unit
number of just the first bio, since that's all a storage driver needs to
infer the data unit number to use for each data block in each bio in a
request. blk-mq keeps track of the encryption context to be used for all
the bios in a request with the request's rq_crypt_ctx. When the first bio
is added to an empty request, blk-mq will program the encryption context
of that bio into the request_queue's keyslot manager, and store the
returned keyslot in the request's rq_crypt_ctx. All the functions to
operate on encryption contexts are in blk-crypto.c.
Upper layers only need to call bio_crypt_set_ctx with the encryption key,
algorithm and data_unit_num; they don't have to worry about getting a
keyslot for each encryption context, as blk-mq/blk-crypto handles that.
Blk-crypto also makes it possible for request-based layered devices like
dm-rq to make use of inline encryption hardware by cloning the
rq_crypt_ctx and programming a keyslot in the new request_queue when
necessary.
Note that any user of the block layer can submit bios with an
encryption context, such as filesystems, device-mapper targets, etc.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Inline Encryption hardware allows software to specify an encryption context
(an encryption key, crypto algorithm, data unit num, data unit size) along
with a data transfer request to a storage device, and the inline encryption
hardware will use that context to en/decrypt the data. The inline
encryption hardware is part of the storage device, and it conceptually sits
on the data path between system memory and the storage device.
Inline Encryption hardware implementations often function around the
concept of "keyslots". These implementations often have a limited number
of "keyslots", each of which can hold a key (we say that a key can be
"programmed" into a keyslot). Requests made to the storage device may have
a keyslot and a data unit number associated with them, and the inline
encryption hardware will en/decrypt the data in the requests using the key
programmed into that associated keyslot and the data unit number specified
with the request.
As keyslots are limited, and programming keys may be expensive in many
implementations, and multiple requests may use exactly the same encryption
contexts, we introduce a Keyslot Manager to efficiently manage keyslots.
We also introduce a blk_crypto_key, which will represent the key that's
programmed into keyslots managed by keyslot managers. The keyslot manager
also functions as the interface that upper layers will use to program keys
into inline encryption hardware. For more information on the Keyslot
Manager, refer to documentation found in block/keyslot-manager.c and
linux/keyslot-manager.h.
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
POSIX defines faccessat() as having a fourth "flags" argument, while the
linux syscall doesn't have it. Glibc tries to emulate AT_EACCESS and
AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken.
Add a new faccessat(2) syscall with the added flags argument and implement
both flags.
The value of AT_EACCESS is defined in glibc headers to be the same as
AT_REMOVEDIR. Use this value for the kernel interface as well, together
with the explanatory comment.
Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can
be useful and is trivial to implement.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Systemd is hacking around to get it and it's trivial to add to statx, so...
Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If mounts are deleted after a read(2) call on /proc/self/mounts (or its
kin), the subsequent read(2) could miss a mount that comes after the
deleted one in the list. This is because the file position is interpreted
as the number mount entries from the start of the list.
E.g. first read gets entries #0 to #9; the seq file index will be 10. Then
entry #5 is deleted, resulting in #10 becoming #9 and #11 becoming #10,
etc... The next read will continue from entry #10, and #9 is missed.
Solve this by adding a cursor entry for each open instance. Taking the
global namespace_sem for write seems excessive, since we are only dealing
with a per-namespace list. Instead add a per-namespace spinlock and use
that together with namespace_sem taken for read to protect against
concurrent modification of the mount list. This may reduce parallelism of
is_local_mountpoint(), but it's hardly a big contention point. We could
also use RCU freeing of cursors to make traversal not need additional
locks, if that turns out to be neceesary.
Only move the cursor once for each read (cursor is not added on open) to
minimize cacheline invalidation. When EOF is reached, the cursor is taken
off the list, in order to prevent an excessive number of cursors due to
inactive open file descriptors.
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Whiteouts, unlike real device node should not require privileges to create.
The general concern with device nodes is that opening them can have side
effects. The kernel already avoids zero major (see
Documentation/admin-guide/devices.txt). To be on the safe side the patch
explicitly forbids registering a char device with 0/0 number (see
cdev_add()).
This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV;
i.e. it won't have any side effect.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
blk_io_schedule() isn't called from performance sensitive code path, and
it is easier to maintain by exporting it as symbol.
Also blk_io_schedule() is only called by CONFIG_BLOCK code, so it is safe
to do this way. Meantime fixes build failure when CONFIG_BLOCK is off.
Cc: Christoph Hellwig <hch@infradead.org>
Fixes: e6249cdd46e4 ("block: add blk_io_schedule() for avoiding task hung in sync dio")
Reported-by: Satya Tangirala <satyat@google.com>
Tested-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Delete the duplicate "to", possibly double-typed.
Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It adds an unpack only function for DRM infoframe for dynamic range and
mastering infoframe readout.
It unpacks the information data block contained in the binary buffer into
a structured frame of the HDMI Dynamic Range and Mastering (DRM)
information frame.
In contrast to hdmi_drm_infoframe_unpack() function, it does not verify
a checksum.
It can be used for unpacking a DP HDR Metadata Infoframe SDP case.
DP HDR Metadata Infoframe SDP uses the same Dynamic Range and Mastering
(DRM) information (CTA-861-G spec.) such as HDMI DRM infoframe.
But DP SDP header and payload structure are different from HDMI DRM
Infoframe. Therefore unpacking DRM infoframe for DP requires skipping of
a verifying checksum.
v9: Add clear comments to hdmi_drm_infoframe_unpack_only() and
hdmi_drm_infoframe_unpack() (Laurent Pinchart)
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200514060732.3378396-2-gwan-gyeong.mun@intel.com
|
|
While debugging a boot failure, the following unknown error record was
seen in the boot logs.
<...>
BERT: Error records from previous boot:
[Hardware Error]: event severity: fatal
[Hardware Error]: Error 0, type: fatal
[Hardware Error]: section type: unknown, 81212a96-09ed-4996-9471-8d729c8e69ed
[Hardware Error]: section length: 0x290
[Hardware Error]: 00000000: 00000001 00000000 00000000 00020002 ................
[Hardware Error]: 00000010: 00020002 0000001f 00000320 00000000 ........ .......
[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
<...>
On further investigation, it was found that the error record with
UUID (81212a96-09ed-4996-9471-8d729c8e69ed) has been defined in the
UEFI Specification at least since v2.4 and has recently had additional
fields defined in v2.7 Section N.2.10 Firmware Error Record Reference.
Add support for parsing and printing the defined fields to give users
a chance to figure out what went wrong.
Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20200512045502.3810339-1-punit1.agrawal@toshiba.co.jp
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v5.7
This contains a pair of patches which fix SMMU support on Tegra124 and
Tegra210 for host1x and the Tegra DRM driver.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508101355.3031268-1-thierry.reding@gmail.com
|
|
Most modern broadcom PHYs support ECD (enhanced cable diagnostics). Add
support for it in the bcm-phy-lib so they can easily be used in the PHY
driver.
There are two access methods for ECD: legacy by expansion registers and
via the new RDB registers which are exclusive. Provide functions in two
variants where the PHY driver can choose from. To keep things simple for
now, we just switch the register access to expansion registers in the
RDB variant for now. On the flipside, we have to keep a bus lock to
prevent any other non-legacy access on the PHY.
The results of the intra-pair tests are inconclusive (at least for the
BCM54140). Most of the times half the length is reported but sometimes
the length is correct.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit b121b341e598 ("bpf: Add PTR_TO_BTF_ID_OR_NULL
support") adds a field btf_id_or_null_non0_off to
bpf_prog->aux structure to indicate that the
first ctx argument is PTR_TO_BTF_ID reg_type and
all others are PTR_TO_BTF_ID_OR_NULL.
This approach does not really scale if we have
other different reg types in the future, e.g.,
a pointer to a buffer.
This patch enables bpf_iter targets registering ctx argument
reg types which may be different from the default one.
For example, for pointers to structures, the default reg_type
is PTR_TO_BTF_ID for tracing program. The target can register
a particular pointer type as PTR_TO_BTF_ID_OR_NULL which can
be used by the verifier to enforce accesses.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com
|
|
Change func bpf_iter_unreg_target() parameter from target
name to target reg_info, similar to bpf_iter_reg_target().
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513180220.2949737-1-yhs@fb.com
|
|
Currently bpf_iter_reg_target takes parameters from target
and allocates memory to save them. This is really not
necessary, esp. in the future we may grow information
passed from targets to bpf_iter manager.
The patch refactors the code so target reg_info
becomes static and bpf_iter manager can just take
a reference to it.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com
|
|
This is to be consistent with tracing and lsm programs
which have prefix "bpf_trace_" and "bpf_lsm_" respectively.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513180216.2949387-1-yhs@fb.com
|
|
From the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in following patches
* branch 'mellanox/mlx5-next':
net/mlx5: Add support in forward to namespace
{IB/net}/mlx5: Simplify don't trap code
net/mlx5: Replace zero-length array with flexible-array
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
There's only a single caller of vexpress_config_set_master() from
vexpress-sysreg.c. Let's just make the registers needed available to
vexpress-config and move all the code there. The registers needed aren't
used anywhere else either. With this, we can get rid of the private API
between these 2 drivers.
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The only thing that vexpress-syscfg does is provide a regmap to
vexpress-config bus child devices. There's little reason to have 2
components for this. The current structure with initcall ordering
requirements makes turning these components into modules more difficult.
So let's start to simplify things and merge vexpress-syscfg into
vexpress-config. There's no functional change in this commit and it's
still separate components until subsequent commits.
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The use of any sort of waitqueue (simple or regular) for
wait/waking vcpus has always been an overkill and semantically
wrong. Because this is per-vcpu (which is blocked) there is
only ever a single waiting vcpu, thus no need for any sort of
queue.
As such, make use of the rcuwait primitive, with the following
considerations:
- rcuwait already provides the proper barriers that serialize
concurrent waiter and waker.
- Task wakeup is done in rcu read critical region, with a
stable task pointer.
- Because there is no concurrency among waiters, we need
not worry about rcuwait_wait_event() calls corrupting
the wait->task. As a consequence, this saves the locking
done in swait when modifying the queue. This also applies
to per-vcore wait for powerpc kvm-hv.
The x86 tscdeadline_latency test mentioned in 8577370fb0cb
("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
latency is reduced by around 15-20% with this change.
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-mips@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-6-dave@stgolabs.net>
[Avoid extra logic changes. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This call is lockless and thus should not be trusted blindly.
For example, the return value of rcuwait_wakeup() should be used to
track whether a process was woken up.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-5-dave@stgolabs.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This allows further flexibility for some callers to implement
ad-hoc versions of the generic rcuwait_wait_event(). For example,
kvm will need this to maintain tracing semantics. The naming
is of course similar to what waitqueue apis offer.
Also go ahead and make use of rcu_assign_pointer() for both task
writes as it will make the __rcu sparse people happy - this will
be the special nil case, thus no added serialization.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-4-dave@stgolabs.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Propagating the return value of wake_up_process() back to the caller
can come in handy for future users, such as for statistics or
accounting purposes.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-3-dave@stgolabs.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
Currently, fs_core supports rule of forward the traffic
to continue matching in the next priority, now we add support
to forward the traffic matching in the next namespace.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
DCACHE_DONTCACHE indicates a dentry should not be cached on final
dput().
Also add a helper function to mark DCACHE_DONTCACHE on all dentries
pointing to a specific inode when that inode is being set I_DONTCACHE.
This facilitates dropping dentry references to inodes sooner which
require eviction to swap S_DAX mode.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|