Age | Commit message | Author |
|
Enable unprivileged user namespace mounts of overlayfs. Overlayfs's
permission model (*) ensures that the mounter itself cannot gain additional
privileges by the act of creating an overlayfs mount.
This feature has been requested by the "rootless" container community.
(*) Documentation/filesystems/overlayfs.txt#Permission model
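For illustration, a minimal userspace sketch of such an unprivileged mount,
assuming pre-created /tmp/lower, /tmp/upper, /tmp/work and /tmp/merged
directories (paths and error handling are illustrative only):
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <sched.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mount.h>
  #include <unistd.h>

  static void write_file(const char *path, const char *buf)
  {
          int fd = open(path, O_WRONLY);

          if (fd < 0 || write(fd, buf, strlen(buf)) < 0)
                  perror(path);
          if (fd >= 0)
                  close(fd);
  }

  int main(void)
  {
          char map[64];
          unsigned uid = getuid(), gid = getgid();

          /* New user + mount namespace; no privileges required. */
          if (unshare(CLONE_NEWUSER | CLONE_NEWNS)) {
                  perror("unshare");
                  return 1;
          }
          /* Map the caller to root inside the new user namespace. */
          snprintf(map, sizeof(map), "0 %u 1", uid);
          write_file("/proc/self/uid_map", map);
          write_file("/proc/self/setgroups", "deny");
          snprintf(map, sizeof(map), "0 %u 1", gid);
          write_file("/proc/self/gid_map", map);

          if (mount("overlay", "/tmp/merged", "overlay", 0,
                    "lowerdir=/tmp/lower,upperdir=/tmp/upper,"
                    "workdir=/tmp/work,userxattr")) {
                  perror("mount");
                  return 1;
          }
          return 0;
  }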
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When looking up an inode on the lower layer for which the mounter lacks
read permission, the metacopy check will fail. This causes the lookup to
fail as well, even though the directory is readable.
So ignore EACCES for the "userxattr" case and assume no metacopy for the
unreadable file.
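A sketch of the intended error handling; the helper name is hypothetical and
the real overlayfs code is structured differently:
  int get_metacopy_xattr_sketch(struct dentry *lower);    /* hypothetical */

  static int check_metacopy_sketch(struct dentry *lower, bool userxattr)
  {
          int res = get_metacopy_xattr_sketch(lower);

          /* The mounter may lack read permission on the lower inode. */
          if (res == -EACCES && userxattr)
                  res = 0;                /* assume no metacopy */
          return res;
  }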
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If the file cannot be opened with O_NOATIME because the mounter lacks the
required capability, then clear O_NOATIME instead of failing.
Remove the WARN_ON(), since it would now trigger whenever O_NOATIME is cleared.
Noticed by Amir Goldstein.
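Roughly (a simplified fragment, not the literal overlayfs code):
  /* Drop O_NOATIME rather than fail the open if the mounter may not use it. */
  if (flags & O_NOATIME && !inode_owner_or_capable(realinode))
          flags &= ~O_NOATIME;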
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The comment above the call already says this, but only EOPNOTSUPP is
ignored; other failures are not.
For example, setting "user.*" xattrs fails with EPERM on symlinks and
special files.
Ignore this error as well.
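A simplified sketch of the copy-up error handling this implies (the helper
name is hypothetical):
  error = set_xattr_on_new_sketch(new, name, value, size);  /* hypothetical */
  if (error == -EOPNOTSUPP || error == -EPERM)
          error = 0;      /* e.g. "user.*" on symlinks and special files */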
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Optionally allow using the "user.overlay." xattr namespace instead of
"trusted.overlay."
This is necessary for overlayfs to be mountable in an unprivileged user
namespace.
Make the option explicit, since it makes the filesystem format
incompatible.
Disable redirect_dir and metacopy options, because these would allow
privilege escalation through direct manipulation of the
"user.overlay.redirect" or "user.overlay.metacopy" xattrs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
|
|
generic_file_splice_read() and iter_file_splice_write() will call back into
f_op->read_iter() and f_op->write_iter() respectively. These already do
the real file lookup and cred override, so the code in ovl_splice_read()
and ovl_splice_write() is redundant.
In addition, the ovl_file_accessed() call in ovl_splice_write() is
incorrect, though probably harmless.
Fix by calling generic_file_splice_read() and iter_file_splice_write()
directly.
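A sketch of the resulting wiring (other file_operations members omitted):
  #include <linux/fs.h>

  static const struct file_operations ovl_fops_sketch = {
          /* The generic helpers call back into ->read_iter()/->write_iter(),
           * which already do the real-file lookup and cred override. */
          .splice_read    = generic_file_splice_read,
          .splice_write   = iter_file_splice_write,
  };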
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_ioctl_set_flags() does a capability check using flags, but then the
real ioctl double-fetches flags and may use a different value.
The "Check the capability before cred override" comment is misleading: a
user can skip this check by presenting benign flags first and then
overwriting them with non-benign flags.
Just remove the cred override for now, hoping this doesn't cause a
regression.
The proper solution is to create a new setxflags i_op (patches are in the
works).
Xfstests don't show a regression.
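A simplified sketch of the race, with illustrative helper names:
  if (get_user(flags, (int __user *)arg))
          return -EFAULT;
  if (!flags_check_ok_sketch(flags))      /* capability check on 1st fetch */
          return -EPERM;
  /* The underlying ioctl fetches the flags from userspace again, so a
   * racing thread can replace benign flags with non-benign ones here. */
  err = real_ioctl_sketch(realfile, cmd, arg);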
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Fixes: dab5ca8fd9dd ("ovl: add lsattr/chattr support")
Cc: <stable@vger.kernel.org> # v4.19
|
|
CAP_DAC_READ_SEARCH is required by open_by_handle_at(2) so check it in
ovl_decode_real_fh() as well to prevent privilege escalation for
unprivileged overlay mounts.
[Amir] If the mounter is not capable in the init namespace, ovl_check_origin()
and ovl_verify_index() will not function as expected and this will break the
index and nfs export features. So check the capability in ovl_can_decode_fh()
to auto-disable those features.
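Sketched as a simplified fragment, the check amounts to:
  /* Mirror the open_by_handle_at(2) requirement. */
  if (!capable(CAP_DAC_READ_SEARCH))
          return NULL;    /* refuse decoding; index/nfs_export get disabled */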
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Call remap_verify_area() on the source file as well as the destination.
When called from vfs_dedupe_file_range() the check has already been
performed, but not when called from a layered filesystem (overlayfs, etc.).
The redundant check in vfs_dedupe_file_range() could be omitted, but leave
it for now so errors are reported early (and for fear of breaking backward
compatibility).
This call shouldn't be performance sensitive.
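A sketch of the symmetric check, assuming the existing
remap_verify_area(file, pos, len, write) helper signature:
  ret = remap_verify_area(file_src, pos_src, len, false);
  if (ret)
          return ret;
  ret = remap_verify_area(file_dst, pos_dst, len, true);
  if (ret)
          return ret;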
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
cap_convert_nscap() does permission checking as well as conversion of the
xattr value, conditionally based on the filesystem's user namespace.
This is needed by overlayfs and probably other layered filesystems
(ecryptfs), and is what vfs_foo() is supposed to do anyway.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
|
|
|
|
|
|
The m_can driver's suspend and resume functions (m_can_class_suspend() and
m_can_class_resume()) make use of dev_get_drvdata() and assume that the drvdata
is a pointer to the struct net_device.
With the upcoming conversion of the tcan4x5x driver to pm_runtime this assumption
is no longer valid. As the suspend and resume functions actually need a struct
m_can_classdev pointer, change the m_can_platform and the m_can_pci driver to
hold a pointer to struct m_can_classdev instead, as the tcan4x5x driver already
does.
Link: https://lore.kernel.org/r/20201212175518.139651-8-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch enhances m_can_class_allocate_dev() to allocate driver-specific
private data. The driver's private data struct must contain struct
m_can_classdev as its first member followed by the remaining private data.
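A sketch of the layout this enables, with illustrative driver fields; the
class-device pointer and the private pointer become interchangeable:
  #include "m_can.h"

  struct my_m_can_priv {
          struct m_can_classdev cdev;     /* must be the first member */
          int wake_gpio;                  /* driver-specific data follows */
  };

  static inline struct my_m_can_priv *cdev_to_priv(struct m_can_classdev *cdev)
  {
          return container_of(cdev, struct my_m_can_priv, cdev);
  }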
Link: https://lore.kernel.org/r/20201212175518.139651-7-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
With patch
| dd8088d5a896 PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter
the usual pm_runtime_get_sync() and pm_runtime_put_noidle() in-case-of-error
dance is no longer needed. Convert the m_can driver to use this function.
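The conversion pattern, sketched as a simplified fragment:
  /* Before: */
  err = pm_runtime_get_sync(cdev->dev);
  if (err < 0) {
          pm_runtime_put_noidle(cdev->dev);
          return err;
  }

  /* After: */
  err = pm_runtime_resume_and_get(cdev->dev);
  if (err < 0)
          return err;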
Link: https://lore.kernel.org/r/20201212175518.139651-6-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The function m_can_config_endisable() is not used outside of the m_can driver,
so mark it as static.
Link: https://lore.kernel.org/r/20201212175518.139651-5-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch converts the m_can driver to use cdev as the name for struct
m_can_classdev uniformly throughout the whole driver.
Link: https://lore.kernel.org/r/20201212175518.139651-4-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch converts the indentation in the m_can driver to kernel coding style.
Link: https://lore.kernel.org/r/20201212175518.139651-3-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Old versions of the user manual are regularly taken offline, so change the
link to the linux-can github page, which has a mirror of all published
datasheets.
Link: https://lore.kernel.org/r/20201212175518.139651-2-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The kernel calls these functions on CPU online and hence they must not
be marked __init.
Otherwise, if the memory they occupied has been reused, the system can
crash in various ways. Sachin reported it caused his LPAR to
spontaneously restart with no other output. With xmon enabled it may
drop into xmon with a dump like:
cpu 0x1: Vector: 700 (Program Check) at [c000000003c5fcb0]
pc: 00000000011e0a78
lr: 00000000011c51d4
sp: c000000003c5ff50
msr: 8000000000081001
current = 0xc000000002c12b00
paca = 0xc000000003cff280 irqmask: 0x03 irq_happened: 0x01
pid = 0, comm = swapper/1
...
[c000000003c5ff50] 0000000000087c38 (unreliable)
[c000000003c5ff70] 000000000003870c
[c000000003c5ff90] 000000000000d108
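The class of fix, sketched with a hypothetical function name:
  /* Reached from the CPU online path, so it must not live in .init.text,
   * which is freed after boot.
   * was: void __init enable_feature_on_cpu(void) */
  void enable_feature_on_cpu(void)
  {
          /* per-CPU setup goes here */
  }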
Fixes: 3b47b7549ead ("powerpc/book3s64/kuap: Move KUAP related function outside radix")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Expand change log with details and xmon output]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201214080121.358567-1-aneesh.kumar@linux.ibm.com
|
|
We have no way of tracking server READ_PLUS support in pNFS for now, so
just disable it.
Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the server returns more data than we have buffer space for, then
we need to truncate and exit early.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Expanding the READ_PLUS extents can cause the read buffer to overflow.
If it does, then don't error, but just exit early.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If a hole extends beyond the READ_PLUS read buffer, then we want to fill
just the remaining buffer with zeros. Also ignore eof...
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The server is allowed to return a hole extent with an offset that starts
before the offset supplied in the READ_PLUS argument. Ensure that we
support that case too.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
All XDR opaque object sizes are 32-bit aligned, and a data segment is no
exception.
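In other words (a small sketch; xdr_align_size() rounds up to a 4-byte
boundary):
  #include <linux/sunrpc/xdr.h>

  /* XDR space consumed by an opaque of 'len' bytes. */
  static inline size_t opaque_xdr_size_sketch(size_t len)
  {
          return xdr_align_size(len);     /* (len + 3) & ~3 */
  }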
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If we're shifting the page data to the right, and this happens to be a
sparse page array, then we may need to allocate new pages in order to
receive the data.
Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
There are a number of xdr helpers for struct xdr_buf that do not change
the structure itself. Mark those as taking const pointers for
documentation purposes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move the setting of the xdr_stream 'nwords' field into the helpers that
reset the xdr_stream cursor.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Clean up callers of _copy_to/from_pages() that still check for a zero
length.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Clean up xdr_shrink_bufhead() to use the new helpers instead of doing
its own thing.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We do want to try to grow the buffer if possible, but if that attempt
fails, we still want to move the data and truncate the XDR message.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The main use case right now for xdr_align_data() is to shift the page
data to the left, and in practice shrink the total XDR data buffer.
This patch ensures that we fix up the accounting for the buffer length
as we shift that data around.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Exit early if the shift is zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Use the existing BITS_PER_LONG macro instead of calculating the value.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
only when it has determined that a Reply chunk is necessary. There
are plenty of cases where no Reply chunk is needed, but the
XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
rpcrdma_inline_fixup() when it tries to copy parts of the received
Reply into a missing page.
To avoid crashing, handle sparse page allocation up front.
Until XATTR support was added, this issue did not appear often
because the only SPARSE_PAGES consumer always expected a reply large
enough to require a Reply chunk.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
XDRBUF_SPARSE_PAGES can cause problems for the RDMA transport,
and it's easy enough to allocate enough pages for the request
up front, so do that.
Also, since we've allocated the pages anyway, use the full
page aligned length for the receive buffer. This will allow
caching of valid replies that are too large for the caller,
but that still fit in the allocated pages.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When receiving page data, the return value 'ret', when positive, includes
`buf->page_base`, so we should subtract that before it is used for
changing `offset` and comparing against `want`.
This was discovered in the very rare cases where the server returned a
chunk of bytes that, when added to the already received amount of bytes
for the pages, happened to match the current `recv.len`, for example
in this case:
buf->page_base : 258356
actually received from socket: 1740
ret : 260096
want : 260096
In this case neither of the two 'if ... goto out' checks triggers, and we
continue to tail parsing.
It is worth mentioning that the ensuing EMSGSIZE from the continued
execution of `xs_read_xdr_buf` may be observed by an application due to 4
superfluous bytes being added to the page data.
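A simplified sketch of the accounting fix (not the actual xs_read_xdr_buf()
code; the helper is illustrative):
  #include <linux/sunrpc/xdr.h>

  static ssize_t account_page_bytes_sketch(const struct xdr_buf *buf,
                                           ssize_t ret, size_t *offset)
  {
          if (ret > 0) {
                  ret -= buf->page_base;  /* 'ret' included page_base */
                  *offset += ret;
          }
          return ret;
  }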
Fixes: 277e4ab7d530 ("SUNRPC: Simplify TCP receive code by switching to using iterators")
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
edac-updates-for-v5.11
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
When xen_blkif_disconnect() is called, the kernel thread behind the
block interface is stopped by calling kthread_stop(ring->xenblkd).
The ring->xenblkd thread pointer being non-NULL determines if the
thread has already been stopped.
Normally, the thread's function xen_blkif_schedule() sets ring->xenblkd
to NULL when the thread's main loop ends.
However, when the thread has not been started yet (i.e.
wake_up_process() has not been called on it), the xen_blkif_schedule()
function would not be called yet.
In such a case the kthread_stop() call returns -EINTR and
ring->xenblkd remains dangling.
When this happens, any subsequent call to xen_blkif_disconnect() (for
example in the frontend_changed() callback) leads to a kernel crash in
kthread_stop() (e.g. a NULL pointer dereference in exit_creds()).
This is XSA-350.
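The fix, sketched as a simplified fragment of the disconnect path:
  if (ring->xenblkd) {
          kthread_stop(ring->xenblkd);
          ring->xenblkd = NULL;   /* don't leave a dangling task pointer */
  }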
Cc: <stable@vger.kernel.org> # 4.12
Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread")
Reported-by: Olivier Benjamin <oliben@amazon.com>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
'xenbus_backend' watches the 'state' of devices, which is writable by
guests. Hence, if a guest updates it intensively, dom0 will accumulate
lots of pending events, exhausting dom0's memory. In other words, guests
can trigger memory pressure in dom0. This is known as XSA-349. However,
the watch's callback, 'frontend_changed()', reads only 'state', so it
doesn't need the pending events.
To avoid the problem, this commit disallows pending watch messages for
'xenbus_backend' using the 'will_handle()' watch callback.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
This commit adds a counter of pending messages for each watch in the
struct. It is used to skip the unnecessary lookup of pending messages in
'unregister_xenbus_watch()'. It could also be used in the 'will_handle'
callback.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
This commit adds support for the 'will_handle' watch callback for
'xen_bus_type' users.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Some code does not directly create a 'xenbus_watch' object and call
'register_xenbus_watch()', but uses 'xenbus_watch_path()' instead. This
commit adds support for the 'will_handle' callback in
'xenbus_watch_path()' and its wrapper, 'xenbus_watch_pathfmt()'.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
If the handling of watch events is slower than their enqueueing, and the
events can be created by the guests, the guests could trigger memory
pressure by intensively inducing events, because this creates a huge
number of pending events that exhaust the memory.
Fortunately, some watch events can be ignored, depending on the handler
callback. For example, if the callback is interested in only a single
path, the watch wouldn't want multiple pending events. Or, some watches
could ignore events for the same path.
To let such watches voluntarily help avoid the memory pressure situation,
this commit introduces a new watch callback, 'will_handle'. If it is not
NULL, it will be called for each new event just before enqueuing it. If
the callback returns false, the event will be discarded. No watch is
using the callback for now, though.
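A sketch of how a watch might use the hook; the handler name is illustrative
and 'nr_pending' is the counter added elsewhere in this series, assuming the
callback mirrors the existing watch-callback signature:
  static bool my_will_handle(struct xenbus_watch *watch,
                             const char *path, const char *token)
  {
          /* Only queue the event if nothing is pending for this watch yet. */
          return watch->nr_pending == 0;
  }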
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
syzbot spotted a potential out-of-bounds shift in the PCM OSS layer,
which calculates the buffer size with an arbitrary shift value given
via an ioctl.
Add a range check to avoid the undefined behavior.
As the value may be treated as a signed integer, the maximum shift should
be 30.
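The shape of the check, sketched with illustrative variable names:
  /* 1 << 31 overflows a signed 32-bit size, so cap the shift at 30. */
  if (fragshift < 0 || fragshift > 30)
          return -EINVAL;
  size = 1 << fragshift;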
Reported-by: syzbot+df7dc146ebdd6435eea3@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201209084552.17109-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
syzbot spotted a potential out-of-bounds shift in the USB-audio format
parser, which receives an arbitrary shift value from the USB
descriptor.
Add a range check to avoid the undefined behavior.
Reported-by: syzbot+df7dc146ebdd6435eea3@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201209084552.17109-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Use ERR_CAST() when devm_ioremap_resource() fails.
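A sketch of the pattern, with an illustrative return type:
  #include <linux/device.h>
  #include <linux/err.h>
  #include <linux/io.h>
  #include <linux/ioport.h>

  struct my_chan;                         /* illustrative */

  static struct my_chan *map_regs_sketch(struct device *dev,
                                         struct resource *res)
  {
          void __iomem *base = devm_ioremap_resource(dev, res);

          if (IS_ERR(base))
                  return ERR_CAST(base);  /* not: return base; */

          return NULL;                    /* real code builds the channel here */
  }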
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20201214065421.5138-1-peter.ujfalusi@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|