Age | Commit message (Collapse) | Author |
|
pfifo_fast used to drop based on qdisc_dev(qdisc)->tx_queue_len,
so we have to resize skb array when we change tx_queue_len.
Other qdiscs which read tx_queue_len are fine because they
all save it to sch->limit or somewhere else in qdisc during init.
They don't have to implement this, it is nicer if they do so
that users don't have to re-configure qdisc after changing
tx_queue_len.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a new qdisc ops ->change_tx_queue_len() so that
each qdisc could decide how to implement this if it wants.
Previously we simply read dev->tx_queue_len, after pfifo_fast
switches to skb array, we need this API to resize the skb array
when we change dev->tx_queue_len.
To avoid handling race conditions with TX BH, we need to
deactivate all TX queues before change the value and bring them
back after we are done, this also makes implementation easier.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch promotes the local change_tx_queue_len() to a core
helper function, dev_change_tx_queue_len(), so that rtnetlink
and net-sysfs could share the code. This also prepares for the
following patch.
Note, the -EFAULT in the original code doesn't make sense,
we should propagate the errno from notifiers.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"The major changes in the core API side in this cycle are the still
on-going ASoC componentization works. Other than that, only few small
changes such as 20bit PCM format support are found.
Meanwhile the rest majority of changes are for ASoC drivers:
- Large cleanups of some of the TI CODEC drivers
- Continued work on Intel ASoC stuff for new quirks, ACPI GPIO
handling, Kconfigs and lots of cleanups
- Refactoring of the Freescale SSI driver, as preliminary work for
the upcoming changes
- Work on ST DFSDM driver, including the required IIO patches
- New drivers for Allwinner A83T, Maxim MAX89373, SocioNext UiniPhier
EVEA Tempo Semiconductor TSCS42xx and TI PCM816x, TAS5722 and
TAS6424 devices
- Removal of dead codes for SN95031 and board drivers
Last but not least, a few HD-audio and USB-audio quirks are included
as usual, too"
* tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (303 commits)
ALSA: hda - Reduce the suspend time consumption for ALC256
ASoC: use seq_file to dump the contents of dai_list,platform_list and codec_list
ASoC: soc-core: add missing EXPORT_SYMBOL_GPL() for snd_soc_rtdcom_lookup
IIO: ADC: stm32-dfsdm: remove unused variable again
ASoC: bcm2835: fix hw_params error when device is in prepared state
ASoC: mxs-sgtl5000: Do not print error on probe deferral
ASoC: sgtl5000: Do not print error on probe deferral
ASoC: Intel: remove select on non-existing SND_SOC_INTEL_COMMON
ALSA: usb-audio: Support changing input on Sound Blaster E1
ASoC: Intel: remove second duplicated assignment to pointer 'res'
ALSA: hda/realtek - update ALC215 depop optimize
ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
ALSA: pcm: Fix trailing semicolon
ASoC: add Component level .read/.write
ASoC: cx20442: fix regression by adding back .read/.write
ASoC: uda1380: fix regression by adding back .read/.write
ASoC: tlv320dac33: fix regression by adding back .read/.write
ALSA: hda - Use IS_REACHABLE() for dependency on input
IIO: ADC: stm32-dfsdm: fix static check warning
IIO: ADC: stm32-dfsdm: code optimization
...
|
|
Should check result of kstrndup() in case of memory allocation failure.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_reserve_caps() may not reserve enough caps under high memory
pressure, but it saved the needed caps number that expected to
be reserved. When getting caps, crash would happen due to number
mismatch.
Now we will try to trim more caps when failing to allocate memory
for caps need to be reserved, then try again. If still failing to
allocate memory, return -ENOMEM.
Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When called with CHECK_CAPS_AUTHONLY flag, ceph_check_caps() only
processes auth caps. In that case, it's unsafe to remove inode
from mdsc->cap_delay_list, because there can be delayed non-auth
caps.
Besides, ceph_check_caps() may lock/unlock i_ceph_lock several
times, when multiple threads call ceph_check_caps() at the same
time. It's possible that one thread calls __cap_delay_requeue(),
another thread calls __cap_delay_cancel(). __cap_delay_cancel()
should be called at very beginning of ceph_check_caps(), so that
it does not race with __cap_delay_requeue().
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
"revoking & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)" has already
been tested before calling try_nonblocking_invalidate()
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Previously ceph_read_iter() uses current->journal to pass context info
to ceph_readpages(), so that ceph_readpages() can distinguish read(2)
from readahead(2)/fadvise(2)/madvise(2). The problem is that page fault
can happen when copying data to userspace memory. Page fault may call
other filesystem's page_mkwrite() if the userspace memory is mapped to a
file. The later filesystem may also want to use current->journal.
The fix is define a on-stack data structure in ceph_read_iter(), add it
to context list in ceph_file_info. ceph_readpages() searches the list,
find if there is a context belongs to current thread.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Readdir cache keeps array of dentry pointers in page cache. If any
dentry in readdir cache gets pruned, ceph_d_prune() disables readdir
cache for later readdir syscall. The problem is that ceph_d_prune()
ignores unhashed dentry. Ideally MDS should have already revoked
CEPH_CAP_FILE_SHARED (which also disables readdir cache) when dentry
gets unhashed. But if it is somehow MDS does not properly revoke
CEPH_CAP_FILE_SHARED and the unhashed dentry gets pruned later,
ceph_d_prune() will not disable readdir cache, later readdir may
reference invalid dentry pointer.
The fix is make ceph_d_prune() do extra check for unhashed dentry.
Disable readdir cache if the unhashed dentry is still referenced
by readdir cache.
Another fix in this patch is handle d_splice_alias(). If a dentry
gets spliced into new parent dentry, treat it as if it was pruned
(call ceph_d_prune() for it).
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
It allows accessing i_shared_gen without holding i_ceph_lock. It is
preparation for later patch.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_fill_trace() already calls ceph_invalidate_dir_request() for
traceless reply. No need to duplicate the code in ceph_rename().
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
MDS need to rdlock directory inode's filelock when handling readdir
request. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
For CEPH_SETATTR_ATIME, MDS needs to xlock filelock, Fsxrw caps
are not allowed for xlocked filelock.
For CEPH_SETATTR_SIZE request that truncates file to smaller size,
MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked
filelock.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
MDS need to xlock inode's linklock when handling link/rename requests.
Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
MDS need to rdlock directory inode's authlock when handling these
requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
We don't stop device before reset owner, this means we could try to
serve any virtqueue kick before reset dev->worker. This will result a
warn since the work was pending at llist during owner resetting. Fix
this by stopping device during owner reset.
Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Nicolas Dichtel says:
====================
net: Ease to follow an interface that moves to another netns
The goal of this series is to ease the user to follow an interface that
moves to another netns.
After this series, with a patched iproute2:
$ ip netns
bar
foo
$ ip monitor link &
$ ip link set dummy0 netns foo
Deleted 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6
=> new nsid: 0, new ifindex: 6 (was 5 in the previous netns)
$ ip link set eth1 netns bar
Deleted 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3
=> new nsid: 1, new ifindex: 3 (same ifindex)
$ ip netns
bar (id: 1)
foo (id: 0)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The goal is to let the user follow an interface that moves to another
netns.
CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The user should be able to follow any interface that moves to another
netns. There is no reason to hide physical interfaces.
CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
to module name
It was found that ethtool provides unexisting module name while
it queries the specified network device for associated driver
information. Then user tries to unload that module by provided
module name and fails.
This happens because ethtool reads value of DRV_NAME macro,
while module name is defined at the driver's Makefile.
This patch is to correct Cavium CN88xx Thunder NIC driver names
(DRV_NAME macro) 'thunder-nicvf' to 'nicvf' and 'thunder-nic'
to 'nicpf', sync bgx and xcv driver names accordingly to their
module names.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull init_task initializer cleanups from David Howells:
"It doesn't seem useful to have the init_task in a header file rather
than in a normal source file. We could consolidate init_task handling
instead and expand out various macros.
Here's a series of patches that consolidate init_task handling:
(1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
openrisc.
(2) Alter the INIT_TASK_DATA linker script macro to set
init_thread_union and init_stack rather than defining these in C.
Insert init_task and init_thread_into into the init_stack area in
the linker script as appropriate to the configuration, with
different section markers so that they end up correctly ordered.
We can then get merge ia64's init_task.c into the main one.
We then have a bunch of single-use INIT_*() macros that seem only
to be macros because they used to be used per-arch. We can then
expand these in place of the user and get rid of a few lines and
a lot of backslashes.
(3) Expand INIT_TASK() in place.
(4) Expand in place various small INIT_*() macros that are defined
conditionally. Expand them and surround them by #if[n]def/#endif
in the .c file as it takes fewer lines.
(5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.
(6) Expand INIT_STRUCT_PID in place.
These macros can then be discarded"
* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
Expand INIT_STRUCT_PID and remove
Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
Expand various INIT_* macros and remove
Expand INIT_TASK() in init/init_task.c and remove
Construct init thread stack in the linker script rather than by union
openrisc: Make THREAD_SIZE available to vmlinux.lds
hexagon: Make THREAD_SIZE available to vmlinux.lds
cris: Make THREAD_SIZE available to vmlinux.lds
|
|
Michael S. Tsirkin says:
====================
ptr_ring fixes
This fixes a bunch of issues around ptr_ring use in net core.
One of these: "tap: fix use-after-free" is also needed on net,
but can't be backported cleanly.
I will post a net patch separately.
Lightly tested - Jason, could you pls confirm this
addresses the security issue you saw with ptr_ring?
Testing reports would be appreciated too.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Offset 128 overlaps the last word of the redzone.
Use 132 which is always beyond that.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is to make ptr_ring test build again.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't rely on lockless guarantees, but it
seems cleaner than inverting __ptr_ring_peek.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In theory compiler could tear queue loads or stores in two. It does not
seem to be happening in practice but it seems easier to convert the
cases where this would be a problem to READ/WRITE_ONCE than worry about
it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__skb_array_empty should use __ptr_ring_empty since that's the only
legal lockless function.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit bcecb4bbf88aa03171c30652bca761cf27755a6b.
If we try to allocate an extra entry as the above commit did, and when
the requested size is UINT_MAX, addition overflows causing zero size to
be passed to kmalloc().
kmalloc then returns ZERO_SIZE_PTR with a subsequent crash.
Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to bcecb4bbf88a ("net: ptr_ring: otherwise safe empty checks can
overrun array bounds") a lockless use of __ptr_ring_full might
cause an out of bounds access.
We can fix this, but it's easier to just disallow lockless
__ptr_ring_full for now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lockless access to __ptr_ring_full is only legal if ring is
never resized, otherwise it might cause use-after free errors.
Simply drop the lockless test, we'll drop the packet
a bit later when produce fails.
Fixes: 362899b8 ("macvtap: switch to use skb array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lockless __ptr_ring_empty requires that consumer head is read and
written at once, atomically. Annotate accordingly to make sure compiler
does it correctly. Switch locked callers to __ptr_ring_peek which does
not support the lockless operation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The only function safe to call without locks
is __ptr_ring_empty. Move documentation about
lockless use there to make sure people do not
try to use __ptr_ring_peek outside locks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The comment near __ptr_ring_peek says:
* If ring is never resized, and if the pointer is merely
* tested, there's no need to take the lock - see e.g. __ptr_ring_empty.
but this was in fact never possible since consumer_head would sometimes
point outside the ring. Refactor the code so that it's always
pointing within a ring.
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before this patch, if function gfs2_unlink failed to get a valid
transaction (for example, not enough journal blocks) it would go
to label out_end_trans which did gfs2_trans_end. But if the
trans_begin failed, there's no transaction to end, and trying to
do so results in: kernel BUG at fs/gfs2/trans.c:117!
This patch changes the goto so that it does not try to end a
non-existent transaction.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
implicitly implies it is an ipv6only socket. However, in inet6_bind(),
this addr_type checking and setting sk->sk_ipv6only to 1 are only done
after sk->sk_prot->get_port(sk, snum) has been completed successfully.
This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
the 'get_port()'.
In particular, when binding SO_REUSEPORT UDP sockets,
udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock()
checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be
1 while ipv6_only_sock(sk) is still 0 here. The end result is,
reuseport_alloc(sk) is called instead of adding sk to the existing
sk2->sk_reuseport_cb.
It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
IPv6 address (!ANY and !MAPPED). Only one of the socket will
receive packet.
The fix is to set the implicit sk_ipv6only before calling get_port().
The original sk_ipv6only has to be saved such that it can be restored
in case get_port() failed. The situation is similar to the
inet_reset_saddr(sk) after get_port() has failed.
Thanks to Calvin Owens <calvinowens@fb.com> who created an easy
reproduction which leads to a fix.
Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Christian Brauner says:
====================
rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK
Based on the previous discussion this enables passing a IFLA_IF_NETNSID
property along with RTM_SETLINK and RTM_DELLINK requests. The patch for
RTM_NEWLINK will be sent out in a separate patch since there are more
corner-cases to think about.
Changelog 2018-01-24:
* Preserve old behavior and report -ENODEV when either ifindex or ifname is
provided and IFLA_GROUP is set. Spotted by Wolfgang Bumiller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Backwards Compatibility:
If userspace wants to determine whether RTM_DELLINK supports the
IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
does not include IFLA_IF_NETNSID userspace should assume that
IFLA_IF_NETNSID is not supported on this kernel.
If the reply does contain an IFLA_IF_NETNSID property userspace
can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive
EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
with RTM_DELLINK. Userpace should then fallback to other means.
- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Backwards Compatibility:
If userspace wants to determine whether RTM_SETLINK supports the
IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
does not include IFLA_IF_NETNSID userspace should assume that
IFLA_IF_NETNSID is not supported on this kernel.
If the reply does contain an IFLA_IF_NETNSID property userspace
can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive
EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
with RTM_SETLINK. Userpace should then fallback to other means.
To retain backwards compatibility the kernel will first check whether a
IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either
one is found it will be used to identify the target network namespace.
This implies that users who do not care whether their running kernel
supports IFLA_IF_NETNSID with RTM_SETLINK can pass both
IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network
namespace.
- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RTM_{NEW,SET}LINK already allow operations on other network namespaces
by identifying the target network namespace through IFLA_NET_NS_{FD,PID}
properties. This is done by looking for the corresponding properties in
do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID
property. This introduces no functional changes since all callers of
do_setlink() currently block IFLA_IF_NETNSID by reporting an error before
they reach do_setlink().
This introduces the helpers:
static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct
nlattr *tb[])
static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
struct net *src_net,
struct nlattr *tb[], int cap)
to simplify permission checks and target network namespace retrieval for
RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended
to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look
for IFLA_NET_NS_{FD,PID} properties first before checking for
IFLA_IF_NETNSID.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
But reject reflink + DAX file systems for now until the code to
support reflinks on DAX is actually implemented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: port to 4.16]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork. If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint. If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range. Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.
The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status. For direct CoW writes the situation
was disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.
Therefore, if free space is so fragmented that we managed to allocate
some space but not enough to map into even a single block of the
original allocation request range, we should break the alignment hint in
order to guarantee at least some forward progress for the direct write.
If we return a short allocation to iomap_apply it'll call back about the
remaining blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
There's a really bad bug in xfs_reflink_allocate_cow -- if bmapi_write
can return a zero error code but no mappings. This happens if there's
an extent size hint (which causes allocation requests to be rounded to
extsz granularity internally), but there wasn't a big enough chunk of
free space to start filling at the extsz granularity and fill even one
block of the range that we actually requested.
In any case, if we got no mappings we can't possibly do anything useful
with the contents of imap, so we must bail out with ENOSPC here.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Don't let the iomap callback get away with feeding us a garbage zero
length mapping -- there was a bug in xfs that resulted in those leaking
out to hilarious effect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Since the CoW fork only exists in memory, it is incorrect to update the
on-disk quota block counts when we modify the CoW fork. Unlike the data
fork, even real extents in the CoW fork are only delalloc-style
reservations (on-disk they're owned by the refcountbt) so they must not
be tracked in the on disk quota info. Ensure the i_delayed_blks
accounting reflects this too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|