Age | Commit message (Collapse) | Author |
|
mem memcpy'
To bring in the change made in this cset:
f94909ceb1ed4bfd ("x86: Prepare asm files for straight-line-speculation")
It silences these perf tools build warnings, no change in the tools:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
The code generated was checked before and after using 'objdump -d /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o',
no changes.
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up fixes and get in line with other trees, powerpc kernel
mostly this time, but BPF as well.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We encountered some crashes caused by a race between SMC-R
link access and link clearing, which is triggered by abnormal link
group termination, such as a port error.
Here is an example of this kind of crash:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
Call Trace:
<TASK>
? __smc_buf_create+0x75a/0x950 [smc]
smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
smc_listen_work+0xf72/0x1230 [smc]
? process_one_work+0x25c/0x600
process_one_work+0x25c/0x600
worker_thread+0x4f/0x3a0
? process_one_work+0x600/0x600
kthread+0x15d/0x1a0
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
smc_listen_work()                   |   __smc_lgr_terminate()
---------------------------------------------------------------
                                    | smc_lgr_free()
                                    |  |- smcr_link_clear()
                                    |      |- memset(lnk, 0)
smc_listen_rdma_reg()               |
 |- smcr_lgr_reg_rmbs()             |
     |- smc_llc_flow_initiate()     |
         |- access lnk->lgr (panic) |
These crashes all have the same cause: SMC-R link resources are
cleared while some functions are still accessing them.
This patch tries to fix the issue by introducing a reference
count for SMC-R links and ensuring that the sensitive resources
of a link won't be cleared until its reference count reaches zero.
The operations on the SMC-R link reference count can be summarized
as follows:
object          [hold or initialized as 1]     [put]
--------------------------------------------------------------------
links           smcr_link_init()               smcr_link_clear()
connections     smc_conn_create()              smc_conn_free()
This way, SMC-R links are cleared only after all the smc
connections above them have been freed, which avoids
unsafe references to SMC-R links.
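The hold/put pairing described above can be sketched in plain C. This is a minimal userspace illustration, not the SMC code: struct demo_link and the demo_link_*() helpers are hypothetical names, and C11 atomics stand in for whatever reference-count primitive the kernel patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an SMC-R link; not the kernel structure. */
struct demo_link {
	atomic_int refcnt;	/* starts at 1 when the link is initialized */
	bool cleared;		/* set once the sensitive state has been wiped */
	int lgr_id;		/* placeholder for state that users dereference */
};

/* Corresponds to the "initialized as 1" column of the table. */
static void demo_link_init(struct demo_link *lnk, int lgr_id)
{
	atomic_init(&lnk->refcnt, 1);
	lnk->cleared = false;
	lnk->lgr_id = lgr_id;
}

/* Corresponds to "hold": a connection pins the link while it uses it. */
static void demo_link_hold(struct demo_link *lnk)
{
	atomic_fetch_add(&lnk->refcnt, 1);
}

/*
 * Corresponds to "put": the sensitive state is wiped only when the last
 * reference is dropped. Returns true if this call wiped the link.
 */
static bool demo_link_put(struct demo_link *lnk)
{
	int old = atomic_fetch_sub(&lnk->refcnt, 1);

	assert(old > 0);	/* a put without a matching hold is a bug */
	if (old != 1)
		return false;
	lnk->lgr_id = 0;
	lnk->cleared = true;
	return true;
}
```

With this shape, the "clear" path drops only the initial reference, so a concurrent user that still holds the link keeps its state intact until that user's own put.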
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is no longer suitable to identify whether an smc connection
is registered in a link group by checking if conn->lgr
is NULL, because conn->lgr won't be reset even when the connection
is unregistered from a link group.
So this patch introduces a new helper, smc_conn_lgr_valid(), and
replaces all the checks of conn->lgr in the original implementation
with the new helper to judge whether conn->lgr is valid to use.
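A helper of this kind could look roughly like the sketch below. It is only an illustration of the idea: the structures, the registered flag, and the demo_*() names are assumptions made for this example, and the condition the real smc_conn_lgr_valid() tests is not stated in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct demo_lgr {
	int id;
};

struct demo_conn {
	struct demo_lgr *lgr;	/* stays set even after unregistration */
	bool registered;	/* assumed marker, cleared on unregistration */
};

/* A NULL check alone is not enough once lgr is no longer reset. */
static bool demo_conn_lgr_valid(const struct demo_conn *conn)
{
	return conn->lgr != NULL && conn->registered;
}

/* Unregistering leaves conn->lgr in place and only drops the marker. */
static void demo_conn_unregister(struct demo_conn *conn)
{
	conn->registered = false;
}
```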
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both fields can be read/written without synchronization, so
add proper accessors and documentation.
Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Wen Gu says:
====================
net/smc: Fixes for race in smc link group termination
We encountered some crashes recently, and they are caused by a
race between the access and the free of a link/link group during abnormal
smc link group termination. The crashes can be reproduced by
frequent abnormal link group termination, like setting RNICs up/down.
This set of patches tries to fix this by extending the life cycle
of the link/link group to ensure that they won't be referred to after
being cleared or freed.
v1 -> v2:
- Improve some comments.
- Move the code that wakes up the lgrs_deleted wait queue from smc_lgr_free()
to __smc_lgr_free().
- Move the code that wakes up the links_deleted wait queue from smcr_link_clear()
to __smcr_link_clear().
- Move the smc_ibdev_cnt_dec() and put_device() calls from smcr_link_clear()
to __smcr_link_clear().
- Move smc_lgr_put() to the end of __smcr_link_clear().
- Call smc_lgr_put() after 'out' tag in smcr_link_init() when link
initialization fails.
- Modify the location where smc connection holds the lgr or link.
before:
* hold lgr in smc_lgr_register_conn().
* hold link in smcr_lgr_conn_assign_link().
after:
* hold both lgr and link in smc_conn_create().
This makes the location symmetrical with the place where smc connections
put the lgr or link, which is smc_conn_free().
- Initialize conn->freed as zero in smc_conn_create().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We encountered some crashes caused by the race between the access
and the termination of link groups.
Here are some of the panic stacks we hit:
1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()
BUG: kernel NULL pointer dereference, address: 00000000000002f0
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
Call Trace:
<TASK>
? smc_clc_send_accept+0x45/0xa0 [smc]
? smc_clc_send_accept+0x45/0xa0 [smc]
smc_listen_work+0x783/0x1220 [smc]
? finish_task_switch+0xc4/0x2e0
? process_one_work+0x1ad/0x3c0
process_one_work+0x1ad/0x3c0
worker_thread+0x4c/0x390
? rescuer_thread+0x320/0x320
kthread+0x149/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
smc_listen_work()                |  abnormal case like port error
---------------------------------------------------------------
                                 | __smc_lgr_terminate()
                                 |  |- smc_conn_kill()
                                 |      |- smc_lgr_unregister_conn()
                                 |          |- set conn->lgr = NULL
smc_clc_wait_msg()               |
 |- access conn->lgr (panic)     |
2) Race between smc_setsockopt() and __smc_lgr_terminate()
BUG: kernel NULL pointer dereference, address: 00000000000002e8
RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
Call Trace:
<TASK>
__sys_setsockopt+0xfc/0x190
__x64_sys_setsockopt+0x20/0x30
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
smc_setsockopt()                 |  abnormal case like port error
--------------------------------------------------------------
                                 | __smc_lgr_terminate()
                                 |  |- smc_conn_kill()
                                 |      |- smc_lgr_unregister_conn()
                                 |          |- set conn->lgr = NULL
mod_delayed_work()               |
 |- access conn->lgr (panic)     |
There are other panic sites as well, and they have a cause
similar to the one described above: the link group is accessed
after termination, yielding a NULL pointer or an invalid
resource.
Currently, there seems to be no synchronization between
link group access and a sudden termination of the link group. This patch
tries to fix this by introducing a reference count for the link group
and not freeing the link group until the reference count is zero.
A link group might be referred to by links or smc connections. So
the operations on the link group reference count can be summarized
as follows:
object          [hold or initialized as 1]     [put]
-------------------------------------------------------------------
link group      smc_lgr_create()               smc_lgr_free()
connections     smc_conn_create()              smc_conn_free()
links           smcr_link_init()               smcr_link_clear()
This way, we extend the life cycle of the link group and
ensure it is longer than the life cycle of the connections and links
above it, which avoids invalid access to the link group after its
termination.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The timeout in settings is used by each case under the same directory, so
it should accommodate the maximum runtime.
A normally running net/fib_nexthops.sh may be killed by this unsuitable
timeout. Furthermore, because of a defect[1] in the kselftests framework,
net/fib_nexthops.sh, which might take at least (300 * 4) seconds, would
previously block the whole kselftests framework.
$ git grep -w 'sleep 300' tools/testing/selftests/net
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
tools/testing/selftests/net/fib_nexthops.sh: sleep 300
Enlarge the timeout to the obvious largest runtime plus 300 seconds
to avoid the blocking.
[1]: https://www.spinics.net/lists/kernel/msg4185370.html
Signed-off-by: Zhou Jie <zhoujie2011@fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit b39648079db4 ("net: mscc: ocelot: disable flow control on
NPI interface"), flow control should be disabled on the DSA CPU port
when used in NPI mode.
However, the commit blamed in the Fixes: tag below broke this, because
it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA
for the DSA CPU port.
This issue became noticeable since the device tree update from commit
8fcea7be5736 ("arm64: dts: ls1028a: mark internal links between Felix
and ENETC as capable of flow control").
The solution is to check whether this is the currently configured NPI
port from ocelot_phylink_mac_link_up(), and to not modify the statically
disabled PAUSE frame transmission if it is.
When the port is configured for lossless mode as opposed to tail drop
mode, but the link partner (DSA master) doesn't observe the transmitted
PAUSE frames, the switch termination throughput is much worse, as can be
seen below.
Before:
root@debian:~# iperf3 -c 192.168.100.2
Connecting to host 192.168.100.2, port 5201
[ 5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 28.4 MBytes 238 Mbits/sec 357 22.6 KBytes
[ 5] 1.00-2.00 sec 33.6 MBytes 282 Mbits/sec 426 19.8 KBytes
[ 5] 2.00-3.00 sec 34.0 MBytes 285 Mbits/sec 343 21.2 KBytes
[ 5] 3.00-4.00 sec 32.9 MBytes 276 Mbits/sec 354 22.6 KBytes
[ 5] 4.00-5.00 sec 32.3 MBytes 271 Mbits/sec 297 18.4 KBytes
^C[ 5] 5.00-5.06 sec 2.05 MBytes 270 Mbits/sec 45 19.8 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.06 sec 163 MBytes 271 Mbits/sec 1822 sender
[ 5] 0.00-5.06 sec 0.00 Bytes 0.00 bits/sec receiver
After:
root@debian:~# iperf3 -c 192.168.100.2
Connecting to host 192.168.100.2, port 5201
[ 5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 112 MBytes 941 Mbits/sec 259 143 KBytes
[ 5] 1.00-2.00 sec 110 MBytes 920 Mbits/sec 329 144 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 936 Mbits/sec 255 144 KBytes
[ 5] 3.00-4.00 sec 110 MBytes 927 Mbits/sec 355 105 KBytes
[ 5] 4.00-5.00 sec 110 MBytes 926 Mbits/sec 350 156 KBytes
[ 5] 5.00-6.00 sec 110 MBytes 925 Mbits/sec 305 148 KBytes
[ 5] 6.00-7.00 sec 110 MBytes 924 Mbits/sec 320 143 KBytes
[ 5] 7.00-8.00 sec 110 MBytes 925 Mbits/sec 273 97.6 KBytes
[ 5] 8.00-9.00 sec 109 MBytes 913 Mbits/sec 299 141 KBytes
[ 5] 9.00-10.00 sec 110 MBytes 922 Mbits/sec 287 146 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.08 GBytes 926 Mbits/sec 3032 sender
[ 5] 0.00-10.00 sec 1.08 GBytes 925 Mbits/sec receiver
Fixes: de274be32cb2 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pointer skb is redundant, it is assigned a value that is never
read and hence can be removed. Cleans up clang scan warning:
drivers/atm/iphase.c:205:18: warning: Although the value stored
to 'skb' is used in the enclosing expression, the value is never
actually read from 'skb' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The uapi headers are missing the ceph definition. Move it there so
userland apps can ID cephfs.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
newcaps already includes Ls, so there is no need to check it again.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
CephFS is a bit unlike most other filesystems in that it only
conditionally does buffered I/O based on the caps that it gets from the
MDS. In most cases, unless there is contended access for an inode the
MDS does give Fbc caps to the client, so the unbuffered codepaths are
only infrequently traveled and are difficult to test.
At one time, the "-o sync" mount option would give you this behavior,
but that was removed in commit 7ab9b3807097 ("ceph: Don't use
ceph-sync-mode for synchronous-fs.").
Add a new mount option to tell the client to ignore Fbc caps when doing
I/O, and to use the synchronous codepaths exclusively, even on
non-O_DIRECT file descriptors. We already have an ioctl that forces this
behavior on a per-file basis, so we can just always set the CEPH_F_SYNC
flag in the file description on such mounts.
Additionally, this patch also changes the client to not request Fbc when
doing direct I/O. We aren't using the cache with O_DIRECT so we don't
have any need for those caps.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
玮文 胡 reported seeing the WARN_RATELIMIT pop when writing to an
inode that had been transplanted into the stray dir. The client was
trying to look up the quotarealm info from the parent and that tripped
the warning.
Change the ceph_vino_is_reserved helper to not throw a warning for
MDS stray directories (0x100 - 0x1ff), only for reserved dirs that
are not in that range.
Also, fix ceph_has_realms_with_quotas to return false when encountering
a reserved inode.
URL: https://tracker.ceph.com/issues/53180
Reported-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This pops every second and isn't very useful.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Don't populate the const array spaces on the stack but make it static
const and make the pointer an array to remove a dereference. Shrinks
object code a little too. Also clean up the indent: currently it is spaces
and should be a tab.
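The pattern can be illustrated with a small standalone example. This is a sketch only: demo_indent() and its parameters are hypothetical and are not taken from the ceph source.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes two spaces of indentation per `depth`
 * level, followed by `name`, into `buf`. Returns what snprintf() returns.
 */
static int demo_indent(char *buf, size_t len, int depth, const char *name)
{
	/*
	 * A static const array lives in read-only data: nothing is set up
	 * on the stack per call, and using the array directly avoids the
	 * extra load that a pointer variable would need.
	 */
	static const char spaces[] = "                ";	/* 16 spaces */
	int width = depth * 2;

	if (width < 0)
		width = 0;
	if (width > (int)sizeof(spaces) - 1)
		width = (int)sizeof(spaces) - 1;

	/* "%.*s" prints at most `width` characters of the spaces array. */
	return snprintf(buf, len, "%.*s%s", width, spaces, name);
}
```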
Signed-off-by: Colin Ian King <colin.i.king@googlemail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Problem:
The statfs reports incorrect free/available space for a quota less than
CEPH_BLOCK size (4M).
Solution:
For a quota less than CEPH_BLOCK size, a smaller block size of 4K is used.
But if the quota is less than 4K, a binary use/free of a single 4K
block is reported: total=used=4K,free=0 when the quota is full and
total=free=4K,used=0 otherwise.
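The rule described above can be written out as a small standalone function. This is a sketch under stated assumptions, not the ceph implementation: struct demo_statfs, demo_quota_statfs(), and the DEMO_* constants are hypothetical, "full" is taken to mean used bytes >= quota bytes, and the caller is assumed to pass a non-zero quota.

```c
#include <stdint.h>

#define DEMO_BLOCK_SHIFT	22	/* 4M, the CEPH_BLOCK size named above */
#define DEMO_4K_SHIFT		12	/* 4K fallback block size */

/* Counts are in units of bsize bytes. */
struct demo_statfs {
	uint64_t bsize;
	uint64_t total;
	uint64_t used;
	uint64_t free;
};

/* max_bytes is the quota (assumed > 0); used_bytes is current usage. */
static struct demo_statfs demo_quota_statfs(uint64_t max_bytes,
					    uint64_t used_bytes)
{
	struct demo_statfs st;
	unsigned int shift = DEMO_BLOCK_SHIFT;

	/* Quota below 4M: fall back to 4K blocks. */
	if (max_bytes < (1ULL << DEMO_BLOCK_SHIFT))
		shift = DEMO_4K_SHIFT;
	st.bsize = 1ULL << shift;

	/* Quota below 4K: one block, either entirely used or entirely free. */
	if (max_bytes < (1ULL << DEMO_4K_SHIFT)) {
		st.total = 1;
		st.used = used_bytes >= max_bytes ? 1 : 0;
		st.free = 1 - st.used;
		return st;
	}

	st.total = max_bytes >> shift;
	st.used = used_bytes >> shift;
	st.free = st.total > st.used ? st.total - st.used : 0;
	return st;
}
```

For example, a 1M quota with 4K in use is reported as 256 blocks of 4K with 255 free, instead of rounding the whole quota down to zero 4M blocks.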
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Add read-only module parameters for supported mount syntaxes. Primary
user is the user-space mount helper for catching v2 syntax bugs during
testing by cross verifying if the kernel supports v2 syntax on mount
failure.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Note that the new monitors are just shown in /proc/mounts.
Ceph does not (re)connect to new monitors yet.
[ jlayton: s/printk\(KERN_NOTICE/pr_notice(/
s/strcmp/strcmp_null/ ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Old mount device syntax (source) has the following problems:
- mounts to the same cluster but with different fsnames
and/or creds have identical device string which can
confuse xfstests.
- Userspace mount helper tool resolves monitor addresses
and fills in mon addrs automatically, but that means the
device shown in /proc/mounts is different than what was
used for mounting.
New device syntax is as follows:
cephuser@fsid.mycephfs2=/path
Note, there is no "monitor address" in the device string.
That gets passed in as a mount option. This keeps the device
string the same when monitor addresses change (on remounts).
Also note that the userspace mount helper tool is backward
compatible. I.e., the mount helper will fall back to using
the old syntax after trying to mount with the new syntax.
[ idryomov: drop CEPH_MON_ADDR_MNTOPT_DELIM ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
... as it is too generic. Also, use __func__ when logging
rather than hardcoding the function name.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
... and remove hardcoded function name in ceph_parse_ips().
[ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The attach callback of struct Qdisc_ops is used by only a few qdiscs:
mq, mqprio and htb. qdisc_graft() contains the following logic
(pseudocode):
if (!qdisc->ops->attach) {
if (ingress)
do ingress stuff;
else
do egress stuff;
}
if (!ingress) {
...
if (qdisc->ops->attach)
qdisc->ops->attach(qdisc);
} else {
...
}
As we see, the attach callback is not called if the qdisc is being
attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and
mqprio, since they contain a check that they are attached to TC_H_ROOT,
and they can't be attached to TC_H_INGRESS anyway.
However, the commit cited below added the attach callback to htb. It is
needed for the hardware offload, but in the non-offload mode it
simulates the "do egress stuff" part of the pseudocode above. The
problem is that when htb is attached to ingress, neither "do ingress
stuff" nor attach() is called. It results in an inconsistency, and the
following message is printed to dmesg:
unregister_netdevice: waiting for lo to become free. Usage count = 2
This commit addresses the issue by running "do ingress stuff" in the
ingress flow even if the attach callback is present, which is fine,
because attach isn't going to be called afterwards.
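In terms of the pseudocode above, the resulting control flow can be modeled as follows. This is a standalone sketch that only mirrors the branch structure; struct demo_result and demo_graft() are hypothetical and are not the sch_api code.

```c
#include <stdbool.h>

/* Records which steps of the pseudocode ran for one graft operation. */
struct demo_result {
	bool did_ingress;	/* "do ingress stuff" ran */
	bool did_egress;	/* "do egress stuff" ran */
	bool called_attach;	/* the attach callback was invoked */
};

/* has_attach: the qdisc provides ops->attach; ingress: TC_H_INGRESS. */
static struct demo_result demo_graft(bool has_attach, bool ingress)
{
	struct demo_result r = { false, false, false };

	/*
	 * After the change, the first block is skipped only when attach
	 * exists AND the qdisc is not on ingress, because that is the
	 * only case in which attach will actually be called below.
	 */
	if (!has_attach || ingress) {
		if (ingress)
			r.did_ingress = true;
		else
			r.did_egress = true;
	}

	if (!ingress && has_attach)
		r.called_attach = true;

	return r;
}
```

Every combination of the two inputs now runs exactly one of the three steps, which is the consistency the htb-on-ingress case was missing.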
The bug was found by syzbot and reported by Eric.
Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The modem in the ZTE MF286D is a Qualcomm MDM9250-based 3G/4G modem.
T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 3 Spd=5000 MxCh= 0
D: Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1
P: Vendor=19d2 ProdID=1485 Rev=52.87
S: Manufacturer=ZTE,Incorporated
S: Product=ZTE Technologies MSM
S: SerialNumber=MF286DZTED000000
C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=896mA
A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00
I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
E: Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Entries with neither a dmi_table nor a codec_hid field need to be placed after
entries with these two fields, or they will always be selected.
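The ordering requirement follows from first-match table lookup, which the sketch below illustrates. It is a simplified standalone model: struct demo_entry, its single dmi_vendor field, and demo_select() are hypothetical and much simpler than the real configuration table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; a NULL dmi_vendor matches every machine. */
struct demo_entry {
	const char *driver;
	const char *dmi_vendor;
};

/* Correct order: the specific entry first, the catch-all entry last. */
static const struct demo_entry demo_table[] = {
	{ "sof", "Google" },
	{ "legacy", NULL },
};

/* Returns the driver of the first matching entry, or NULL if none match. */
static const char *demo_select(const struct demo_entry *tbl, size_t n,
			       const char *vendor)
{
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].dmi_vendor ||
		    strcmp(tbl[i].dmi_vendor, vendor) == 0)
			return tbl[i].driver;
	}
	return NULL;
}
```

If the catch-all entry were listed first, the loop would return it for every machine and the more specific entry would never be reached.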
Signed-off-by: Brent Lu <brent.lu@intel.com>
Link: https://lore.kernel.org/r/20220113105220.1114694-3-brent.lu@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add rules to select SOF driver for Jasper Lake systems if digital
microphone is present or the system is a Chromebook.
Signed-off-by: Brent Lu <brent.lu@intel.com>
Link: https://lore.kernel.org/r/20220113105220.1114694-2-brent.lu@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Fix cma_heap_buffer mutex locking critical section to protect vmap_cnt
and vaddr.
Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the cma_heap implementation")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220104073545.124244-1-o451686892@gmail.com
|
|
Add the visible flag to the toc_stack variable to make it visible for
assembly code and to avoid a sparse warning.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Clang static analysis reports this problem
dw-i3c-master.c:799:9: warning: The result of the left shift is
undefined because the left operand is negative
COMMAND_PORT_DEV_INDEX(pos) |
^~~~~~~~~~~~~~~~~~~~~~~~~~~
pos can be negative because dw_i3c_master_get_free_pos() can return an
error. So check for an error.
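The shape of such a check is sketched below in standalone C. The names are hypothetical: DEMO_DEV_INDEX(), demo_get_free_pos(), and demo_build_cmd() only imitate the pattern and are not the driver's macros or functions, and the error code is chosen for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical field macro: places a device index in bits 20:16. */
#define DEMO_DEV_INDEX(x)	(((x) << 16) & 0x001f0000)

/* Returns the lowest set bit of free_mask, or a negative errno if none. */
static int demo_get_free_pos(uint32_t free_mask)
{
	for (int i = 0; i < 32; i++) {
		if (free_mask & (1u << i))
			return i;
	}
	return -ENOSPC;
}

/* Builds a command word; returns 0 on success or a negative errno. */
static int demo_build_cmd(uint32_t free_mask, uint32_t *cmd)
{
	int pos = demo_get_free_pos(free_mask);

	/* Propagate the error before pos is ever used as a shift operand. */
	if (pos < 0)
		return pos;

	*cmd = DEMO_DEV_INDEX(pos);
	return 0;
}
```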
Fixes: 1dd728f5d4d4 ("i3c: master: Add driver for Synopsys DesignWare IP")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220108150948.3988790-1-trix@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"We have a couple patches in the framework core this time around but
they're mostly minor cleanups and some debugfs stuff. The real work
that's in here is the typical pile of clk driver updates and new SoC
support.
Per usual (or maybe just recent trends), Qualcomm gains a handful of
SoC driver additions and has the largest diffstat. After that there
are quite a few updates to the Allwinner (sunxi) drivers to support
modular drivers and Renesas is heavily updated to add more support for
various clks.
Overall it looks pretty normal.
New Drivers:
- Add MDMA and BDMA clks to Ingenic JZ4760 and JZ4770
- MediaTek mt7986 SoC basic support
- Clock and reset driver for Toshiba Visconti SoCs
- Initial clock driver for the Exynos7885 SoC (Samsung Galaxy A8)
- Allwinner D1 clks
- Lan966x Generic Clock Controller driver and associated DT bindings
- Qualcomm SDX65, SM8450, and MSM8976 GCC clks
- Qualcomm SDX65 and SM8450 RPMh clks
Updates:
- Set suppress_bind_attrs to true for i.MX8ULP driver
- Switch from do_div to div64_ul throughout all i.MX drivers
- Fix imx8mn_clko1_sels for i.MX8MN
- Remove unused IPG_AUDIO_ROOT from i.MX8MP
- Switch parent for audio_root_clk to audio ahb in i.MX8MP driver
- Removal of all remaining uses of __clk_lookup() in
drivers/clk/samsung
- Refactoring of the CPU clocks registration to use common interface
- An update of the Exynos850 driver (support for more clock domains)
required by the E850-96 development board
- Prep for runtime PM and generic power domains on Tegra
- Support modular Allwinner clk drivers via platform bus
- Lan966x clock driver extended to support clock gating
- Add serial (SCI1), watchdog (WDT), timer (OSTM), SPI (RSPI), and
thermal (TSU) clocks and resets on Renesas RZ/G2L
- Rework SDHI clock handling in the Renesas R-Car Gen3 and RZ/G2
clock drivers, and in the Renesas SDHI driver
- Make the Cortex-A55 (I) clock on Renesas RZ/G2L programmable
- Document support for the new Renesas R-Car S4-8 (R8A779F0) SoC
- Add support for the new Renesas R-Car S4-8 (R8A779F0) SoC
- Add GPU clock and resets on Renesas RZ/G2L
- Add clk-provider.h to various Qualcomm clk drivers
- devm version of clk_hw_register_gate()
- kerneldoc fixes in a couple drivers"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (131 commits)
clk: visconti: Remove pointless NULL check in visconti_pll_add_lookup()
clk: mediatek: add mt7986 clock support
clk: mediatek: add mt7986 clock IDs
dt-bindings: clock: mediatek: document clk bindings for mediatek mt7986 SoC
clk: mediatek: clk-gate: Use regmap_{set/clear}_bits helpers
clk: mediatek: clk-gate: Shrink by adding clockgating bit check helper
clk: x86: Fix clk_gate_flags for RV_CLK_GATE
clk: x86: Use dynamic con_id string during clk registration
ACPI: APD: Add a fmw property clk-name
drivers: acpi: acpi_apd: Remove unused device property "is-rv"
x86: clk: clk-fch: Add support for newer family of AMD's SOC
clk: ingenic: Add MDMA and BDMA clocks
dt-bindings: clk/ingenic: Add MDMA and BDMA clocks
clk: bm1880: remove kfrees on static allocations
clk: Drop unused COMMON_CLK_STM32MP157_SCMI config
clk: st: clkgen-mux: search reg within node or parent
clk: st: clkgen-fsyn: search reg within node or parent
clk: Enable/Disable runtime PM for clk_summary
MAINTAINERS: Add entries for Toshiba Visconti PLL and clock controller
clk: visconti: Add support common clock driver and reset driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds
Pull LED updates from Pavel Machek:
"Nothing major is happening here"
* tag 'leds-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds:
leds: lp55xx: initialise output direction from dts
ARM: dts: omap3-n900: Fix lp5523 for multi color
leds: ktd2692: Drop calling dev_of_node() in ktd2692_parse_dt
leds: lgm-sso: Get rid of duplicate of_node assignment
leds: tca6507: Get rid of duplicate of_node assignment
leds: leds-fsg: Drop FSG3 LED driver
leds: lp50xx: remove unused variable
dt-bindings: leds: Replace moonlight with indicator in mt6360 example
leds: led-core: Update fwnode with device_set_node
leds: tca6507: use swap() to make code cleaner
leds: Add mt6360 driver
dt-bindings: leds: Add bindings for MT6360 LED
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"Bindings:
- DT schema conversions for Samsung clocks, RNG bindings, Qcom
Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctrl,
Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb
- DT schema conversions for Broadcom platforms: interrupt
controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux,
iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON,
SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB
arbiter, and SATA
- Add binding schemas for Tegra210 EMC table, TI DC-DC converters,
- Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues
- More fixes due to 'unevaluatedProperties' enabling
- Data type fixes and clean-ups of binding examples found in
preparation to move to validating DTB files directly (instead of
intermediate YAML representation)
- Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus
- Add various new compatible strings
DT core:
- Silence a warning for overlapping reserved memory regions
- Reimplement unittest overlay tracking
- Fix stack frame size warning in unittest
- Clean-ups of early FDT scanning functions
- Fix handling of "linux,usable-memory-range" on EFI booted systems
- Add support for 'fail' status on CPU nodes
- Improve error message in of_phandle_iterator_next()
- kbuild: Disable duplicate unit-address warnings for disabled nodes"
* tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits)
dt-bindings: net: mdio: Drop resets/reset-names child properties
dt-bindings: clock: samsung: convert S5Pv210 to dtschema
dt-bindings: clock: samsung: convert Exynos5410 to dtschema
dt-bindings: clock: samsung: convert Exynos5260 to dtschema
dt-bindings: clock: samsung: extend Exynos7 bindings with UFS
dt-bindings: clock: samsung: convert Exynos7 to dtschema
dt-bindings: clock: samsung: convert Exynos5433 to dtschema
dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712
dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes
dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp'
dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example
dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example
dt-bindings: clock: imx5: Drop clock consumer node from example
dt-bindings: Drop required 'interrupt-parent'
dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance'
dt-bindings: net: wireless: mt76: Fix 8-bit property sizes
dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema
dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry
dt-bindings: net: stm32-dwmac: Make each example a separate entry
dt-bindings: net: Cleanup MDIO node schemas
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a fix for the Xen gntdev driver
- a fix for running as Xen dom0 booted via EFI and the EFI framebuffer
being located above 4GB
- a series for support of mapping other guest's memory by using zone
device when running as Xen guest on Arm
* tag 'for-linus-5.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
dt-bindings: xen: Clarify "reg" purpose
arm/xen: Read extended regions from DT and init Xen resource
xen/unpopulated-alloc: Add mechanism to use Xen resource
xen/balloon: Bring alloc(free)_xenballooned_pages helpers back
arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DT
xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list()
xen/x86: obtain upper 32 bits of video frame buffer address for Dom0
xen/gntdev: fix unmap notification order
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Get rid of all the .fixup sections because this generates
misleading/wrong stacktraces and confuses RELIABLE_STACKTRACE and
LIVEPATCH as the backtrace misses the function which is being fixed
up.
- Add Straight Line Speculation mitigation support which uses a new
compiler switch -mharden-sls= which sticks an INT3 after a RET or an
indirect branch in order to block speculation after them. Reportedly,
CPUs do speculate behind such insns.
- The usual set of cleanups and improvements
* tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/entry_32: Fix segment exceptions
objtool: Remove .fixup handling
x86: Remove .fixup section
x86/word-at-a-time: Remove .fixup usage
x86/usercopy: Remove .fixup usage
x86/usercopy_32: Simplify __copy_user_intel_nocache()
x86/sgx: Remove .fixup usage
x86/checksum_32: Remove .fixup usage
x86/vmx: Remove .fixup usage
x86/kvm: Remove .fixup usage
x86/segment: Remove .fixup usage
x86/fpu: Remove .fixup usage
x86/xen: Remove .fixup usage
x86/uaccess: Remove .fixup usage
x86/futex: Remove .fixup usage
x86/msr: Remove .fixup usage
x86/extable: Extend extable functionality
x86/entry_32: Remove .fixup usage
x86/entry_64: Remove .fixup usage
x86/copy_mc_64: Remove .fixup usage
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Borislav Petkov:
"Cleanup of the perf/kvm interaction."
* tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Drop guest callback (un)register stubs
KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c
KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y
KVM: arm64: Convert to the generic perf callbacks
KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
KVM: Move x86's perf guest info callbacks to generic KVM
KVM: x86: More precisely identify NMI from guest when handling PMI
KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable
perf/core: Use static_call to optimize perf_guest_info_callbacks
perf: Force architectures to opt-in to guest callbacks
perf: Add wrappers for invoking guest callbacks
perf/core: Rework guest callbacks to prepare for static_call support
perf: Drop dead and useless guest "support" from arm, csky, nds32 and riscv
perf: Stop pretending that perf can handle multiple guest callbacks
KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
KVM: x86: Register perf callbacks after calling vendor's hardware_setup()
perf: Protect perf_guest_cbs with RCU
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- Identity domain support for virtio-iommu
- Move flush queue code into iommu-dma
- Some fixes for AMD IOMMU suspend/resume support when x2apic is used
- Arm SMMU Updates from Will Deacon:
- Revert evtq and priq back to their former sizes
- Return early on short-descriptor page-table allocation failure
- Fix page fault reporting for Adreno GPU on SMMUv2
- Make SMMUv3 MMU notifier ops 'const'
- Numerous new compatible strings for Qualcomm SMMUv2 implementations
- Various smaller fixes and cleanups
* tag 'iommu-updates-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (38 commits)
iommu/iova: Temporarily include dma-mapping.h from iova.h
iommu: Move flush queue data into iommu_dma_cookie
iommu/iova: Move flush queue code to iommu-dma
iommu/iova: Consolidate flush queue code
iommu/vt-d: Use put_pages_list
iommu/amd: Use put_pages_list
iommu/amd: Simplify pagetable freeing
iommu/iova: Squash flush_cb abstraction
iommu/iova: Squash entry_dtor abstraction
iommu/iova: Fix race between FQ timeout and teardown
iommu/amd: Fix typo in *glues … together* in comment
iommu/vt-d: Remove unused dma_to_mm_pfn function
iommu/vt-d: Drop duplicate check in dma_pte_free_pagetable()
iommu/vt-d: Use bitmap_zalloc() when applicable
iommu/amd: Remove useless irq affinity notifier
iommu/amd: X2apic mode: mask/unmask interrupts on suspend/resume
iommu/amd: X2apic mode: setup the INTX registers on mask/unmask
iommu/amd: X2apic mode: re-enable after resume
iommu/amd: Restore GA log/tail pointer on host resume
iommu/iova: Move fast alloc size roundup into alloc_iova_fast()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"The highlight is initial support for CXL memory hotplug. The static
NUMA node (ACPI SRAT Physical Address to Proximity Domain) information
known to platform firmware is extended to support the potential
performance-class / memory-target nodes dynamically created from
available CXL memory device capacity.
New unit test infrastructure is added for validating health
information payloads.
Fixes to module reload stress and stack usage from exposure in -next
are included. A symbol rename and some other miscellaneous fixups are
included as well.
Summary:
- Rework ACPI sub-table infrastructure to optionally be used outside
of __init scenarios and use it for CEDT.CFMWS sub-table parsing.
- Add support for extending num_possible_nodes by the potential
hotplug CXL memory ranges
- Extend tools/testing/cxl with mock memory device health information
- Fix a module-reload workqueue race
- Fix excessive stack-frame usage
- Rename the driver context data structure from "cxl_mem" since that
name collides with a proposed driver name
- Use EXPORT_SYMBOL_NS_GPL instead of -DDEFAULT_SYMBOL_NAMESPACE at
build time"
* tag 'cxl-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core: Remove cxld_const_init in cxl_decoder_alloc()
cxl/pmem: Fix module reload vs workqueue state
ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
cxl/test: Mock acpi_table_parse_cedt()
cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers
ACPI: Add a context argument for table parsing handlers
ACPI: Teach ACPI table parsing about the CEDT header format
ACPI: Keep sub-table parsing infrastructure available for modules
tools/testing/cxl: add mock output for the GET_HEALTH_INFO command
cxl/memdev: Remove unused cxlmd field
cxl/core: Convert to EXPORT_SYMBOL_NS_GPL
cxl/memdev: Change cxl_mem to a more descriptive name
cxl/mbox: Remove bad comment
cxl/pmem: Fix reference counting for delayed work
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax and libnvdimm updates from Dan Williams:
"The bulk of this is a rework of the dax_operations API after
discovering the obstacles it posed to the work-in-progress DAX+reflink
support for XFS and other copy-on-write filesystem mechanics.
Primarily the need to plumb a block_device through the API to handle
partition offsets was a sticking point and Christoph untangled that
dependency in addition to other cleanups to make landing the
DAX+reflink support easier.
The DAX_PMEM_COMPAT option has been around for 4 years and not only
are distributions shipping userspace that understands the current
configuration API, but some are not even bothering to turn this option
on anymore, so it seems a good time to remove it per the deprecation
schedule. Recall that this was added after the device-dax subsystem
moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
All recent functionality depends on /sys/bus/dax.
Some other miscellaneous cleanups and reflink prep patches are
included as well.
Summary:
- Simplify the dax_operations API:
- Eliminate bdev_dax_pgoff() in favor of the filesystem
maintaining and applying a partition offset to all its DAX iomap
operations.
- Remove wrappers and device-mapper stacked callbacks for
->copy_from_iter() and ->copy_to_iter() in favor of moving
block_device relative offset responsibility to the
dax_direct_access() caller.
- Remove the need for an @bdev in filesystem-DAX infrastructure
- Remove unused uio helpers copy_from_iter_flushcache() and
copy_mc_to_iter() as only the non-check_copy_size() versions are
used for DAX.
- Prepare XFS for the pending (next merge window) DAX+reflink support
- Remove deprecated DEV_DAX_PMEM_COMPAT support
- Cleanup a straggling misuse of the GUID api"
* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
iomap: Fix error handling in iomap_zero_iter()
ACPI: NFIT: Import GUID before use
dax: remove the copy_from_iter and copy_to_iter methods
dax: remove the DAXDEV_F_SYNC flag
dax: simplify dax_synchronous and set_dax_synchronous
uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
iomap: turn the byte variable in iomap_zero_iter into a ssize_t
memremap: remove support for external pgmap refcounts
fsdax: don't require CONFIG_BLOCK
iomap: build the block based code conditionally
dax: fix up some of the block device related ifdefs
fsdax: shift partition offset handling into the file systems
dax: return the partition offset from fs_dax_get_by_bdev
iomap: add a IOMAP_DAX flag
xfs: pass the mapping flags to xfs_bmbt_to_iomap
xfs: use xfs_direct_write_iomap_ops for DAX zeroing
xfs: move dax device handling into xfs_{alloc,free}_buftarg
ext4: cleanup the dax handling in ext4_fill_super
ext2: cleanup the dax handling in ext2_fill_super
fsdax: decouple zeroing from the iomap buffered I/O code
...
|
|
While experimenting with FOU encapsulation Amir noticed that encapsulated IPv6
traffic fails to be delivered, if the peer IP address is configured locally.
It can be easily verified by creating a sit interface like below:
$ sudo ip link add name fou_test type sit remote 127.0.0.1 encap fou encap-sport auto encap-dport 1111
$ sudo ip link set fou_test up
and sending some IPv4 and IPv6 traffic to it
$ ping -I fou_test -c 1 1.1.1.1
$ ping6 -I fou_test -c 1 fe80::d0b0:dfff:fe4c:fcbc
"tcpdump -i any udp dst port 1111" will confirm that only the first IPv4 ping
was encapsulated and attempted to be delivered.
This seems like a limitation: for example, in a cloud environment the "peer"
service may be arbitrarily scheduled on any server within the cluster, where all
nodes are trying to send encapsulated traffic. And the unlucky node will not be
able to. Moreover, delivering encapsulated IPv4 traffic locally is allowed.
But I may not have all the context about this restriction and this code predates
the observable git history.
Reported-by: Amir Razmjou <arazmjou@cloudflare.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220107123842.211335-1-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since all MIPS-specific code has been removed from the driver, allow it to
be enabled for COMPILE_TEST on all architectures.
Mark it as tristate and remove the MIPS dependency.
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The MT7621 PCIe host controller driver can be built as a module, but it
lacks a MODULE_LICENSE(), which causes a build error:
ERROR: modpost: missing MODULE_LICENSE() in drivers/pci/controller/pcie-mt7621.o
Add MODULE_LICENSE() to the driver.
Fixes: 2bdd5238e756 ("PCI: mt7621: Add MediaTek MT7621 PCIe host controller driver")
Link: https://lore.kernel.org/r/20211207104924.21327-5-sergio.paracuellos@gmail.com
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache rewrite from David Howells:
"This is a set of patches that rewrites the fscache driver and the
cachefiles driver, significantly simplifying the code compared to
what's upstream, removing the complex operation scheduling and object
state machine in favour of something much smaller and simpler.
The series is structured such that the first few patches disable
fscache use by the network filesystems using it, remove the cachefiles
driver entirely and as much of the fscache driver as can be got away
with without causing build failures in the network filesystems.
The patches after that recreate fscache and then cachefiles,
attempting to add the pieces in a logical order. Finally, the
filesystems are reenabled and then the very last patch changes the
documentation.
[!] Note: I have dropped the cifs patch for the moment, leaving local
caching in cifs disabled. I've been having trouble getting that
working. I think I have it done, but it needs more testing (there
seem to be some test failures occurring with v5.16 also from
xfstests), so I propose deferring that patch to the end of the
merge window.
WHY REWRITE?
============
Fscache's operation scheduling API was intended to handle sequencing
of cache operations, which were all required (where possible) to run
asynchronously in parallel with the operations being done by the
network filesystem, whilst allowing the cache to be brought online and
offline and to interrupt service for invalidation.
With the advent of the tmpfile capacity in the VFS, however, an
opportunity arises to do invalidation much more simply, without having
to wait for I/O that's actually in progress: Cachefiles can simply
create a tmpfile, cut over the file pointer for the backing object
attached to a cookie and abandon the in-progress I/O, dismissing it
upon completion.
Future work here would involve using Omar Sandoval's vfs_link() with
AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new
hard link from a tmpfile as currently I have to unlink the old file
first.
These patches can also simplify the object state handling as I/O
operations to the cache don't all have to be brought to a stop in
order to invalidate a file. To that end, and with an eye to writing
a new backing cache model in the future, I've taken the opportunity to
simplify the indexing structure.
I've separated the index cookie concept from the file cookie concept
by C type now. The former is now called a "volume cookie" (struct
fscache_volume) and there is a container of file cookies. There are
then just the two levels. All the index cookie levels are collapsed
into a single volume cookie, and this has a single printable string as
a key. For instance, an AFS volume would have a key of something like
"afs,example.com,1000555", combining the filesystem name, cell name
and volume ID. This is freeform, but must not have '/' chars in it.
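As a rough user-space illustration of the key format described above (not the kernel's actual code), the sketch below joins a filesystem name, cell name and volume ID into one printable string and rejects any '/' character. The helper name make_volume_key() and its checks are hypothetical and exist only for this example.

```python
def make_volume_key(fs_name: str, cell_name: str, volume_id: int) -> str:
    """Build a single printable volume key (hypothetical helper).

    The comma-separated layout follows the "afs,example.com,1000555"
    example in the text; the validation is an assumption made for
    illustration.
    """
    key = f"{fs_name},{cell_name},{volume_id}"
    # The text says the key is freeform but must not contain '/'.
    if "/" in key:
        raise ValueError("volume key must not contain '/'")
    # The text describes the key as a single printable string.
    if not key.isprintable():
        raise ValueError("volume key must be printable")
    return key


afs_key = make_volume_key("afs", "example.com", 1000555)
print(afs_key)
```

A name containing '/' (for instance a cell name of "bad/cell") raises ValueError in this sketch rather than yielding a key.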
I've also eliminated all pointers back from fscache into the network
filesystem. This required the duplication of a little bit of data in
the cookie (cookie key, coherency data and file size), but it's not
actually that much. This gets rid of problems with making sure we keep
netfs data structures around so that the cache can access them.
These patches mean that most of the code that was in the drivers
before is simply gone and those drivers are now almost entirely new
code. That being the case, there doesn't seem any particular reason to
try and maintain bisectability across it. Further, there has to be a
point in the middle where things are cut over as there's a single
point everything has to go through (ie. /dev/cachefiles) and it can't
be in use by two drivers at once.
ISSUES YET OUTSTANDING
======================
There are some issues still outstanding, unaddressed by this patchset,
that will need fixing in future patchsets, but that don't stop this
series from being usable:
(1) The cachefiles driver needs to stop using the backing filesystem's
metadata to store information about what parts of the cache are
populated. This is not reliable with modern extent-based
filesystems.
Fixing this is deferred to a separate patchset as it involves
negotiation with the network filesystem and the VM as to how much
data to download to fulfil a read - which brings me on to (2)...
(2) NFS (and CIFS with the dropped patch) do not take account of how
the cache would like I/O to be structured to meet its granularity
requirements. Previously, the cache used page granularity, which
was fine as the network filesystems also dealt in page
granularity, and the backing filesystem (ext4, xfs or whatever)
did whatever it did out of sight. However, we now have folios to
deal with and the cache will now have to store its own metadata to
track its contents.
The change I'm looking at making for cachefiles is to store
content bitmaps in one or more xattrs and making a bit in the map
correspond to something like a 256KiB block. However, the size of
an xattr and the fact that they have to be read/updated in one go
means that I'm looking at covering 1GiB of data per 512-byte map
and storing each map in an xattr. Cachefiles has the potential to
grow into a fully fledged filesystem of its very own if I'm not
careful.
However, I'm also looking at changing things even more radically
and going to a different model of how the cache is arranged and
managed - one that's more akin to the way, say, openafs does
things - which brings me on to (3)...
(3) The way cachefilesd does culling is very inefficient for large
caches and it would be better to move it into the kernel if I can
as cachefilesd has to keep asking the kernel if it can cull a
file. Changing the way the backend works would allow this to be
addressed.
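The sizing mentioned in item (2) above can be checked with a little arithmetic: a 512-byte map holds 4096 bits, and at 256KiB per bit that covers 1GiB. The sketch below works this through and shows one possible way of locating the bit for a given file offset. The function name block_bit() and the bit ordering within a byte (least significant bit first) are assumptions for illustration, not something the text specifies.

```python
KIB = 1024
GIB = 1024 ** 3

BLOCK_SIZE = 256 * KIB      # data covered by one bit (from the text)
MAP_BYTES = 512             # size of one content map (from the text)

BITS_PER_MAP = MAP_BYTES * 8                 # 4096 bits in one map
COVERAGE_PER_MAP = BITS_PER_MAP * BLOCK_SIZE  # bytes of data per map


def block_bit(file_offset: int) -> tuple[int, int, int]:
    """Return (map index, byte within map, bit within byte).

    Hypothetical layout: maps are numbered from 0, and bits are
    counted from the least significant bit of each byte.
    """
    block = file_offset // BLOCK_SIZE
    map_index, bit = divmod(block, BITS_PER_MAP)
    return map_index, bit // 8, bit % 8


print(COVERAGE_PER_MAP == GIB)      # one 512-byte map covers 1GiB
print(block_bit(9 * BLOCK_SIZE))    # tenth block of the file
print(block_bit(GIB))               # first block tracked by the second map
```

Under these assumptions, a file larger than 1GiB needs one additional map (and so one additional xattr) per further 1GiB of data.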
BITS THAT MAY BE CONTROVERSIAL
==============================
There are some bits I've added that may be controversial:
(1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check
if a file is already being used by some other kernel service
(e.g. a duplicate cachefiles cache in the same directory) and
reject it if it is. This isn't entirely necessary, but it helps
prevent accidental data corruption.
I don't want to use S_SWAPFILE as that has other effects, but
quite possibly swapon() should set S_KERNEL_FILE too.
Note that it doesn't prevent userspace from interfering, though
perhaps it should. (I have made it prevent a marked directory from
being rmdir-able).
(2) Cachefiles wants to keep the backing file for a cookie open whilst
we might need to write to it from network filesystem writeback.
The problem is that the network filesystem unuses its cookie when
its file is closed, and so we have nothing pinning the cachefiles
file open and it will get closed automatically after a short time
to avoid EMFILE/ENFILE problems.
Reopening the cache file, however, is a problem if this is being
done due to writeback triggered by exit(). Some filesystems will
oops if we try to open a file in that context because they want to
access current->fs or suchlike.
To get around this, I added the following:
(A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
filesystem inode to indicate that we have a usage count on the
cookie caching that inode.
(B) A flag in struct writeback_control, unpinned_fscache_wb, that
is set when __writeback_single_inode() clears the last dirty
page from i_pages - at which point it clears
I_PINNING_FSCACHE_WB and sets this flag.
This has to be done here so that clearing I_PINNING_FSCACHE_WB
can be done atomically with the check of PAGECACHE_TAG_DIRTY
that clears I_DIRTY_PAGES.
(C) A function, fscache_set_page_dirty(), which, if
I_PINNING_FSCACHE_WB is not already set, sets it and calls
fscache_use_cookie() to pin the cache resources.
(D) A function, fscache_unpin_writeback(), to be called by
->write_inode() to unuse the cookie.
(E) A function, fscache_clear_inode_writeback(), to be called when
the inode is evicted, before clear_inode() is called. This
cleans up any lingering I_PINNING_FSCACHE_WB.
The network filesystem can then use these tools to make sure that
fscache_write_to_cache() can write locally modified data to the
cache as well as to the server.
For the future, I'm working on write helpers for netfs lib that
should allow this facility to be removed by keeping track of the
dirty regions separately - but that's incomplete at the moment and
is also going to be affected by folios, one way or another, since
it deals with pages"
Link: https://lore.kernel.org/all/510611.1641942444@warthog.procyon.org.uk/
Tested-by: Dominique Martinet <asmadeus@codewreck.org> # 9p
Tested-by: kafs-testing@auristor.com # afs
Tested-by: Jeff Layton <jlayton@kernel.org> # ceph
Tested-by: Dave Wysochanski <dwysocha@redhat.com> # nfs
Tested-by: Daire Byrne <daire@dneg.com> # nfs
* tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (67 commits)
9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
fscache: Add a tracepoint for cookie use/unuse
fscache: Rewrite documentation
ceph: add fscache writeback support
ceph: conversion to new fscache API
nfs: Implement cache I/O by accessing the cache directly
nfs: Convert to new fscache volume/cookie API
9p: Copy local writes to the cache when writing to the server
9p: Use fscache indexing rewrite and reenable caching
afs: Skip truncation on the server of data we haven't written yet
afs: Copy local writes to the cache when writing to the server
afs: Convert afs to use the new fscache API
fscache, cachefiles: Display stat of culling events
fscache, cachefiles: Display stats of no-space events
cachefiles: Allow cachefiles to actually function
fscache, cachefiles: Store the volume coherency data
cachefiles: Implement the I/O routines
cachefiles: Implement cookie resize for truncate
cachefiles: Implement begin and end I/O operation
cachefiles: Implement backing file wrangling
...
|
|
On the MIPS ralink mt7621 platform, we need to set up I/O coherency units
based on the host bridge apertures.
To remove this arch dependency from the driver itself, move the coherency
setup from the driver to pcibios_root_bridge_prepare().
[bhelgaas: squash add/remove into one patch, commit log]
Link: https://lore.kernel.org/r/20211207104924.21327-3-sergio.paracuellos@gmail.com
Link: https://lore.kernel.org/r/20211207104924.21327-4-sergio.paracuellos@gmail.com
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net> # arch/mips
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> # arch/mips
|
|
When pci_register_host_bridge() is called, bridge->windows are already
available. However, these windows are temporarily moved out of there.
To let pcibios_root_bridge_prepare() have access to these windows, do the
move after calling this function. This is useful for the MIPS
ralink mt7621 platform so it can set up I/O coherence units and avoid
custom MIPS code in the mt7621 PCIe controller driver.
Link: https://lore.kernel.org/r/20211207104924.21327-2-sergio.paracuellos@gmail.com
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fix a regression introduced in 5.15
- Extend the size of the FUSE_INIT request to accommodate more
flags. There's a slight possibility of a regression for obscure fuse
servers; if this happens, then more complexity will need to be added
to the protocol
- Allow the DAX property to be controlled by the server on a per-inode
basis in virtiofs
- Allow sending security context to the server when creating a file or
directory
* tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
Documentation/filesystem/dax: DAX on virtiofs
fuse: mark inode DONT_CACHE when per inode DAX hint changes
fuse: negotiate per inode DAX in FUSE_INIT
fuse: enable per inode DAX
fuse: support per inode DAX in fuse protocol
fuse: make DAX mount option a tri-state
fuse: add fuse_should_enable_dax() helper
fuse: Pass correct lend value to filemap_write_and_wait_range()
fuse: send security context of inode on file
fuse: extend init flags
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF / reiserfs updates from Jan Kara:
"One UDF fix and one reiserfs cleanup"
* tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix error handling in udf_new_inode()
reiserfs: don't use congestion_wait()
|
|
Sparse complains that the mt7621_pci_ops symbol is not declared and asks
if it should be static instead. Sparse is right, so declare the symbol as
static.
Link: https://lore.kernel.org/r/20211117152952.12271-1-sergio.paracuellos@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify updates from Jan Kara:
"Support for new FAN_RENAME fanotify event and support for reporting
child info in directory fanotify events (FAN_REPORT_TARGET_FID)"
* tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: wire up FAN_RENAME event
fanotify: report old and/or new parent+name in FAN_RENAME event
fanotify: record either old name new name or both for FAN_RENAME
fanotify: record old and new parent and name in FAN_RENAME event
fanotify: support secondary dir fh and name in fanotify_info
fanotify: use helpers to parcel fanotify_info buffer
fanotify: use macros to get the offset to fanotify_info buffer
fsnotify: generate FS_RENAME event with rich information
fanotify: introduce group flag FAN_REPORT_TARGET_FID
fsnotify: separate mark iterator type from object type enum
fsnotify: clarify object type argument
|
|
Pull iomap updates from Matthew Wilcox:
"Convert xfs/iomap to use folios.
This should be all that is needed for XFS to use large folios. There
is no code in this pull request to create large folios, but no
additional changes should be needed to XFS or iomap once they are
created.
Usually this would have come from Darrick, and we had intended that it
would come that route. Between the holidays and various things which
Darrick needed to work on, he asked if I could send things directly.
There weren't any other iomap patches pending for this release, which
probably also played a role"
* tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits)
iomap: Inline __iomap_zero_iter into its caller
xfs: Support large folios
iomap: Support large folios in invalidatepage
iomap: Convert iomap_migrate_page() to use folios
iomap: Convert iomap_add_to_ioend() to take a folio
iomap: Simplify iomap_do_writepage()
iomap: Simplify iomap_writepage_map()
iomap,xfs: Convert ->discard_page to ->discard_folio
iomap: Convert iomap_write_end_inline to take a folio
iomap: Convert iomap_write_begin() and iomap_write_end() to folios
iomap: Convert __iomap_zero_iter to use a folio
iomap: Allow iomap_write_begin() to be called with the full length
iomap: Convert iomap_page_mkwrite to use a folio
iomap: Convert readahead and readpage to use a folio
iomap: Convert iomap_read_inline_data to take a folio
iomap: Use folio offsets instead of page offsets
iomap: Convert bio completions to use folios
iomap: Pass the iomap_page into iomap_set_range_uptodate
iomap: Add iomap_invalidate_folio
iomap: Convert iomap_releasepage to use a folio
...
|