Age | Commit message (Collapse) | Author |
|
For the general and state management nfs_client under each mount, create
symlinks to their respective rpc_client sysfs entries.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Create a sysfs directory for each mount that corresponds to the mount's
nfs_server struct. As the mount is being constructed, use the name
"server-n", but rename it to the "MAJOR:MINOR" of the mount after assigning
a device_id. The rename approach allows us to populate the mount's directory
with links to the various rpc_client objects during the mount's
construction. The naming convention (MAJOR:MINOR) can be used to reference
a particular NFS mount's sysfs tree.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Expand the NFS network-namespaced sysfs from /sys/fs/nfs/net down one level
into /sys/fs/nfs by moving the "net" kobject onto struct
nfs_netns_client and setting it up during network namespace init.
This prepares the way for superblock kobjects within /sys/fs/nfs that will
only be visible to matching network namespaces.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When booting with "intremap=off" and "x2apic_phys" on the kernel command
line, the physical x2APIC driver ends up being used even when x2APIC
mode is disabled ("intremap=off" disables x2APIC mode). This happens
because the first compound condition check in x2apic_phys_probe() is
false due to x2apic_mode == 0 and so the following one returns true
after default_acpi_madt_oem_check() having already selected the physical
x2APIC driver.
This results in the following panic:
kernel BUG at arch/x86/kernel/apic/io_apic.c:2409!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 #2
Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021
RIP: 0010:setup_IO_APIC+0x9c/0xaf0
Call Trace:
<TASK>
? native_read_msr
apic_intr_mode_init
x86_late_time_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
secondary_startup_64_no_verify
</TASK>
which is:
setup_IO_APIC:
apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
for_each_ioapic(ioapic)
BUG_ON(mp_irqdomain_create(ioapic));
Return 0 to denote that x2APIC has not been enabled when probing the
physical x2APIC driver.
[ bp: Massage commit message heavily. ]
Fixes: 9ebd680bd029 ("x86, apic: Use probe routines to simplify apic selection")
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230616212236.1389-1-dheerajkumar.srivastava@amd.com
|
|
In preparation to make objects below /sys/fs/nfs namespace aware, we need
to define our own kobj_type for the nfs kset so that we can add the
.child_ns_type member in a following patch. No functional change here, only
the unrolling of kset_create_and_add().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Match the variable names to the sysfs structure.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Be brief and match the subsystem name. There's no need to distinguish this
kset variable from the server.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Kuniyuki Iwashima says:
====================
ipv6: Random cleanup for Extension Header.
This series (1) cleans up pskb_may_pull() in some functions, where needed
data are already pulled by their caller, (2) removes redundant multicast
test, and (3) optimises reload timing of the header.
====================
Link: https://lore.kernel.org/r/20230614230107.22301-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ipv6_destopt_rcv() and ipv6_parse_hopopts() pulls these data
- Hop-by-Hop/Destination Options Header : 8
- Hdr Ext Len : skb_transport_header(skb)[1] << 3
and calls ip6_parse_tlv(), so it need not check if skb_headlen() is less
than skb_transport_offset(skb) + (skb_transport_header(skb)[1] << 3).
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We need not reload hdr in ipv6_srh_rcv() unless we call
pskb_expand_head().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ipv6_rthdr_rcv() pulls these data
- Segment Routing Header : 8
- Hdr Ext Len : skb_transport_header(skb)[1] << 3
needed by ipv6_srh_rcv(), so pskb_pull() in ipv6_srh_rcv() never
fails and can be replaced with skb_pull().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ipv6_rpl_srh_rcv() checks if ipv6_hdr(skb)->daddr or ohdr->rpl_segaddr[i]
is the multicast address with ipv6_addr_type().
We have the same check for ipv6_hdr(skb)->daddr in ipv6_rthdr_rcv(), so we
need not recheck it in ipv6_rpl_srh_rcv().
Also, we should use ipv6_addr_is_multicast() for ohdr->rpl_segaddr[i]
instead of ipv6_addr_type().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As Eric Dumazet pointed out [0], ipv6_rthdr_rcv() pulls these data
- Segment Routing Header : 8
- Hdr Ext Len : skb_transport_header(skb)[1] << 3
needed by ipv6_rpl_srh_rcv(). We can remove pskb_may_pull() and
replace pskb_pull() with skb_pull() in ipv6_rpl_srh_rcv().
Link: https://lore.kernel.org/netdev/CANn89iLboLwLrHXeHJucAqBkEL_S0rJFog68t7wwwXO-aNf5Mg@mail.gmail.com/ [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS writeback fixes from David Howells:
- release the acquired batch before returning if we got >=5 skips
- retry a page we had to wait for rather than skipping over it after
the wait
* tag 'afs-fixes-20230719' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix waiting for writeback then skipping folio
afs: Fix dangling folio ref counts in writeback
|
|
Add a comment to struct btrfs_fs_info::dirty_cowonly_roots to mention
that struct btrfs_fs_info::trans_lock is the lock that protects that
list.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When deleting the free space tree we are deleting the free space root
from the list fs_info->dirty_cowonly_roots without taking the lock that
protects it, which is struct btrfs_fs_info::trans_lock.
This unsynchronized list manipulation may cause chaos if there's another
concurrent manipulation of this list, such as when adding a root to it
with ctree.c:add_root_to_dirty_list().
This can result in all sorts of weird failures caused by a race, such as
the following crash:
[337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
[337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs]
[337571.279928] Code: 85 38 06 00 (...)
[337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206
[337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000
[337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070
[337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b
[337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600
[337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48
[337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000
[337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0
[337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[337571.282874] Call Trace:
[337571.283101] <TASK>
[337571.283327] ? __die_body+0x1b/0x60
[337571.283570] ? die_addr+0x39/0x60
[337571.283796] ? exc_general_protection+0x22e/0x430
[337571.284022] ? asm_exc_general_protection+0x22/0x30
[337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs]
[337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs]
[337571.284803] ? _raw_spin_unlock+0x15/0x30
[337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs]
[337571.285305] reset_balance_state+0x152/0x1b0 [btrfs]
[337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs]
[337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410
[337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs]
[337571.286358] ? mod_objcg_state+0xd2/0x360
[337571.286577] ? refill_obj_stock+0xb0/0x160
[337571.286798] ? seq_release+0x25/0x30
[337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0
[337571.287235] ? percpu_counter_add_batch+0x2e/0xa0
[337571.287455] ? __x64_sys_ioctl+0x88/0xc0
[337571.287675] __x64_sys_ioctl+0x88/0xc0
[337571.287901] do_syscall_64+0x38/0x90
[337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[337571.288352] RIP: 0033:0x7f478aaffe9b
So fix this by locking struct btrfs_fs_info::trans_lock before deleting
the free space root from that list.
Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When disabling quotas we are deleting the quota root from the list
fs_info->dirty_cowonly_roots without taking the lock that protects it,
which is struct btrfs_fs_info::trans_lock. This unsynchronized list
manipulation may cause chaos if there's another concurrent manipulation
of this list, such as when adding a root to it with
ctree.c:add_root_to_dirty_list().
This can result in all sorts of weird failures caused by a race, such as
the following crash:
[337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
[337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs]
[337571.279928] Code: 85 38 06 00 (...)
[337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206
[337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000
[337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070
[337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b
[337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600
[337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48
[337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000
[337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0
[337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[337571.282874] Call Trace:
[337571.283101] <TASK>
[337571.283327] ? __die_body+0x1b/0x60
[337571.283570] ? die_addr+0x39/0x60
[337571.283796] ? exc_general_protection+0x22e/0x430
[337571.284022] ? asm_exc_general_protection+0x22/0x30
[337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs]
[337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs]
[337571.284803] ? _raw_spin_unlock+0x15/0x30
[337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs]
[337571.285305] reset_balance_state+0x152/0x1b0 [btrfs]
[337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs]
[337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410
[337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs]
[337571.286358] ? mod_objcg_state+0xd2/0x360
[337571.286577] ? refill_obj_stock+0xb0/0x160
[337571.286798] ? seq_release+0x25/0x30
[337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0
[337571.287235] ? percpu_counter_add_batch+0x2e/0xa0
[337571.287455] ? __x64_sys_ioctl+0x88/0xc0
[337571.287675] __x64_sys_ioctl+0x88/0xc0
[337571.287901] do_syscall_64+0x38/0x90
[337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[337571.288352] RIP: 0033:0x7f478aaffe9b
So fix this by locking struct btrfs_fs_info::trans_lock before deleting
the quota root from that list.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The btrfs_inode_mod_outstanding_extents trace event only shows the modified
number to the number of outstanding extents. It would be helpful if we can
see the resulting extent number as well.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add two new bits to the IMA_EXT_0 key for ZBA, ZBB, and ZBS extensions.
These are accurately reported per CPU.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230509182504.2997252-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The kernel maintains a mask of ISA extensions ANDed together across all
harts. Let's also keep a bitmap of ISA extensions for each CPU. Although
the kernel is currently unlikely to enable a feature that exists only on
some CPUs, we want the ability to report asymmetric CPU extensions
accurately to usermode.
Note that riscv_fill_hwcaps() runs before the per_cpu_offsets are built,
which is why I've used a [NR_CPUS] array rather than per_cpu() data.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230509182504.2997252-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add the Zba address bit manipulation extension and Zbs single bit
instructions extension into those the kernel is aware of and maintains
in its riscv_isa bitmap.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230509182504.2997252-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
After some discussion, we decided that controlling transport layer
security policy should be separate from the setting for the user
authentication flavor. To accomplish this, add a new NFS mount
option to select a transport layer security policy for RPC
operations associated with the mount point.
xprtsec=none - Transport layer security is forced off.
xprtsec=tls - Establish an encryption-only TLS session. If
the initial handshake fails, the mount fails.
If TLS is not available on a reconnect, drop
the connection and try again.
xprtsec=mtls - Both sides authenticate and an encrypted
session is created. If the initial handshake
fails, the mount fails. If TLS is not available
on a reconnect, drop the connection and try
again.
To support client peer authentication (mtls), the handshake daemon
will have configurable default authentication material (certificate
or pre-shared key). In the future, mount options can be added that
can provide this material on a per-mount basis.
Updates to mount.nfs (to support xprtsec=auto) and nfs(5) will be
sent under separate cover.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The new field is used to match struct nfs_clients that have the same
TLS policy setting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Use the new TLS handshake API to enable the SunRPC client code
to request a TLS handshake. This implements support for RFC 9289,
only on TCP sockets.
Upper layers such as NFS use RPC-with-TLS to protect in-transit
traffic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Cleanup bindings dropping unneeded quotes. Once all these are fixed,
checking for this can be enabled in yamllint.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230609140706.64623-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Currently the PMU driver is using DT based lookup to
find the INTC node for sscofpmf extension. This will not work
for ACPI based systems causing the driver to fail to register
the PMU overflow interrupt handler.
Hence, change the code to use the standard interface to find
the INTC node which works irrespective of DT or ACPI.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230607112417.782085-3-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The arch specific __acpi_map_table can be wrapper around either
early_memremap or early_ioremap. But early_memremap
routine works with normal pointers whereas __acpi_map_table expects
pointers in iomem address space. This causes kernel test bot to fail
while using the sparse tool. Fix the issue by using early_ioremap and
similar fix done for __acpi_unmap_table.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305201427.I7QhPjNW-lkp@intel.com/
Fixes: a91a9ffbd3a5 ("RISC-V: Add support to build the ACPI core")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230607112417.782085-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The suspend_restore_csrs is called in both __hibernate_cpu_resume
and the `else` of subsequent swsusp_arch_suspend.
Removing the first call makes both suspend_{save,restore}_csrs
left in swsusp_arch_suspend for clean code.
Fixes: c0317210012e ("RISC-V: Add arch functions to support hibernation/suspend-to-disk")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: JeeHeng Sia <jeeheng.sia@starfivetech.com>
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230522025020.285042-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
No need to link the x1/ra reg via jalr before suspend_restore_regs
So it's better to replace jalr with jr.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: JeeHeng Sia <jeeheng.sia@starfivetech.com >
Link: https://lore.kernel.org/r/20230519060854.214138-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
kTLS sockets use CMSG to report decryption errors and the need
for session re-keying.
For RPC-with-TLS, an "application data" message contains a ULP
payload, and that is passed along to the RPC client. An "alert"
message triggers connection reset. Everything else is discarded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The RPC header parser doesn't recognize TLS handshake traffic, so it
will close the connection prematurely with an error. To avoid that,
shunt the transport's data_ready callback when there is a TLS
handshake in progress.
The XPRT_SOCK_IGNORE_RECV flag will be toggled by code added in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The new authentication flavor is used only to discover peer support
for RPC-over-TLS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Overlayfs creates the real underlying files with fake f_path, whose
f_inode is on the underlying fs and f_path on overlayfs.
Those real files were open with FMODE_NONOTIFY, because fsnotify code was
not prapared to handle fsnotify hooks on files with fake path correctly
and fanotify would report unexpected event->fd with fake overlayfs path,
when the underlying fs was being watched.
Teach fsnotify to handle events on the real files, and do not set real
files to FMODE_NONOTIFY to allow operations on real file (e.g. open,
access, modify, close) to generate async and permission events.
Because fsnotify does not have notifications on address space
operations, we do not need to worry about ->vm_file not reporting
events to a watched overlayfs when users are accessing a mapped
overlayfs file.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230615112229.2143178-6-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pass the upper layer's rpc_create_args to the rpc_clnt_new()
tracepoint so additional parts of the upper layer's request can be
recorded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add an initial set of policies along with fields for upper layers to
pass the requested policy down to the transport layer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Overlayfs uses open_with_fake_path() to allocate internal kernel files,
with a "fake" path - whose f_path is not on the same fs as f_inode.
Allocate a container struct backing_file for those internal files, that
is used to hold the "fake" ovl path along with the real path.
backing_file_real_path() can be used to access the stored real path.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230615112229.2143178-5-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add some missing observability to the fs_context tracepoints
added by commit 33ce83ef0bb0 ("NFS: Replace fs_context-related
dprintk() call sites with tracepoints").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
NFS is primarily name-spaced using network namespaces. However it
contacts rpcbind (and gss_proxy) using AF_UNIX sockets which are
name-spaced using the mount namespaces. This requires a container using
NFSv3 (the form that requires rpcbind) to manage both network and mount
namespaces, which can seem an unnecessary burden.
As NFS is primarily a network service it makes sense to use network
namespaces as much as possible, and to prefer to communicate with an
rpcbind running in the same network namespace. This can be done, while
preserving the benefits of AF_UNIX sockets, by using an abstract socket
address.
An abstract address has a nul at the start of sun_path, and a length
that is exactly the complete size of the sockaddr_un up to the end of
the name, NOT including any trailing nul (which is not part of the
address).
Abstract addresses are local to a network namespace - regular AF_UNIX
path names a resolved in the mount namespace ignoring the network
namespace.
This patch causes rpcb to first try an abstract address before
continuing with regular AF_UNIX and then IP addresses. This ensures
backwards compatibility.
Choosing the name needs some care as the same address will be configured
for rpcbind, and needs to be built in to libtirpc for this enhancement
to be fully successful. There is no formal standard for choosing
abstract addresses. The defacto standard appears to be to use a path
name similar to what would be used for a filesystem AF_UNIX address -
but with a leading nul.
In that case
"\0/var/run/rpcbind.sock"
seems like the best choice. However at this time /var/run is deprecated
in favour of /run, so
"\0/run/rpcbind.sock"
might be better.
Though as we are deliberately moving away from using the filesystem it
might seem more sensible to explicitly break the connection and just
have
"\0rpcbind.socket"
using the same name as the systemd unit file..
This patch chooses the second option, which seems least likely to raise
objections.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
An "abtract" address for an AF_UNIX socket start with a nul and can
contain any bytes for the given length, but traditionally doesn't
contain other nuls. When reported, the leading nul is replaced by '@'.
sunrpc currently rejects connections to these addresses and reports them
as an empty string. To provide support for future use of these
addresses, allow them for outgoing connections and report them more
usefully.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Use a common helper init_file() instead of __alloc_file() for
alloc_empty_file*() helpers and improrve the documentation.
This is needed for a follow up patch that allocates a backing_file
container.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230615112229.2143178-4-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
cachefiles uses kernel_open_tmpfile() to open kernel internal tmpfile
without accounting for nr_files.
cachefiles uses open_with_fake_path() for the same reason without the
need for a fake path.
Fork open_with_fake_path() to kernel_file_open() which only does the
noaccount part and use it in cachefiles.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230615112229.2143178-3-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Otherwise, `stat` will report a stale value to users.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fold them into the other NFS v4.2 operations in the right spots and
adjust spacing to keep the same style.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Overlayfs and cachefiles use vfs_open_tmpfile() to open a tmpfile
without accounting for nr_files.
Rename this helper to kernel_tmpfile_open() to better reflect this
helper is used for kernel internal users.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230615112229.2143178-2-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
I add commends above each function to match the style of the other
nfs4_xdr_dec_*() functions. I also remove the unnecessary #ifdef
CONFIG_NFS_V4_2 that was added around this code, since we are already in
a v4.2-only file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
They should be in the nfs4_xdr_enc_*() section, and not at the bottom of
the file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move them out of the encode_*() section and into the decode_*() section
where it belongs.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move the function to be with the other encode_*() functions, instead of
in the middle of the nfs4_xdr_enc_*() section.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When updating the ctime, we also want to update i_version.
This is just something I noticed by inspection. There is probably no way
to test this today unless you can somehow get to this inode via nfsd.
Still, I think it's the right thing to do for consistency's sake.
David Sterba's comment: I don't see anything wrong with setting the
iversion bit, however I also don't see where this would be useful.
Agreed with the consistency, otherwise the time is updated when device
super block is wiped or a device initialized, both are big events so
missing that due to lack of iversion update seems unlikely. I'll add it
to the queue, thanks.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ add comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
|