Age | Commit message | Author |
|
The section at the top of random.c which documents the input functions
available does not document add_hwgenerator_randomness() which might lead
a reader to overlook it. Add a brief note about it.
Signed-off-by: Mark Brown <broonie@kernel.org>
[Jason: reorganize position of function in doc comment and also document
add_bootloader_randomness() while we're at it.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is handy not just for humans, but also so that the 0-day bot can
automatically test posted mailing list patches against the right tree.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The second parameter of bpf_d_path() can only accept writable
memories. Rdonly_mem obtained from bpf_per_cpu_ptr() can not
be passed into bpf_d_path for modification. This patch adds
a selftest to verify this behavior.
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220106205525.2116218-1-haoluo@google.com
|
|
This adds documentation for:
- bpf_map_delete_batch()
- bpf_map_lookup_batch()
- bpf_map_lookup_and_delete_batch()
- bpf_map_update_batch()
This also updates the public API for the `keys` parameter
of `bpf_map_delete_batch()`, and both the
`keys` and `values` parameters of `bpf_map_update_batch()`
to be constants.
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220106201304.112675-1-grantseltzer@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Three minor tracing fixes:
- Fix missing prototypes in sample module for direct functions
- Fix check of valid buffer in get_trace_buf()
- Fix annotations of percpu pointers"
* tag 'trace-v5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Tag trace_percpu_buffer as a percpu pointer
tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()
ftrace/samples: Add missing prototypes direct functions
|
|
After commit dc26532aed0a ("cgroup: rstat: punt root-level optimization to
individual controllers"), each rstat on updated_children list has its
->updated_next not NULL.
This means we can remove the check on ->updated_next, if we make sure
the subtree from @root is on list, which could be done by checking
updated_next for root.
tj: Coding style fixes.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
PT_REGS*() macros on some architectures force-cast struct pt_regs to
other types (user_pt_regs, etc) and might drop volatile modifiers, if any.
Volatile isn't really required as pt_regs value isn't supposed to change
during the BPF program run, so this is correct behavior.
But progs/loop3.c relies on that volatile modifier to ensure that loop
is preserved. Fix loop3.c by declaring i and sum variables as volatile
instead. It preserves the loop and makes the test pass on all
architectures (including s390x which is currently broken).
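The effect of the volatile qualifiers can be illustrated with a minimal sketch. This is not the code from progs/loop3.c; the function name and loop body are hypothetical and only show how volatile counters force the compiler to keep every iteration.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a test loop. Because i and sum are volatile,
 * every load and store must actually be performed, so the compiler cannot
 * fold the loop into a precomputed constant.
 */
int sum_below(int limit)
{
	volatile int i = 0;
	volatile int sum = 0;

	assert(limit >= 0);
	while (i < limit) {
		sum += i;
		i++;
	}
	return sum;
}
```

Without the qualifiers an optimizing compiler is free to replace the loop with a closed-form result, which is what a loop-preserving test wants to avoid.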
Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220106205156.955373-1-andrii@kernel.org
|
|
Instead of looping unconditionally with do-while, put the loop condition in
the while statement.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
When a task is writing to an fd opened by a different task, the perm check
should use the cgroup namespace of the latter task. Add a test for it.
Tested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
When a task is writing to an fd opened by a different task, the perm check
should use the credentials of the latter task. Add a test for it.
Tested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
0644 is an odd perm to create a cgroup which is a directory. Use the regular
0755 instead. This is necessary for euid switching test case.
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's cgroup namespace which is
a potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes cgroup remember the cgroup namespace at the time of open
and uses it for migration permission checks instead of current's. Note that
this only applies to cgroup2 as cgroup1 doesn't have namespace support.
This also fixes a use-after-free bug on cgroupns reported in
https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Note that backporting this fix also requires the preceding patch.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Fixes: 5136f6365ce3 ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
of->priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx which is allocated and freed by the common path.
This allows generic private data which applies to multiple files, which will
be used in the following patch.
Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.
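A rough userspace sketch of the embedding idea, assuming a much simplified context: the struct and function names are hypothetical and the iterator is reduced to a position counter. The point is only that one allocation at open time covers the iterator as well.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified iterator state. */
struct task_iter_model {
	int pos;
};

/* Per-open context; the iterator is embedded, not separately allocated. */
struct file_ctx_model {
	struct {
		struct task_iter_model iter;
	} procs;
};

/* Common open path: a single zeroed allocation covers everything. */
struct file_ctx_model *ctx_open_model(void)
{
	return calloc(1, sizeof(struct file_ctx_model));
}

/* Common release path: a single free, nothing else to tear down. */
void ctx_release_model(struct file_ctx_model *ctx)
{
	free(ctx);
}

/* Return the current position and advance the embedded iterator. */
int procs_next_model(struct file_ctx_model *ctx)
{
	assert(ctx);
	return ctx->procs.iter.pos++;
}
```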
v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
cgroup_file_ctx as suggested by Linus.
v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
Converted. Didn't change to embedded allocation as cgroup1 pidlists get
stored for caching.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
|
|
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes both cgroup2 and cgroup1 process migration interfaces
use the credentials saved at the time of open (file->f_cred) instead of
current's.
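The difference between the two checks can be modelled in userspace. Everything in this sketch is hypothetical (the types, the function names and the toy "only euid 0 may migrate" policy); it only contrasts checking the writer's credentials with checking the credentials captured at open time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal credential model. */
struct cred_model {
	unsigned int euid;
};

/* Hypothetical open file; f_cred mirrors the idea of file->f_cred. */
struct open_file_model {
	struct cred_model f_cred;	/* captured when the file was opened */
};

/* Record the opener's credentials at open time. */
struct open_file_model open_model(struct cred_model opener)
{
	struct open_file_model f = { .f_cred = opener };

	return f;
}

/* Toy policy for this sketch: only euid 0 may migrate processes. */
static bool may_migrate(const struct cred_model *c)
{
	return c->euid == 0;
}

/* Old behaviour: the check uses whoever happens to perform the write. */
bool write_check_current(const struct open_file_model *f,
			 struct cred_model writer)
{
	assert(f);
	return may_migrate(&writer);
}

/* New behaviour: the check uses the credentials saved at open time. */
bool write_check_opener(const struct open_file_model *f,
			struct cred_model writer)
{
	assert(f);
	(void)writer;	/* the writer's identity no longer matters */
	return may_migrate(&f->f_cred);
}
```

With an fd opened by an unprivileged task and written by a privileged one, the first check passes while the second one refuses, which is the scenario described above.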
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe84067bd ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
ssh://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.16-2021-12-31:
amdgpu:
- Suspend/resume fix
- Restore runtime pm behavior with efifb
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211231143825.11479-1-alexander.deucher@amd.com
|
|
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the firmware efi sysfs code to use default_groups
field which has been the preferred way since aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In an effort to ensure the initrd observed and used by the OS is
the same one that was meant to be loaded, which is difficult to
guarantee otherwise, let's measure the initrd if the EFI stub and
specifically the newly introduced LOAD_FILE2 protocol was used.
Modify the initrd loading sequence so that the contents of the initrd
are measured into PCR9. Note that the patch is currently using
EV_EVENT_TAG to create the eventlog entry instead of EV_IPL. According
to the TCG PC Client specification this is used for PCRs defined for OS
and application usage.
Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20211119114745.1560453-5-ilias.apalodimas@linaro.org
[ardb: add braces to initializer of tagged_event_data]
Link: https://github.com/ClangBuiltLinux/linux/issues/1547
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.17/drivers
Pull MD updates from Song:
"The major changes are:
- REQ_NOWAIT support, by Vishal Verma
- raid6 benchmark optimization, by Dirk Müller
- Fix for acct bioset, by Xiao Ni
- Clean up max_queued_requests, by Mariusz Tkaczyk
- PREEMPT_RT optimization, by Davidlohr Bueso
- Use default_groups in kobj_type, by Greg Kroah-Hartman"
* 'md-next' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/song/md:
md: use default_groups in kobj_type
md: Move alloc/free acct bioset in to personality
lib/raid6: Use strict priority ranking for pq gen() benchmarking
lib/raid6: skip benchmark of non-chosen xor_syndrome functions
md: fix spelling of "its"
md: raid456 add nowait support
md: raid10 add nowait support
md: raid1 add nowait support
md: add support for REQ_NOWAIT
md: drop queue limitation for RAID1 and RAID10
md/raid5: play nice with PREEMPT_RT
|
|
Inodes aren't supposed to have a project id of -1U (aka 4294967295) but
the kernel hasn't always validated FSSETXATTR correctly. Flag this as
something for the sysadmin to check out.
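For reference, -1U is the all-ones 32-bit value. A minimal sketch of such a check, with a hypothetical helper name that is not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* -1 converted to a 32-bit unsigned type is 4294967295. */
static_assert((uint32_t)-1 == 4294967295u, "-1U is 4294967295");

/* Hypothetical helper: flag the project id that inodes should never carry. */
bool projid_is_suspect(uint32_t projid)
{
	return projid == (uint32_t)-1;
}
```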
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Online fsck depends on callers holding ILOCK_EXCL from the time they
decide to update a block mapping until after they've updated the reverse
mapping records to guarantee the stability of both mapping records.
Unfortunately, the quota code drops ILOCK_EXCL at the first transaction
roll in the dquot allocation process, which breaks that assertion. This
leads to sporadic failures in the online rmap repair code if the repair
code grabs the AGF after bmapi_write maps a new block into the quota
file's data fork but before it can finish the deferred rmap update.
Fix this by rewriting the function to hold the ILOCK until after the
transaction commit like all other bmap updates do, and get rid of the
dqread wrapper that does nothing but complicate the codebase.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
mp is being initialized to log->l_mp but this is never read
as mp is overwritten later on. Remove the redundant
assignment.
Cleans up the following clang-analyzer warning:
fs/xfs/xfs_log_recover.c:3543:20: warning: Value stored to 'mp' during
its initialization is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Oh, let me count the ways that the kvmalloc API sucks dog eggs.
The problem is when we are logging lots of large objects, we hit
kvmalloc really damn hard with costly order allocations, and
behaviour utterly sucks:
- 49.73% xlog_cil_commit
- 31.62% kvmalloc_node
- 29.96% __kmalloc_node
- 29.38% kmalloc_large_node
- 29.33% __alloc_pages
- 24.33% __alloc_pages_slowpath.constprop.0
- 18.35% __alloc_pages_direct_compact
- 17.39% try_to_compact_pages
- compact_zone_order
- 15.26% compact_zone
5.29% __pageblock_pfn_to_page
3.71% PageHuge
- 1.44% isolate_migratepages_block
0.71% set_pfnblock_flags_mask
1.11% get_pfnblock_flags_mask
- 0.81% get_page_from_freelist
- 0.59% _raw_spin_lock_irqsave
- do_raw_spin_lock
__pv_queued_spin_lock_slowpath
- 3.24% try_to_free_pages
- 3.14% shrink_node
- 2.94% shrink_slab.constprop.0
- 0.89% super_cache_count
- 0.66% xfs_fs_nr_cached_objects
- 0.65% xfs_reclaim_inodes_count
0.55% xfs_perag_get_tag
0.58% kfree_rcu_shrink_count
- 2.09% get_page_from_freelist
- 1.03% _raw_spin_lock_irqsave
- do_raw_spin_lock
__pv_queued_spin_lock_slowpath
- 4.88% get_page_from_freelist
- 3.66% _raw_spin_lock_irqsave
- do_raw_spin_lock
__pv_queued_spin_lock_slowpath
- 1.63% __vmalloc_node
- __vmalloc_node_range
- 1.10% __alloc_pages_bulk
- 0.93% __alloc_pages
- 0.92% get_page_from_freelist
- 0.89% rmqueue_bulk
- 0.69% _raw_spin_lock
- do_raw_spin_lock
__pv_queued_spin_lock_slowpath
13.73% memcpy_erms
- 2.22% kvfree
On this workload, that's almost a dozen CPUs all trying to compact
and reclaim memory inside kvmalloc_node at the same time. Yet it is
regularly falling back to vmalloc despite all that compaction, page
and shrinker reclaim that direct reclaim is doing. Copying all the
metadata is taking far less CPU time than allocating the storage!
Direct reclaim should be considered extremely harmful.
This is a high frequency, high throughput, CPU usage and latency
sensitive allocation. We've got memory there, and we're using
kvmalloc to allow memory allocation to avoid doing lots of work to
try to do contiguous allocations.
Except it still does *lots of costly work* that is unnecessary.
Worse: the only way to avoid the slowpath page allocation trying to
do compaction on costly allocations is to turn off direct reclaim
(i.e. remove __GFP_DIRECT_RECLAIM from the gfp flags).
Unfortunately, the stupid kvmalloc API then says "oh, this isn't a
GFP_KERNEL allocation context, so you only get kmalloc!". This
cuts off the vmalloc fallback, and this leads to almost instant OOM
problems which ends up in filesystems deadlocks, shutdowns and/or
kernel crashes.
I want some basic kvmalloc behaviour:
- kmalloc for a contiguous range with fail fast semantics - no
compaction direct reclaim if the allocation enters the slow path.
- run normal vmalloc (i.e. GFP_KERNEL) if kmalloc fails
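The two wanted behaviours can be sketched as a userspace model. This is not the kernel code: the allocator stand-ins, the 4096-byte limit and all names are assumptions made for illustration, with malloc() playing the role of both allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator callback type used by this model. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Stand-in for a fail-fast contiguous allocation. This toy version simply
 * refuses anything above an arbitrary 4096-byte limit instead of trying
 * harder (no compaction, no reclaim).
 */
void *contig_alloc_failfast(size_t size)
{
	return size <= 4096 ? malloc(size) : NULL;
}

/* Stand-in for the vmalloc-style fallback with no contiguity limit. */
void *fallback_alloc(size_t size)
{
	return malloc(size);
}

/*
 * Try the cheap path exactly once, then fall back.
 * *used_fallback reports which path provided the memory.
 */
void *alloc_failfast_then_fallback(size_t size, alloc_fn fast, alloc_fn slow,
				   int *used_fallback)
{
	void *p;

	assert(fast && slow && used_fallback);
	p = fast(size);
	*used_fallback = (p == NULL);
	if (!p)
		p = slow(size);
	return p;
}
```

The design point is that the first attempt gives up immediately instead of doing expensive work, because a usable fallback exists.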
The really, really stupid part about this is these kvmalloc() calls
are run under memalloc_nofs task context, so all the allocations are
always reduced to GFP_NOFS regardless of the fact that kvmalloc
requires GFP_KERNEL to be passed in. IOWs, we're already telling
kvmalloc to behave differently to the gfp flags we pass in, but it
still won't allow vmalloc to be run with anything other than
GFP_KERNEL.
So, this patch open codes the kvmalloc() in the commit path to have
the above described behaviour. The result is we more than halve the
CPU time spent doing kvmalloc() in this path and the rate of transaction
commits with 64kB objects in them more than doubles. i.e. we get ~5x
reduction in CPU usage per costly-sized kvmalloc() invocation and
the profile looks like this:
- 37.60% xlog_cil_commit
16.01% memcpy_erms
- 8.45% __kmalloc
- 8.04% kmalloc_order_trace
- 8.03% kmalloc_order
- 7.93% alloc_pages
- 7.90% __alloc_pages
- 4.05% __alloc_pages_slowpath.constprop.0
- 2.18% get_page_from_freelist
- 1.77% wake_all_kswapds
....
- __wake_up_common_lock
- 0.94% _raw_spin_lock_irqsave
- 3.72% get_page_from_freelist
- 2.43% _raw_spin_lock_irqsave
- 5.72% vmalloc
- 5.72% __vmalloc_node_range
- 4.81% __get_vm_area_node.constprop.0
- 3.26% alloc_vmap_area
- 2.52% _raw_spin_lock
- 1.46% _raw_spin_lock
0.56% __alloc_pages_bulk
- 4.66% kvfree
- 3.25% vfree
- __vfree
- 3.23% __vunmap
- 1.95% remove_vm_area
- 1.06% free_vmap_area_noflush
- 0.82% _raw_spin_lock
- 0.68% _raw_spin_lock
- 0.92% _raw_spin_lock
- 1.40% kfree
- 1.36% __free_pages
- 1.35% __free_pages_ok
- 1.02% _raw_spin_lock_irqsave
It's worth noting that over 50% of the CPU time spent allocating
these shadow buffers is now spent on spinlocks. So the shadow buffer
allocation overhead is greatly reduced by getting rid of direct
reclaim from kmalloc, and could probably be made even less costly if
vmalloc() didn't use global spinlocks to protect its structures.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the xfs sysfs code to use default_groups field which has
been the preferred way since aa30f47cf666 ("kobject: Add support for
default attribute groups to kobj_type") so that we can soon get rid of
the obsolete default_attrs field.
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the md rdev sysfs code to use default_groups field which
has been the preferred way since commit aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Song Liu <song@kernel.org>
|
|
A phandle for 'interrupts' value is wrong and should be one or more numbers.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220106182518.1435497-9-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
kfree() and bitmap_free() are the same. But using the latter is more
consistent when freeing memory allocated with bitmap_zalloc().
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When a bitmap is local to a function, it is safe to use the non-atomic
__[set|clear]_bit(). No concurrent accesses can occur.
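A userspace sketch of why plain read-modify-write is enough here. The helpers below are hypothetical models of non-atomic bit operations, not the kernel's implementations, and the bitmap size is an arbitrary assumption.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_NBITS	64
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_WORDS	((MODEL_NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Plain read-modify-write: cheap, but not safe against concurrent writers. */
static void model_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static void model_clear_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

/*
 * The bitmap lives on this function's stack, so no other thread can reach
 * it and the non-atomic helpers are sufficient.
 * Returns how many even-numbered bits below n end up set.
 */
unsigned int count_even_bits_below(unsigned int n)
{
	unsigned long map[MODEL_WORDS] = { 0 };
	unsigned int i, count = 0;

	assert(n <= MODEL_NBITS);
	for (i = 0; i < n; i++)
		model_set_bit(i, map);
	for (i = 1; i < n; i += 2)
		model_clear_bit(i, map);
	for (i = 0; i < n; i++)
		count += (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
	return count;
}
```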
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The 'possible_idx' bitmap is set just after it is zeroed, so we can save
the first step.
The 'free_idx' bitmap is used only at the end of the function as the
result of a bitmap xor operation. So there is no need to explicitly
zero it before.
So, slightly simplify the code and remove 2 useless 'bitmap_zero()' calls.
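The reasoning for the xor destination can be checked with a small model. The function below is a hypothetical whole-word sketch, not the kernel's bitmap_xor(), which operates on a bit count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * dst = a ^ b, one word at a time. Every word of dst is written with a
 * plain store and its old contents are never read, so zeroing dst before
 * the call changes nothing.
 */
void model_bitmap_xor(unsigned long *dst, const unsigned long *a,
		      const unsigned long *b, size_t nwords)
{
	size_t i;

	assert(dst && a && b);
	for (i = 0; i < nwords; i++)
		dst[i] = a[i] ^ b[i];
}
```

Calling it with a destination full of stale data gives the same result as calling it with a zeroed one.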
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In the current switchdev implementation, every VF PR is assigned to
an individual ring on the switchdev ctrl VSI. For slow-path traffic, there
is a mapping VF->ring done in software based on src_vsi value (by
calling ice_eswitch_get_target_netdev function).
With this change, HW solution is introduced which is more
efficient. For each VF, src MAC (VF's MAC) filter will be created,
which forwards packets to the corresponding switchdev ctrl VSI queue
based on src MAC address.
This filter has to be removed and then replayed in case of
resetting one VF. Keep information about this rule in repr->mac_rule,
thanks to that we know which rule has to be removed and replayed
for a given VF.
In case of a CORE/GLOBAL reset, all rules are removed
automatically, so we have to take care of re-adding them. This is done
by ice_replay_vsi_adv_rule.
When driver leaves switchdev mode, remove all advanced rules
from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi.
Flag repr->rule_added is needed because in some cases reset
might be triggered before VF sends request to add MAC.
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The init_freq_invariance_cppc function is implemented in smpboot and depends on
CONFIG_SMP.
MODPOST vmlinux.symvers
MODINFO modules.builtin.modinfo
GEN modules.builtin
LD .tmp_vmlinux.kallsyms1
ld: drivers/acpi/cppc_acpi.o: in function `acpi_cppc_processor_probe':
/home/ray/brahma3/linux/drivers/acpi/cppc_acpi.c:819: undefined reference to `init_freq_invariance_cppc'
make: *** [Makefile:1161: vmlinux] Error 1
See https://lore.kernel.org/lkml/484af487-7511-647e-5c5b-33d4429acdec@infradead.org/.
Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The following changes were made:
1. Align function signatures to 80 characters per line.
2. Remove tabs for variable assignment and use 1 space instead.
3. Don't compare to NULL in "if" clause.
4. Remove strange indentations.
This will ease maintenance of the driver in the future.
Link: https://lore.kernel.org/r/20211215135721.3662-7-mgurtovoy@nvidia.com
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The AMD P-State driver is based on the ACPI CPPC function, so ACPI should be
a dependency of this driver in the kernel config.
In file included from ../drivers/cpufreq/amd-pstate.c:40:0:
../include/acpi/processor.h:226:2: error: unknown type name ‘phys_cpuid_t’
phys_cpuid_t phys_id; /* CPU hardware ID such as APIC ID for x86 */
^~~~~~~~~~~~
../include/acpi/processor.h:355:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
phys_cpuid_t acpi_get_phys_id(acpi_handle, int type, u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
CC drivers/rtc/rtc-rv3029c2.o
../include/acpi/processor.h:356:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
phys_cpuid_t acpi_map_madt_entry(u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
../include/acpi/processor.h:357:20: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
See https://lore.kernel.org/lkml/20e286d4-25d7-fb6e-31a1-4349c805aae3@infradead.org/.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add the description of @req and @boost_supported in struct amd_cpudata
kernel-doc comment to remove warnings found by running scripts/kernel-doc,
which are triggered by building with 'make W=1'.
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'req' not described in 'amd_cpudata'
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'boost_supported' not described in 'amd_cpudata'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ice_replay_vsi_adv_rule will replay advanced rules for a given VSI.
Exit this function when list of rules for given recipe is empty.
Do not add rule when given vsi_handle does not match vsi_handle
from the rule info.
Use ICE_MAX_NUM_RECIPES instead of ICE_SW_LKUP_LAST in order to find
advanced rules as well.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
bioset acct is only needed for raid0 and raid5. Therefore, md_run only
allocates it for raid0 and raid5. However, this does not cover
personality takeover, which may cause uninitialized bioset. For example,
the following repro steps:
mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
mdadm --wait /dev/md0
mkfs.xfs /dev/md0
mdadm /dev/md0 --grow -l5
mount /dev/md0 /mnt
causes panic like:
[ 225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 225.934903] #PF: supervisor instruction fetch in kernel mode
[ 225.935639] #PF: error_code(0x0010) - not-present page
[ 225.936361] PGD 0 P4D 0
[ 225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ 225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706
[ 225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014
[ 225.939922] RIP: 0010:0x0
[ 225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246
[ 225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39
[ 225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800
[ 225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01
[ 225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58
[ 225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040
[ 225.946709] FS: 00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000
[ 225.947814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0
[ 225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 225.951414] Call Trace:
[ 225.951787] <TASK>
[ 225.952120] mempool_alloc+0xe5/0x250
[ 225.952625] ? mempool_resize+0x370/0x370
[ 225.953187] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 225.953862] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 225.954464] ? sched_clock_cpu+0x15/0x120
[ 225.955019] ? find_held_lock+0xac/0xd0
[ 225.955564] bio_alloc_bioset+0x1ed/0x2a0
[ 225.956080] ? lock_downgrade+0x3a0/0x3a0
[ 225.956644] ? bvec_alloc+0xc0/0xc0
[ 225.957135] bio_clone_fast+0x19/0x80
[ 225.957651] raid5_make_request+0x1370/0x1b70
[ 225.958286] ? sched_clock_cpu+0x15/0x120
[ 225.958797] ? __lock_acquire+0x8b2/0x3510
[ 225.959339] ? raid5_get_active_stripe+0xce0/0xce0
[ 225.959986] ? lock_is_held_type+0xd8/0x130
[ 225.960528] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 225.961135] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 225.961703] ? sched_clock_cpu+0x15/0x120
[ 225.962232] ? lock_release+0x27a/0x6c0
[ 225.962746] ? do_wait_intr_irq+0x130/0x130
[ 225.963302] ? lock_downgrade+0x3a0/0x3a0
[ 225.963815] ? lock_release+0x6c0/0x6c0
[ 225.964348] md_handle_request+0x342/0x530
[ 225.964888] ? set_in_sync+0x170/0x170
[ 225.965397] ? blk_queue_split+0x133/0x150
[ 225.965988] ? __blk_queue_split+0x8b0/0x8b0
[ 225.966524] ? submit_bio_checks+0x3b2/0x9d0
[ 225.967069] md_submit_bio+0x127/0x1c0
[...]
Fix this by moving alloc/free of acct bioset to pers->run and pers->free.
While we are on this, properly handle md_integrity_register() error in
raid0_run().
Fixes: daee2024715d ("md: check level before create and exit io_acct_set")
Cc: stable@vger.kernel.org
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
and 3 variants of SSE2 are benchmarked on initialization, taking
between 144-153 jiffies. Testing across a hardware pool of
various generations of Intel CPUs I could not find a single
case where SSE2 won over AVX2 or AVX512. There are cases where
AVX2 wins over AVX512 however.
Change "prefer" into an integer priority field (similar to
how recov selection works) to have more than one ranking level
available, which is backwards compatible with existing behavior.
Give AVX2/512 variants higher priority over SSE2 in order to skip
SSE testing when AVX is available. In an AVX2/x86_64/HZ=250 case this
saves on the order of 200ms of initialization time.
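The selection idea can be sketched as follows. This is a simplified two-pass model, not the logic of raid6_choose_gen(); the struct, the function name, the precomputed scores standing in for benchmark runs, and the assumption that priorities are non-negative are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one candidate algorithm. */
struct algo_model {
	const char *name;
	int priority;		/* higher rank is preferred; must be >= 0 */
	int available;		/* non-zero if the CPU supports it */
	unsigned long score;	/* stands in for a measured benchmark result */
};

/*
 * Benchmark only the available algorithms in the highest priority rank and
 * return the best-scoring one. *benchmarked counts how many were measured.
 * Returns NULL if nothing is available.
 */
const struct algo_model *choose_algo(const struct algo_model *algos, size_t n,
				     size_t *benchmarked)
{
	const struct algo_model *best = NULL;
	int top = -1;
	size_t i;

	assert(algos && benchmarked);
	*benchmarked = 0;

	/* Pass 1: find the highest priority among usable algorithms. */
	for (i = 0; i < n; i++)
		if (algos[i].available && algos[i].priority > top)
			top = algos[i].priority;

	/* Pass 2: measure only that rank; lower ranks are skipped entirely. */
	for (i = 0; i < n; i++) {
		if (!algos[i].available || algos[i].priority != top)
			continue;
		(*benchmarked)++;
		if (!best || algos[i].score > best->score)
			best = &algos[i];
	}
	return best;
}
```

Because lower ranks are never measured, a lower-priority variant is not chosen even if it would have scored well, and the time spent benchmarking it is saved.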
Signed-off-by: Dirk Müller <dmueller@suse.de>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
|
|
In commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
xor_syndrome() benchmarking was also added to the raid6_choose_gen()
function. However, the results of that benchmarking were intentionally
discarded and did not influence the choice. It picked the
xor_syndrome() variant related to the best performing gen_syndrome().
Reduce runtime of raid6_choose_gen() without modifying its outcome by
only benchmarking the xor_syndrome() of the best gen_syndrome() variant.
For a HZ=250 x86_64 system with avx2 and without avx512 this removes
5 out of 6 xor() benchmarks, saving 340ms of raid6 initialization time.
Signed-off-by: Dirk Müller <dmueller@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
|
|
Use the possessive "its" instead of the contraction "it's"
in printed messages.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
|
|
Returns EAGAIN in case the raid456 driver would block waiting for reshape.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
This adds nowait support to the RAID10 driver. Very similar to
raid1 driver changes. It makes RAID10 driver return with EAGAIN
for situations where it would otherwise wait, e.g.:
- Waiting for the barrier,
- Reshape operation,
- Discard operation.
wait_barrier() and regular_request_wait() fn are modified to return bool
to support error for wait barriers. They return true in case of wait
or if wait is not required and returns false if wait was required
but not performed to support nowait.
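The return convention described above can be modelled like this. It is a userspace sketch with hypothetical names; the real driver sleeps on a wait queue and reports the failure through the bio, whereas this model just clears a flag and returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical barrier state: raised means a caller would have to wait. */
struct barrier_model {
	bool raised;
};

/*
 * Returns true if the caller may proceed (either no wait was needed or the
 * wait was performed), false if a wait was required but skipped because
 * nowait was requested.
 */
bool wait_barrier_model(struct barrier_model *b, bool nowait)
{
	assert(b);
	if (!b->raised)
		return true;		/* no wait required */
	if (nowait)
		return false;		/* would block: refuse instead */
	b->raised = false;		/* toy stand-in for sleeping until it drops */
	return true;
}

/* Hypothetical submit path: turn a refused wait into -EAGAIN. */
int submit_model(struct barrier_model *b, bool nowait)
{
	if (!wait_barrier_model(b, nowait))
		return -EAGAIN;
	return 0;
}
```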
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
This adds nowait support to the RAID1 driver. It makes RAID1 driver
return with EAGAIN for situations where it would otherwise wait, e.g.:
- Waiting for the barrier,
wait_barrier() fn is modified to return bool to support error for
wait barriers. It returns true in case of wait or if wait is not
required and returns false if wait was required but not performed
to support nowait.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added support
for checking whether a given bdev supports handling of REQ_NOWAIT or not.
Since then commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable
it for linear target") added support for REQ_NOWAIT for dm. This uses
a similar approach to incorporate REQ_NOWAIT for md based bios.
This patch was tested using the t/io_uring tool within FIO. An NVMe drive
was partitioned into 2 partitions and a simple raid 0 configuration
/dev/md0 was created.
md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0]
937423872 blocks super 1.2 512k chunks
Before patch:
$ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
Running top while the above runs:
$ ps -eL | grep $(pidof io_uring)
38396 38396 pts/2 00:00:00 io_uring
38396 38397 pts/2 00:00:15 io_uring
38396 38398 pts/2 00:00:13 iou-wrk-38397
We can see that an io worker thread, iou-wrk-38397, was created. This
happens when io_uring sees that the underlying device (/dev/md0 in this
case) doesn't support nowait.
After patch:
$ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
Running ps while the above runs:
$ ps -eL | grep $(pidof io_uring)
38341 38341 pts/2 00:10:22 io_uring
38341 38342 pts/2 00:10:37 io_uring
With this patch applied, we don't see any io worker thread
being created, which indicates that io_uring saw that the
underlying device does support nowait. This is the exact behaviour
noticed on a dm device which also supports nowait.
For all the other raid personalities except raid0, we would need
to teach the pieces involved in the make_request fn to correctly
handle REQ_NOWAIT.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
As suggested by Neil Brown[1], this limitation seems to be
deprecated.
With plugging in use, writes are processed behind the raid thread
and conf->pending_count is not increased. This limitation occurs only
if the caller doesn't use plugs.
It can be avoided and often it is (with plugging). There are no reports
of the queue growing to an enormous size, so remove the queue limitation
for non-plugged IOs too.
[1] https://lore.kernel.org/linux-raid/162496301481.7211.18031090130574610495@noble.neil.brown.name
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
raid_run_ops() relies on the implicitly disabled preemption for
its percpu ops, although this is really about CPU locality. This
breaks RT semantics as it can take regular (and thus sleeping)
spinlocks, such as stripe_lock.
Add a local_lock such that non-RT does not change and continues
to just map to preempt_disable/enable, but which makes RT happy as
the region will use a per-CPU spinlock and thus be preemptible
while still guaranteeing CPU locality.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
We missed adding handle_err for gpi mode, so add a new function
spi_geni_handle_err() which calls handle_fifo_timeout() or the newly
added handle_gpi_timeout() based on the mode.
Fixes: b59c122484ec ("spi: spi-geni-qcom: Add support for GPI dma")
Reported-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220103071118.27220-2-vkoul@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Before we invoke spi_finalize_current_transfer() in
spi_gsi_callback_result() we should set spi->cur_msg->status as
appropriate (0 for success, error otherwise).
This helps to return an error on the transfer right away instead of
waiting until it times out.
Fixes: b59c122484ec ("spi: spi-geni-qcom: Add support for GPI dma")
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20220103071118.27220-1-vkoul@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
struct uart_port contains a cached copy of the Modem Control signals.
It is used to skip register writes in uart_update_mctrl() if the new
signal state equals the old signal state. It also avoids a register
read to obtain the current state of output signals.
When a uart_port is registered, uart_configure_port() changes signal
state but neglects to keep the cached copy in sync. That may cause
a subsequent register write to be incorrectly skipped. Fix it before
it trips somebody up.
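How a stale cache leads to a skipped write can be shown with a small
userspace model. This is a sketch under stated assumptions, not the
serial core code: struct port_model, the MODEL_* bit values and all
function names are hypothetical, and the starting state (RTS already
cached) is chosen only to make the effect visible.

```c
#include <stdbool.h>

#define MODEL_DTR 0x002	/* arbitrary bit values for the model */
#define MODEL_RTS 0x004

struct port_model {
	unsigned int mctrl;	/* cached copy of the signals */
	unsigned int hw;	/* stands in for the hardware register */
	unsigned int writes;	/* number of register writes issued */
};

void set_mctrl_hw(struct port_model *p, unsigned int mctrl)
{
	p->hw = mctrl;
	p->writes++;
}

/* Skip the register write when the cached state already matches. */
void update_mctrl_model(struct port_model *p, unsigned int set,
			unsigned int clear)
{
	unsigned int old = p->mctrl;

	p->mctrl = (old & ~clear) | set;
	if (old != p->mctrl)
		set_mctrl_hw(p, p->mctrl);
}

/*
 * Registration path: changes the signals in hardware. With sync_cache
 * false the cached copy is left stale (the behavior being fixed).
 */
void configure_port_model(struct port_model *p, bool sync_cache)
{
	unsigned int kept = p->mctrl & MODEL_DTR;

	set_mctrl_hw(p, kept);
	if (sync_cache)
		p->mctrl = kept;
}
```

In the model, if the cache holds DTR|RTS when the port is configured
without syncing, a later request to set RTS compares equal to the stale
cache and never reaches the register. With the cache kept in sync the
same request results in a write.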
This behavior has been present ever since the serial core was introduced
in 2002:
https://git.kernel.org/history/history/c/33c0d1b0c3eb
So far it was never an issue because the cached copy is initialized to 0
by kzalloc() and when uart_configure_port() is executed, at most DTR has
been set by uart_set_options() or sunsu_console_setup(). Therefore,
a stable designation seems unnecessary.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/bceeaba030b028ed810272d55d5fc6f3656ddddb.1641129752.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In this error handling path, the "transmit_chars_dma" function calls
"transmit_chars_pio" once per character. But "transmit_chars_pio"
already keeps sending characters while the xmit buffer is not empty.
Remove this useless loop; one call is sufficient.
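Why the loop is redundant can be seen in a small userspace model. It is
only a sketch: struct xmit_model and the function names are
hypothetical, and the model ignores any hardware readiness checks the
real PIO path may perform.

```c
#include <stddef.h>

/* Hypothetical stand-in for the transmit buffer state. */
struct xmit_model {
	size_t head;		/* characters queued so far */
	size_t tail;		/* characters already sent */
	size_t sent;		/* characters pushed to the "hardware" */
	unsigned int calls;	/* how often the PIO routine was entered */
};

/* Sends until the buffer is empty, as described for the PIO path. */
void transmit_chars_pio_model(struct xmit_model *x)
{
	x->calls++;
	while (x->tail < x->head) {
		x->tail++;	/* stands in for writing one character */
		x->sent++;
	}
}

/* Old fallback: one PIO call per character. */
void fallback_old_model(struct xmit_model *x, size_t count)
{
	for (size_t i = 0; i < count; i++)
		transmit_chars_pio_model(x);
}

/* New fallback: a single PIO call. */
void fallback_new_model(struct xmit_model *x)
{
	transmit_chars_pio_model(x);
}
```

In the model, the first call drains the whole buffer, so every further
call in the old loop finds nothing left to send.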
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-5-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If flow control is enabled, the framework will call stop_tx to
pause the transfer and then call start_tx to resume it.
Clear the USART_CR3_DMAT bit in the stop_tx op to pause the DMA transfer.
Signed-off-by: Erwan Le Ray <erwan.leray@foss.st.com>
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-4-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The TX DMA state is tracked by the tx_dma_busy boolean.
This boolean is set when a DMA descriptor is requested and reset when the
DMA channel is stopped (dma_terminate).
In stm32_usart_serial_remove(), stm32_usart_stop_tx() and the
stm32_usart_transmit_chars_dma() fallback error case, the DMA channel is
stopped but tx_dma_busy is not updated.
Rework the driver by using two new functions to solve this issue:
- stm32_usart_tx_dma_started returns true if DMA TX has a descriptor.
- stm32_usart_tx_dma_enabled returns true if the DMAT bit is set.
stm32_usart_tx_dma_started uses the tx_dma_busy flag to prevent two DMA
transactions at the same time. This flag is set when a DMA transaction
begins and is unset when dmaengine_terminate_async is called.
A new DMA transaction cannot be created if this flag is set.
Create a new function "stm32_usart_tx_dma_terminate" to be sure the flag
is unset after each call of dmaengine_terminate_async.
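The flag handling described above can be sketched as a userspace model.
This is not the driver code: struct usart_model and the *_model
functions are hypothetical, the terminated counter stands in for the
call to dmaengine_terminate_async, and setting DMAT when a transaction
starts is an assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the port state. */
struct usart_model {
	bool tx_dma_busy;	/* a TX DMA transaction holds a descriptor */
	bool dmat;		/* models the DMAT enable bit */
	unsigned int terminated;/* counts modeled terminate calls */
};

bool tx_dma_started_model(const struct usart_model *p)
{
	return p->tx_dma_busy;
}

bool tx_dma_enabled_model(const struct usart_model *p)
{
	return p->dmat;
}

/* Single place that stops the channel, so the flag is always cleared. */
void tx_dma_terminate_model(struct usart_model *p)
{
	p->terminated++;	/* stands in for dmaengine_terminate_async */
	p->tx_dma_busy = false;
}

/* Refuse to start a second transaction while one is in flight. */
bool tx_dma_start_model(struct usart_model *p)
{
	if (tx_dma_started_model(p))
		return false;
	p->tx_dma_busy = true;
	p->dmat = true;		/* assumption: DMAT is set on start */
	return true;
}
```

Routing every stop through one helper means no path can stop the
channel and leave the flag set, which would otherwise block all later
transactions.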
Signed-off-by: Erwan Le Ray <erwan.leray@foss.st.com>
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220104182445.4195-3-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|