Age | Commit message (Collapse) | Author |
|
Stephen reports:
Documentation/core-api/cleanup:7: include/linux/cleanup.h:73: ERROR: Unexpected indentation. [docutils]
Documentation/core-api/cleanup:7: include/linux/cleanup.h:74: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Which points out that the ACQUIRE() example in cleanup.h missed the "::"
suffix to mark the following text as a code-block.
Fixes: 857d18f23ab1 ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: http://lore.kernel.org/20250717173354.34375751@canb.auug.org.au
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250717163036.1275791-1-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The two str_has_prefix() and strstarts() are about the same
with a slight difference on what they return. Group them in
the header.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250711085514.1294428-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Several places in the kernel do class shifting to match whether a PCI
device is display class. Add pci_is_display() for those places to use.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20250717173812.3633478-2-superm1@kernel.org
|
|
Add more detail to the kernel-doc function-header comments for
stop_machine(), stop_machine_cpuslocked(), and stop_core_cpuslocked().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc7).
Conflicts:
Documentation/netlink/specs/ovpn.yaml
880d43ca9aa4 ("netlink: specs: clean up spaces in brackets")
af52020fc599 ("ovpn: reject unexpected netlink attributes")
drivers/net/phy/phy_device.c
a44312d58e78 ("net: phy: Don't register LEDs for genphy")
f0f2b992d818 ("net: phy: Don't register LEDs for genphy")
https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org
drivers/net/wireless/intel/iwlwifi/fw/regulatory.c
drivers/net/wireless/intel/iwlwifi/mld/regulatory.c
5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap")
ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware")
net/ipv6/mcast.c
ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()")
a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()")
https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
Before the commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi
safe"), the struct llist_node is expected to be private to the one
inserting the node to the lockless list or the one removing the node
from the lockless list. After the mentioned commit, the llist_node in
the rstat code is per-cpu shared between the stacked contexts i.e.
process, softirq, hardirq & nmi. It is possible the compiler may tear
the loads or stores of llist_node. Let's avoid that.
KCSAN reported the following race:
Reported by Kernel Concurrency Sanitizer on:
CPU: 60 UID: 0 PID: 5425 ... 6.16.0-rc3-next-20250626 #1 NONE
Tainted: [E]=UNSIGNED_MODULE
Hardware name: ...
==================================================================
==================================================================
BUG: KCSAN: data-race in css_rstat_flush / css_rstat_updated
write to 0xffffe8fffe1c85f0 of 8 bytes by task 1061 on cpu 1:
css_rstat_flush+0x1b8/0xeb0
__mem_cgroup_flush_stats+0x184/0x190
flush_memcg_stats_dwork+0x22/0x50
process_one_work+0x335/0x630
worker_thread+0x5f1/0x8a0
kthread+0x197/0x340
ret_from_fork+0xd3/0x110
ret_from_fork_asm+0x11/0x20
read to 0xffffe8fffe1c85f0 of 8 bytes by task 3551 on cpu 15:
css_rstat_updated+0x81/0x180
mod_memcg_lruvec_state+0x113/0x2d0
__mod_lruvec_state+0x3d/0x50
lru_add+0x21e/0x3f0
folio_batch_move_lru+0x80/0x1b0
__folio_batch_add_and_move+0xd7/0x160
folio_add_lru_vma+0x42/0x50
do_anonymous_page+0x892/0xe90
__handle_mm_fault+0xfaa/0x1520
handle_mm_fault+0xdc/0x350
do_user_addr_fault+0x1dc/0x650
exc_page_fault+0x5c/0x110
asm_exc_page_fault+0x22/0x30
value changed: 0xffffe8fffe18e0d0 -> 0xffffe8fffe1c85f0
$ ./scripts/faddr2line vmlinux css_rstat_flush+0x1b8/0xeb0
css_rstat_flush+0x1b8/0xeb0:
init_llist_node at include/linux/llist.h:86
(inlined by) llist_del_first_init at include/linux/llist.h:308
(inlined by) css_process_update_tree at kernel/cgroup/rstat.c:148
(inlined by) css_rstat_updated_list at kernel/cgroup/rstat.c:258
(inlined by) css_rstat_flush at kernel/cgroup/rstat.c:389
$ ./scripts/faddr2line vmlinux css_rstat_updated+0x81/0x180
css_rstat_updated+0x81/0x180:
css_rstat_updated at kernel/cgroup/rstat.c:90 (discriminator 1)
These are expected race and a simple READ_ONCE/WRITE_ONCE resolves these
reports. However let's add comments to explain the race and the need for
memory barriers if stronger guarantees are needed.
More specifically the rstat updater and the flusher can race and cause a
scenario where the stats updater skips adding the css to the lockless
list but the flusher might not see those updates done by the skipped
updater. This is benign race and the subsequent flusher will flush those
stats and at the moment there aren't any rstat users which are not fine
with this kind of race. However some future user might want more
stricter guarantee, so let's add appropriate comments to ease the job of
future users.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe")
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Relocate the function max_pow_of_two_factor() to common ilog2.h from the
xfs code, as it will be used elsewhere.
Also simplify the function, as advised by Mikulas Patocka.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20250711105258.3135198-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Correct a typo error in the NVMe status code constant from
NVME_SC_SELT_TEST_IN_PROGRESS to NVME_SC_SELF_TEST_IN_PROGRESS to
accurately reflect its meaning.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux
Tony Nguyen says:
====================
Add RDMA support for Intel IPU E2000 in idpf
Tatyana Nikolova says:
This idpf patch series is the second part of the staged submission for
introducing RDMA RoCEv2 support for the IPU E2000 line of products,
referred to as GEN3.
To support RDMA GEN3 devices, the idpf driver uses common definitions
of the IIDC interface and implements specific device functionality in
iidc_rdma_idpf.h.
The IPU model can host one or more logical network endpoints called
vPorts per PCI function that are flexibly associated with a physical
port or an internal communication port.
Other features as it pertains to GEN3 devices include:
* MMIO learning
* RDMA capability negotiation
* RDMA vectors discovery between idpf and control plane
These patches are split from the submission "Add RDMA support for Intel
IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts
and platforms with a variety of general RDMA applications which include
standalone verbs (rping, perftest, etc.), storage and HPC applications.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
[1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/
This idpf patch series is the second part of the staged submission for
introducing RDMA RoCEv2 support for the IPU E2000 line of products,
referred to as GEN3.
To support RDMA GEN3 devices, the idpf driver uses common definitions
of the IIDC interface and implements specific device functionality in
iidc_rdma_idpf.h.
The IPU model can host one or more logical network endpoints called
vPorts per PCI function that are flexibly associated with a physical
port or an internal communication port.
Other features as it pertains to GEN3 devices include:
* MMIO learning
* RDMA capability negotiation
* RDMA vectors discovery between idpf and control plane
These patches are split from the submission "Add RDMA support for Intel
IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts
and platforms with a variety of general RDMA applications which include
standalone verbs (rping, perftest, etc.), storage and HPC applications.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
[1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/
IWL reviews:
v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/
v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/
v1 (split from previous series):
https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/
v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/
RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/
RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux:
idpf: implement get LAN MMIO memory regions
idpf: implement IDC vport aux driver MTU change handler
idpf: implement remaining IDC RDMA core callbacks and handlers
idpf: implement RDMA vport auxiliary dev create, init, and destroy
idpf: implement core RDMA auxiliary dev create, init, and destroy
idpf: use reserved RDMA vectors from control plane
====================
Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Update header inclusions to follow IWYU (Include What You Use)
principle.
Note that kernel.h is discouraged to be included as it's written
at the top of that file.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250708133646.70384-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
qcom_scm_shm_bridge_enable() is used early in the SCM initialization
routine. It makes an SCM call and so expects the internal __scm pointer
in the SCM driver to be assigned. For this reason the tzmem memory pool
is allocated *after* this pointer is assigned. However, this can lead to
a crash if another consumer of the SCM API makes a call using the memory
pool between the assignment of the __scm pointer and the initialization
of the tzmem memory pool.
As qcom_scm_shm_bridge_enable() is a special case, not meant to be
called by ordinary users, pull it into the local SCM header. Make it
take struct device as argument. This is the device that will be used to
make the SCM call as opposed to the global __scm pointer. This will
allow us to move the tzmem initialization *before* the __scm assignment
in the core SCM driver.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250630-qcom-scm-race-v2-2-fa3851c98611@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
qcom_scm_shm_bridge_create() and qcom_scm_shm_bridge_delete() take
struct device as argument but don't use it. Remove it from these
functions' prototypes.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20250630-qcom-scm-race-v2-1-fa3851c98611@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The 'commit 35f96de04127 ("bpf: Introduce BPF token object")' added
BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD
already used to get BPF object info, so we can also get token info with
this cmd.
One usage scenario, when program runs failed with token, because of
the permission failure, we can report what BPF token is allowing with
this API for debugging.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The memory uncorrected error could be signaled by asynchronous interrupt
(specifically, SPI in arm64 platform), e.g. when an error is detected by
a background scrubber, or signaled by synchronous exception
(specifically, data abort exception in arm64 platform), e.g. when a CPU
tries to access a poisoned cache line. Currently, both synchronous and
asynchronous errors use memory_failure_queue() to schedule
memory_failure() to exectute in a kworker context.
As a result, when a user-space process is accessing a poisoned data, a
data abort is taken and the memory_failure() is executed in the kworker
context, which:
- will send wrong si_code by SIGBUS signal in early_kill mode, and
- can not kill the user-space in some cases resulting a synchronous
error infinite loop
Issue 1: send wrong si_code in early_kill mode
Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as
MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED
could be used to determine whether a synchronous exception occurs on
ARM64 platform. When a synchronous exception is detected, the kernel is
expected to terminate the current process which has accessed a poisoned
page. This is done by sending a SIGBUS signal with error code
BUS_MCEERR_AR, indicating an action-required machine check error on
read.
However, when kill_proc() is called to terminate the processes who has
the poisoned page mapped, it sends the incorrect SIGBUS error code
BUS_MCEERR_AO because the context in which it operates is not the one
where the error was triggered.
To reproduce this problem:
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 5 addr 0xffffb0d75000
page not present
Test passed
The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO
error and it is not factually correct.
After this change:
# STEP1: enable early kill mode
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed
The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR
error as expected.
Issue 2: a synchronous error infinite loop
If a user-space process, e.g. devmem, accesses a poisoned page for which
the HWPoison flag is set, kill_accessing_process() is called to send
SIGBUS to current processs with error info. Since the memory_failure()
is executed in the kworker context, it will just do nothing but return
EFAULT. So, devmem will access the posioned page and trigger an
exception again, resulting in a synchronous error infinite loop. Such
exception loop may cause platform firmware to exceed some threshold and
reboot when Linux could have recovered from this error.
To reproduce this problem:
# STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed
# STEP 2: access the same page and it will trigger a synchronous error infinite loop
devmem 0x4092d55b400
To fix above two issues, queue memory_failure() as a task_work so that
it runs in the context of the process that is actually consuming the
poisoned data.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250714114212.31660-3-xueshuai@linux.alibaba.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This helper function returns the current power status of a given generic
power domain.
As example, remoteproc/imx_rproc.c can now use this function to check
the power status of the remote core to properly set "attached" or
"offline" modes.
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Hiago De Franco <hiago.franco@toradex.com>
Link: https://lore.kernel.org/r/20250629172512.14857-2-hiagofranco@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Use ACQUIRE() to cleanup conditional locking paths in the CXL driver
The ACQUIRE() macro and its associated ACQUIRE_ERR() helpers, like
scoped_cond_guard(), arrange for scoped-based conditional locking. Unlike
scoped_cond_guard(), these macros arrange for an ERR_PTR() to be retrieved
representing the state of the conditional lock.
The goal of this conversion is to complete the removal of all explicit
unlock calls in the subsystem. I.e. the methods to acquire a lock are
solely via guard(), scoped_guard() (for limited cases), or ACQUIRE(). All
unlock is implicit / scope-based. In order to make sure all lock sites are
converted, the existing rwsem's are consolidated and renamed in 'struct
cxl_rwsem'. While that makes the patch noisier it gives a clean cut-off
between old-world (explicit unlock allowed), and new world (explicit unlock
deleted).
Cc: David Lechner <dlechner@baylibre.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Shiju Jose <shiju.jose@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Link: https://patch.msgid.link/20250711234932.671292-9-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
scoped_cond_guard(), automatic cleanup for conditional locks, has a couple
pain points:
* It causes existing straight-line code to be re-indented into a new
bracketed scope. While this can be mitigated by a new helper function
to contain the scope, that is not always a comfortable conversion.
* The return code from the conditional lock is tossed in favor of a scheme
to pass a 'return err;' statement to the macro.
Other attempts to clean this up, to behave more like guard() [1], got hung
up trying to both establish and evaluate the conditional lock in one
statement.
ACQUIRE() solves this by reflecting the result of the condition in the
automatic variable established by the lock CLASS(). The result is
separately retrieved with the ACQUIRE_ERR() helper, effectively a PTR_ERR()
operation.
Link: http://lore.kernel.org/all/Z1LBnX9TpZLR5Dkf@gmail.com [1]
Link: http://patch.msgid.link/20250512105026.GP4439@noisy.programming.kicks-ass.net
Link: http://patch.msgid.link/20250512185817.GA1808@noisy.programming.kicks-ass.net
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Lechner <dlechner@baylibre.com>
Cc: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[djbw: wrap Peter's proposal with changelog and comments]
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20250711234932.671292-2-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add write_begin_get_folio() to simplify the common folio lookup logic
used by filesystem ->write_begin() implementations.
This helper wraps __filemap_get_folio() with common flags such as
FGP_WRITEBEGIN, conditional FGP_DONTCACHE, and set folio order based
on the write length.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-5-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.
Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that the driver core can properly handle constant struct bus_type,
move the fsi_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Ninad Palsule <ninad@linux.ibm.com>
Cc: linux-fsi@lists.ozlabs.org
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/2025070100-overblown-busily-a04b@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is a warning in the kerneldoc documentation of container_of() that
constness of its ptr argument is lost. While this is a valid suggestion
container_of_const() should be used instead, the vast majority of new
code still uses container_of():
$ git diff v6.13 v6.14|grep container_of\(|wc -l
646
$ git diff v6.13 v6.14|grep container_of_const|wc -l
9
Make an explicit recommendation to use container_of_const().
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20250520103437.468691-1-sakari.ailus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Introduce struct ata_reset_operations to aggregate in a single structure
the definitions of the 4 reset methods (prereset, softreset, hardreset
and postreset) for a port. This new structure is used in struct ata_port
to define the reset methods for a regular port (reset field) and for a
port-multiplier port (pmp_reset field). A pointer to either of these
fields replaces the 4 reset method arguments passed to ata_eh_recover()
and ata_eh_reset().
The definition of the reset methods for all drivers is changed to use
the reset and pmp_reset fields in struct ata_port_operations.
A large number of files is modifed, but no functional changes are
introduced.
Suggested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20250716020315.235457-3-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
The only reason for ata_do_eh() to exist is that the two caller sites,
ata_std_error_handler() and ata_sff_error_handler() may pass it a
NULL hardreset operation so that the built-in (generic) hardreset
operation for a driver is ignored if the adapter SCR access is not
available.
However, ata_std_error_handler() and ata_sff_error_handler()
modifications of the hardreset port operation can easily be combined as
they are mutually exclusive. That is, a driver using sata_std_hardreset()
as its hardreset operation cannot use sata_sff_hardreset() and
vice-versa.
With this observation, ata_do_eh() can be removed and its code moved to
ata_std_error_handler(). The condition used to ignore the built-in
hardreset port operation is modified to be the one that was used in
ata_sff_error_handler(). This requires defining a stub for the function
sata_sff_hardreset() to avoid compilation errors when CONFIG_ATA_SFF is
not enabled. Furthermore, instead of modifying the local hardreset
operation definition, set the ATA_LFLAG_NO_HRST link flag to prevent
the use of built-in hardreset methods for ports without a valid scr_read
function. This flag is checked in ata_eh_reset() and if set, the
hardreset method is ignored.
This change simplifies ata_sff_error_handler() as this function now only
needs to call ata_std_error_handler().
No functional changes.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20250716020315.235457-2-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
This commit removes the SRCU-lite implementation, which has been replaced
by SRCU-fast.
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Because SRCU-lite is being replaced by SRCU-fast, this commit removes
support for SRCU-lite from rcutorture.c
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This moves bnxt_hsi.h contents to a common location so it can be
properly referenced by bnxt_en, bnxt_re, and bnge.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250714170202.39688-1-gospo@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kerneldoc for hid_report_len() needs to be improved. The
description of the @report argument is ungrammatical, and the
documentation does not explain under what circumstances the report
length will include the byte reserved for the report ID.
Let's fix up the kerneldoc.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://patch.msgid.link/1c8416cb-7347-4a06-b00a-20518069d263@rowland.harvard.edu
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung into devel
Samsung pinctrl drivers changes for v6.17
Add support for programming wake up for Google GS101 SoC pin
controllers, so the SoC can be properly woken up from low power states.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Currently all filesystems which implement super_operations::shutdown()
can not afford losing a device.
Thus fs_bdev_mark_dead() will just call the ->shutdown() callback for the
involved filesystem.
But it will no longer be the case, as multi-device filesystems like
btrfs and bcachefs can handle certain device loss without the need to
shutdown the whole filesystem.
To allow those multi-device filesystems to be integrated to use
fs_holder_ops:
- Add a new super_operations::remove_bdev() callback
- Try ->remove_bdev() callback first inside fs_bdev_mark_dead()
If the callback returned 0, meaning the fs can handling the device
loss, then exit without doing anything else.
If there is no such callback or the callback returned non-zero value,
continue to shutdown the filesystem as usual.
This means the new remove_bdev() should only do the check on whether the
operation can continue, and if so do the fs specific handlings.
The shutdown handling should still be handled by the existing
->shutdown() callback.
For all existing filesystems with shutdown callback, there is no change
to the code nor behavior.
Btrfs is going to implement both the ->remove_bdev() and ->shutdown()
callbacks soon.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Link: https://lore.kernel.org/09909fcff7f2763cc037fec97ac2482bdc0a12cb.1752470276.git.wqu@suse.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There is currently hard-coded logic spread around the tree for
determining the note name for regset notes emitted in coredumps.
Now that the names are declared explicitly in <uapi/elf.h>, this can be
simplified.
In preparation for getting rid of the special-case logic, add an
explicit core_note_name field in struct user_regset for specifying the
note name explicitly. To help avoid mistakes, a convenience macro
USER_REGSET_NOTE_TYPE() is provided to set .core_note_type and
.core_note_name based on the note type.
When dumping core, use the new field to set the note name, if the
regset specifies it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-3-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Commit 7717cb9bdd04 ("regset: new method and helpers for it") added a
new interface ->regset_get() for struct user_regset, and commit
1e6986c9db21 ("regset: kill ->get()") got rid of the old interface.
The kerneldoc comment block was never updated to take account of this
change, though.
Update it.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250701135616.29630-2-Dave.Martin@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Returning a large structure from the lock_stats() function causes clang
to have multiple copies of it on the stack and copy between them, which
can end up exceeding the frame size warning limit:
kernel/locking/lockdep.c:300:25: error: stack frame size (1464) exceeds limit (1280) in 'lock_stats' [-Werror,-Wframe-larger-than]
300 | struct lock_class_stats lock_stats(struct lock_class *class)
Change the calling conventions to directly operate on the caller's copy,
which apparently is what gcc does already.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250610092941.2642847-1-arnd@kernel.org
|
|
A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.
Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.
Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.
Tested
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-07-14
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: IFC updates for disabled host PF
net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc
RDMA/mlx5: Fix UMR modifying of mkey page size
net/mlx5: Expose HCA capability bits for mkey max page size
====================
Link: https://patch.msgid.link/1752481357-34780-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a follow-up for commit eb1ac9ff6c4a5 ("ipv6: anycast: Don't
hold RTNL for IPV6_JOIN_ANYCAST.").
We should not add a new device lookup API without netdevice_tracker.
Let's pass netdevice_tracker to dev_get_by_flags_rcu() and rename it
with netdev_ prefix to match other newer APIs.
Note that we always use GFP_ATOMIC for netdev_hold() as it's expected
to be called under RCU.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250708184053.102109f6@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711051120.2866855-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drivers could leverage the fact that the VF BAR MMIO reservation is created
for total number of VFs supported by the device by resizing the BAR to
larger size when smaller number of VFs is enabled.
Add pci_iov_vf_bar_set_size() to control the size and a
pci_iov_vf_bar_get_sizes() helper to get the VF BAR sizes that will allow
up to num_vfs to be successfully enabled with the current underlying
reservation size.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250702093522.518099-6-michal.winiarski@intel.com
|
|
The RDMA driver needs to map its own MMIO regions for the sake of
performance, meaning the IDPF needs to avoid mapping portions of the BAR
space. However, to be HW agnostic, the IDPF cannot assume where
these are and must avoid mapping hard coded regions as much as possible.
The IDPF maps the bare minimum to load and communicate with the
control plane, i.e., the mailbox registers and the reset state
registers. Because of how and when mailbox register offsets are
initialized, it is easier to adjust the existing defines to be relative
to the mailbox region starting address. Use a specific mailbox register
write function that uses these relative offsets. The reset state
register addresses are calculated the same way as for other registers,
described below.
The IDPF then calls a new virtchnl op to fetch a list of MMIO regions
that it should map. The addresses for the registers in these regions are
calculated by determining what region the register resides in, adjusting
the offset to be relative to that region, and then adding the
register's offset to that region's mapped address.
If the new virtchnl op is not supported, the IDPF will fallback to
mapping the whole bar. However, it will still map them as separate
regions outside the mailbox and reset state registers. This way we can
use the same logic in both cases to access the MMIO space.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Implement the functions to create, initialize, and destroy an RDMA vport
auxiliary device. The vport aux dev creation is dependent on the
core aux device to call idpf_idc_vport_dev_ctrl to signal that it is
ready for vport aux devices. Implement that core callback to either
create and initialize the vport aux dev or deinitialize.
RDMA vport aux dev creation is also dependent on the control plane to
tell us the vport is RDMA enabled. Add a flag in the create vport
message to signal individual vport RDMA capabilities.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add the initial idpf_idc.c file with the functions to kick off the IDC
initialization, create and initialize a core RDMA auxiliary device, and
destroy said device.
The RDMA core has a dependency on the vports being created by the
control plane before it can be initialized. Therefore, once all the
vports are up after a hard reset (either during driver load a function
level reset), the core RDMA device info will be created. It is populated
with the function type (as distinguished by the IDC initialization
function pointer), the core idc_ops function points (just stubs for
now), the reserved RDMA MSIX table, and various other info the core RDMA
auxiliary driver will need. It is then plugged on to the bus.
During a function level reset or driver unload, the device will be
unplugged from the bus and destroyed.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This shrinks the struct by 4 bytes, but also takes it from 19 to 18
cachelines on x86_64.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Nothing returns this error code.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In the case of an unknown error code from svc_authenticate or
pg_authenticate, return AUTH_ERROR with a status of AUTH_FAILED. Also
add the other auth_stat value from RFC 5531, and document all the status
codes.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The rqst argument to xdr_init_encode_pages is set to NULL by all callers,
and pages is always set to buf->pages. Remove the two arguments and
hardcode the assignments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
from rt_mutex_adjust_prio_chain() could happen in preemptible context and
with a mutex enqueued. That could lead to this sequence:
rt_mutex_adjust_prio_chain()
put_task_struct()
__put_task_struct()
sched_ext_free()
spin_lock_irqsave()
rtlock_lock() ---> TRIGGERS
lockdep_assert(!current->pi_blocked_on);
This is not a SCHED_EXT bug. The first cleanup function called by
__put_task_struct() is sched_ext_free() and it happens to take a
(RT) spin_lock, which in the scenario described above, would trigger
the lockdep assertion of "!current->pi_blocked_on".
Crystal Wood was able to identify the problem as __put_task_struct()
being called during rt_mutex_adjust_prio_chain(), in the context of
a process with a mutex enqueued.
Instead of adding more complex conditions to decide when to directly
call __put_task_struct() and when to defer the call, unconditionally
resort to the deferred call on PREEMPT_RT to simplify the code.
Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
Suggested-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/aGvTz5VaPFyj0pBV@uudg.org
|
|
This lets us assert mutex::wait_lock is held whenever we access
p->blocked_on, as well as warn us for unexpected state changes.
[fix conflicts, call in more places]
[jstultz: tweaked commit subject, reworked a good bit]
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-4-jstultz@google.com
|
|
Track the blocked-on relation for mutexes, to allow following this
relation at schedule time.
task
| blocked-on
v
mutex
| owner
v
task
This all will be used for tracking blocked-task/mutex chains
with the prox-execution patch in a similar fashion to how
priority inheritance is done with rt_mutexes.
For serialization, blocked-on is only set by the task itself
(current). And both when setting or clearing (potentially by
others), is done while holding the mutex::wait_lock.
[minor changes while rebasing]
[jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-3-jstultz@google.com
|
|
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.
Also uses this option to allow the rq->donor to be different from
rq->curr.
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
|
|
Avoid merge conflicts
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
Extend the devfreq_dev_profile to allow drivers optionally create
device-specific sysfs ABIs together with other common devfreq ABIs under
the devfreq device path.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20250623143401.4095045-2-zhanjie9@hisilicon.com/
|