Age | Commit message (Collapse) | Author |
|
The correct place for this definition is the nvme rdma header file and
not the common nvme header file.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
We have a macro. It should be used.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/r/20240229132401.3270-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next
Mika writes:
thunderbolt: Changes for v6.9 merge window
This includes following USB4/Thunderbolt changes for the v6.9 merge
window:
- Reset the topology also for USB4 v1 routers on driver load
- DisplayPort tunneling and bandwidth allocation mode improvements
- Tracepoint support for the control channel
- Couple of minor fixes and cleanups.
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (23 commits)
thunderbolt: Constify the struct device_type usage
thunderbolt: Add trace events support for the control channel
thunderbolt: Keep the domain powered when USB4 port is in redrive mode
thunderbolt: Improve DisplayPort tunnel setup process to be more robust
thunderbolt: Calculate DisplayPort tunnel bandwidth after DPRX capabilities read
thunderbolt: Reserve released DisplayPort bandwidth for a group for 10 seconds
thunderbolt: Introduce tb_tunnel_direction_downstream()
thunderbolt: Re-order bandwidth group functions
thunderbolt: Fail the failed bandwidth request properly
thunderbolt: Log an error if DPTX request is not cleared
thunderbolt: Handle bandwidth allocation mode disable request
thunderbolt: Re-calculate estimated bandwidth when allocation mode is enabled
thunderbolt: Use DP_LOCAL_CAP for maximum bandwidth calculation
thunderbolt: Correct typo in host_reset parameter
thunderbolt: Skip discovery also in USB4 v2 host
thunderbolt: Reset only non-USB4 host routers in resume
thunderbolt: Remove usage of the deprecated ida_simple_xx() API
thunderbolt: Fix rollback in tb_port_lane_bonding_enable() for lane 1
thunderbolt: Fix XDomain rx_lanes_show and tx_lanes_show
thunderbolt: Reset topology created by the boot firmware
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Suzuki writes:
coresight: hwtracing subsystem updates for v6.9
Changes targeting Linux v6.9 include:
- CoreSight: Enable W=1 warnings as default
- CoreSight: Clean up sysfs/perf mode handling for tracing
- Support for Qualcomm TPDM CMB Dataset
- Miscellaneous fixes to the CoreSight subsystem
- Fix for hisi_ptt PMU to reject events targeting other PMUs
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (32 commits)
coresight-tpda: Change qcom,dsb-element-size to qcom,dsb-elem-bits
dt-bindings: arm: qcom,coresight-tpdm: Rename qcom,dsb-element-size
hwtracing: hisi_ptt: Move type check to the beginning of hisi_ptt_pmu_event_init()
coresight: tpdm: Fix build break due to uninitialised field
coresight: etm4x: Set skip_power_up in etm4_init_arch_data function
coresight-tpdm: Add msr register support for CMB
dt-bindings: arm: qcom,coresight-tpdm: Add support for TPDM CMB MSR register
coresight-tpdm: Add timestamp control register support for the CMB
coresight-tpdm: Add pattern registers support for CMB
coresight-tpdm: Add support to configure CMB
coresight-tpda: Add support to configure CMB element
coresight-tpdm: Add CMB dataset support
dt-bindings: arm: qcom,coresight-tpdm: Add support for CMB element size
coresight-tpdm: Optimize the useage of tpdm_has_dsb_dataset
coresight-tpdm: Optimize the store function of tpdm simple dataset
coresight: Add helper for setting csdev->mode
coresight: Add a helper for getting csdev->mode
coresight: Add helper for atomically taking the device
coresight: Add explicit member initializers to coresight_dev_type
coresight: Remove unused stubs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next
Manivannan writes:
MHI Host
========
- Added new MHI_PM_SYS_ERR_FAIL state to the MHI state machine to properly
cleanup the channel state if the device fails to respond to the MHI reset
during SYS_ERR handling. This issue was discovered with the Qualcomm AIC100 AI
accelerator device.
- Modified the code that reads and exposes the OEM_PK_HASH registers through
sysfs to read them on-demand instead of reading once during boot. Qualcomm
AIC100 devices support provisioning the keys dynamically, so this allows the
users to know the upto date information.
- Added tracepoint support to expose the debug information over tracefs.
- Reverted the commit that reads the MHI device revision from the device during
boot. This is done because the read info was not used anywhere (dead code) and
also it is not possible to read the revision info from all the devices.
- Constified the modem config for Telit FN980 modem as required by the MHI core.
MHI Endpoint
============
- Replaced kzalloc() with kcalloc() in an effort to avoid integer overflows
during multiplication. Even though there is no potential overflow in the
endpoint code, this is done for the sake of uniformity and best practice.
- Fixed the kmem_cache_create() failure check to use the correct variable.
* tag 'mhi-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: host: pci_generic: constify modem_telit_fn980_hw_v1_config
bus: mhi: host: Change the trace string for the userspace tools mapping
bus: mhi: ep: check the correct variable in mhi_ep_register_controller()
Revert "bus: mhi: core: Add support for reading MHI info from device"
bus: mhi: host: Add tracing support
bus: mhi: ep: Use kcalloc() instead of kzalloc()
bus: mhi: host: Read PK HASH dynamically
bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
|
|
In commit 19416123ab3e ("block: define 'struct bvec_iter' as packed"),
what we need is to save the 4byte padding, and avoid `bio` to spread on
one extra cache line.
It is enough to define it as '__packed __aligned(4)', as '__packed'
alone means byte aligned, and can cause compiler to generate horrible
code on architectures that don't support unaligned access in case that
bvec_iter is embedded in other structures.
Cc: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 19416123ab3e ("block: define 'struct bvec_iter' as packed")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Functions which can't access MFRL (Management Firmware Reset Level)
register, have no use of fw_reset structures or events. Remove fw_reset
structures allocation and registration for fw reset events notifications
for these functions.
Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.
In addition, return not supported on devlink reload action fw_activate
for these functions.
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The __is_constexpr() macro is dark magic. Shed some light on it with
a comment to explain how and why it works.
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240301044428.work.411-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
A common use of type_max() is to find the max for the type of a
variable. Using the pattern type_max(typeof(var)) is needlessly
verbose. Instead, since typeof(type) == type we can just explicitly
call typeof() on the argument to type_max() and type_min(). Add
wrappers for readability.
We can do some replacements right away:
$ git grep '\btype_\(min\|max\)(typeof' | wc -l
11
Link: https://lore.kernel.org/r/20240301062221.work.840-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the power_supply_class structure to be declared at build
time placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240301-class_cleanup-power-v1-1-97e0b7bf9c94@marliere.net
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
The x86 architecture has an idle routine for AMD CPUs which are affected
by erratum 400. On the affected CPUs the local APIC timer stops in the
C1E halt state.
It therefore requires tick broadcasting. The invocation of
tick_broadcast_enter()/exit() from this function violates the RCU
constraints because it can end up in lockdep or tracing, which
rightfully triggers a warning.
tick_broadcast_enter()/exit() must be invoked before ct_cpuidle_enter()
and after ct_cpuidle_exit() in default_idle_call().
Add a static branch conditional invocation of tick_broadcast_enter()/exit()
into this function to allow X86 to replace the AMD specific idle code. It's
guarded by a config switch which will be selected by x86. Otherwise it's
a NOOP.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240229142248.266708822@linutronix.de
|
|
Add a small wrapper around blk_stack_limits that allows passing a bdev
for the bottom device and prints an error in case of misaligned
device. The name fits into the new queue limits API and the intent is
to eventually replace disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240228225653.947152-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a small wrapper around queue_limits_commit_update for stacking
drivers that don't want to update existing limits, but set an
entirely new set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240228225653.947152-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Chain RDMA Writes that convey Write chunks onto the local Send
chain. This means all WRs for an RPC Reply are now posted with a
single ib_post_send() call, and there is a single Send completion
when all of these are done. That reduces both the per-transport
doorbell rate and completion rate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Refactor to eventually enable svcrdma to post the Write WRs for each
RPC response using the same ib_post_send() as the Send WR (ie, as a
single WR chain).
svc_rdma_result_payload (originally svc_rdma_read_payload) was added
so that the upper layer XDR encoder could identify a range of bytes
to be possibly conveyed by RDMA (if a Write chunk was provided by
the client).
The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while
XDR encoding replies") was to post as much of the result payload
outside of svc_rdma_sendto() as possible because svc_rdma_sendto()
used to be called with the xpt_mutex held.
However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into
socket xpo_sendto methods"), the xpt_mutex is no longer held when
calling svc_rdma_sendto(). Thus, that benefit is no longer an issue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Reduce the doorbell and Send completion rates when sending RPC/RDMA
replies that have Reply chunks. NFS READDIR procedures typically
return their result in a Reply chunk, for example.
Instead of calling ib_post_send() to post the Write WRs for the
Reply chunk, and then calling it again to post the Send WR that
conveys the transport header, chain the Write WRs to the Send WR
and call ib_post_send() only once.
Thanks to the Send Queue completion ordering rules, when the Send
WR completes, that guarantees that Write WRs posted before it have
also completed successfully. Thus all Write WRs for the Reply chunk
can remain unsignaled. Instead of handling a Write completion and
then a Send completion, only the Send completion is seen, and it
handles clean up for both the Writes and the Send.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since the RPC transaction's svc_rdma_send_ctxt will stay around for
the duration of the RDMA Write operation, the write_info structure
for the Reply chunk can reside in the request's svc_rdma_send_ctxt
instead of being allocated separately.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Eventually I'd like the server to post the reply's Send WR along
with any Write WRs using only a single call to ib_post_send(), in
order to reduce the NIC's doorbell rate.
To do this, add an anchor for a WR chain to svc_rdma_send_ctxt, and
refactor svc_rdma_send() to post this WR chain to the Send Queue. For
the moment, the posted chain will continue to contain a single Send
WR.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that this isn't used anywhere, remove it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since only one service actually reports the rpc stats there's not much
of a reason to have a pointer to it in the svc_program struct. Adjust
the svc_create_pooled function to take the sv_stats as an argument and
pass the struct through there as desired instead of getting it from the
svc_program->pg_stats.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Make pointer to fwnode_handle a pointer to const for code safety.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240216144027.185959-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The xlate callbacks are supposed to translate of_phandle_args to proper
provider without modifying the of_phandle_args. Make the argument
pointer to const for code safety and readability.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240216144027.185959-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Moving pidfds from the anonymous inode infrastructure to a separate tiny
in-kernel filesystem similar to sockfs, pipefs, and anon_inodefs causes
selinux denials and thus various userspace components that make heavy
use of pidfds to fail as pidfds used anon_inode_getfile() which aren't
subject to any LSM hooks. But dentry_open() is and that would cause
regressions.
The failures that are seen are selinux denials. But the core failure is
dbus-broker. That cascades into other services failing that depend on
dbus-broker. For example, when dbus-broker fails to start polkit and all
the others won't be able to work because they depend on dbus-broker.
The reason for dbus-broker failing is because it doesn't handle failures
for SO_PEERPIDFD correctly. Last kernel release we introduced
SO_PEERPIDFD (and SCM_PIDFD). SO_PEERPIDFD allows dbus-broker and polkit
and others to receive a pidfd for the peer of an AF_UNIX socket. This is
the first time in the history of Linux that we can safely authenticate
clients in a race-free manner.
dbus-broker immediately made use of this but messed up the error
checking. It only allowed EINVAL as a valid failure for SO_PEERPIDFD.
That's obviously problematic not just because of LSM denials but because
of seccomp denials that would prevent SO_PEERPIDFD from working; or any
other new error code from there.
So this is catching a flawed implementation in dbus-broker as well. It
has to fallback to the old pid-based authentication when SO_PEERPIDFD
doesn't work no matter the reasons otherwise it'll always risk such
failures. So overall that LSM denial should not have caused dbus-broker
to fail. It can never assume that a feature released one kernel ago like
SO_PEERPIDFD can be assumed to be available.
So, the next fix separate from the selinux policy update is to try and
fix dbus-broker at [3]. That should make it into Fedora as well. In
addition the selinux reference policy should also be updated. See [4]
for that. If Selinux is in enforcing mode in userspace and it encounters
anything that it doesn't know about it will deny it by default. And the
policy is entirely in userspace including declaring new types for stuff
like nsfs or pidfs to allow it.
For now we continue to raise S_PRIVATE on the inode if it's a pidfs
inode which means things behave exactly like before.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2265630
Link: https://github.com/fedora-selinux/selinux-policy/pull/2050
Link: https://github.com/bus1/dbus-broker/pull/343 [3]
Link: https://github.com/SELinuxProject/refpolicy/pull/762 [4]
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240222190334.GA412503@dev-arch.thelio-3990X
Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the newly added path_from_stashed() helper for nsfs.
Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This moves pidfds from the anonymous inode infrastructure to a tiny
pseudo filesystem. This has been on my todo for quite a while as it will
unblock further work that we weren't able to do simply because of the
very justified limitations of anonymous inodes. Moving pidfds to a tiny
pseudo filesystem allows:
* statx() on pidfds becomes useful for the first time.
* pidfds can be compared simply via statx() and then comparing inode
numbers.
* pidfds have unique inode numbers for the system lifetime.
* struct pid is now stashed in inode->i_private instead of
file->private_data. This means it is now possible to introduce
concepts that operate on a process once all file descriptors have been
closed. A concrete example is kill-on-last-close.
* file->private_data is freed up for per-file options for pidfds.
* Each struct pid will refer to a different inode but the same struct
pid will refer to the same inode if it's opened multiple times. In
contrast to now where each struct pid refers to the same inode. Even
if we were to move to anon_inode_create_getfile() which creates new
inodes we'd still be associating the same struct pid with multiple
different inodes.
The tiny pseudo filesystem is not visible anywhere in userspace exactly
like e.g., pipefs and sockfs. There's no lookup, there's no complex
inode operations, nothing. Dentries and inodes are always deleted when
the last pidfd is closed.
We allocate a new inode for each struct pid and we reuse that inode for
all pidfds. We use iget_locked() to find that inode again based on the
inode number which isn't recycled. We allocate a new dentry for each
pidfd that uses the same inode. That is similar to anonymous inodes
which reuse the same inode for thousands of dentries. For pidfds we're
talking way less than that. There usually won't be a lot of concurrent
openers of the same struct pid. They can probably often be counted on
two hands. I know that systemd does use separate pidfd for the same
struct pid for various complex process tracking issues. So I think with
that things actually become way simpler. Especially because we don't
have to care about lookup. Dentries and inodes continue to be always
deleted.
The code is entirely optional and fairly small. If it's not selected we
fallback to anonymous inodes. Heavily inspired by nsfs which uses a
similar stashing mechanism just for namespaces.
Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is now possible to disable BQL, but that causes the cpsw driver to break:
drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue'
297 | dql_avail(&netif_txq->dql),
There is already a helper function in net/sch_generic.h that could
be used to help here. Move its implementation into the common
linux/netdevice.h along with the other bql interfaces and change
both users over to the new interface.
Fixes: ea7f3cfaa588 ("net: bql: allow the config to be disabled")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to wrap calls to the no_printk() helper inside an
always-false check, as no_printk() already does that internally.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv6 TX and RX fast path use the following fields:
- disable_ipv6
- hop_limit
- mtu6
- forwarding
- disable_policy
- proxy_ndp
Place them in a group to increase data locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The VMBUS_RING_SIZE macro adds space for a ring buffer header to the
requested ring buffer size. The header size is always 1 page, and so
its size varies based on the PAGE_SIZE for which the kernel is built.
If the requested ring buffer size is a large power-of-2 size and the header
size is small, the resulting size is inefficient in its use of memory.
For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in
a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory
allocator, and wastes 508 Kbytes of memory.
In such situations, the exact size of the ring buffer isn't that important,
and it's OK to allocate the 4 Kbyte header at the beginning of the 512
Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory
allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted.
Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer
sizes. "Large" is somewhat arbitrarily defined as 8 times the size of
the ring buffer header (which is of size PAGE_SIZE). For example, for
4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first
4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers
of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer
header. In both cases, smaller sizes add space for the header so
the ring size isn't reduced too much by using part of the space for
the header. For example, with a 64 Kbyte page size, we don't want
a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half
of the space for the header. In such a case, the memory allocation
is less efficient, but it's the best that can be done.
While the new algorithm slightly changes the amount of space allocated
for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't
known to be sensitive to small changes in ring buffer size, so there
shouldn't be any effect.
Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL")
Fixes: 6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240229004533.313662-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240229004533.313662-1-mhklinux@outlook.com>
|
|
The new flags parameter allows controlling
- Whether or not the units suffix is separated by a space, for
compatibility with sort -h
- Whether or not to append a B suffix - we're not always printing
bytes.
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240229205345.93902-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/mptcp/protocol.c
adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket")
9426ce476a70 ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/
Adjacent changes:
drivers/dpll/dpll_core.c
0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()")
e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")
drivers/net/veth.c
1ce7d306ea63 ("veth: try harder when allocating queue memory")
0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
drivers/net/wireless/intel/iwlwifi/mvm/d3.c
8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")
net/wireless/nl80211.c
f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change")
414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Boqun pointed out that workqueues aren't handling BH work items on offlined
CPUs. Unlike tasklet which transfers out the pending tasks from
CPUHP_SOFTIRQ_DEAD, BH workqueue would just leave them pending which is
problematic. Note that this behavior is specific to BH workqueues as the
non-BH per-CPU workers just become unbound when the CPU goes offline.
This patch fixes the issue by draining the pending BH work items from an
offlined CPU from CPUHP_SOFTIRQ_DEAD. Because work items carry more context,
it's not as easy to transfer the pending work items from one pool to
another. Instead, run BH work items which execute the offlined pools on an
online CPU.
Note that this assumes that no further BH work items will be queued on the
offlined CPUs. This assumption is shared with tasklet and should be fine for
conversions. However, this issue also exists for per-CPU workqueues which
will just keep executing work items queued after CPU offline on unbound
workers and workqueue should reject per-CPU and BH work items queued on
offline CPUs. This will be addressed separately later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: http://lkml.kernel.org/r/Zdvw0HdSXcU3JZ4g@boqun-archlinux
|
|
The check_shl_overflow() uses u64 type that is defined in types.h.
Instead of including that header, just switch to use POD type
directly.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240228204919.3680786-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The lib/cmdline.c is basically a set of some small string parsers
which are wide used in the kernel. Their prototypes belong to the
string.h rather then kernel.h.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20231003130142.2936503-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Improve the reporting of buffer overflows under CONFIG_FORTIFY_SOURCE to
help accelerate debugging efforts. The calculations are all just sitting
in registers anyway, so pass them along to the function to be reported.
For example, before:
detected buffer overflow in memcpy
and after:
memcpy: detected buffer overflow: 4096 byte read of buffer size 1
Link: https://lore.kernel.org/r/20230407192717.636137-10-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The standard C string APIs were not designed to have a failure mode;
they were expected to always succeed without memory safety issues.
Normally, CONFIG_FORTIFY_SOURCE will use fortify_panic() to stop
processing, as truncating a read or write may provide an even worse
system state. However, this creates a problem for testing under things
like KUnit, which needs a way to survive failures.
When building with CONFIG_KUNIT, provide a failure path for all users
of fortify_panic, and track whether the failure was a read overflow or
a write overflow, for KUnit tests to examine. Inspired by similar logic
in the slab tests.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In preparation for KUnit testing and further improvements in fortify
failure reporting, split out the report and encode the function and access
failure (read or write overflow) into a single u8 argument. This mainly
ends up saving a tiny bit of space in the data segment. For a defconfig
with FORTIFY_SOURCE enabled:
$ size gcc/vmlinux.before gcc/vmlinux.after
text data bss dec hex filename
26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before
26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Mark the various refcount_t functions with __signed_wrap, as we depend
on the wrapping behavior to detect the overflow and perform saturation.
Silences warnings seen with the LKDTM REFCOUNT_* tests:
UBSAN: signed-integer-overflow in ../include/linux/refcount.h:189:11
2147483647 + 1 cannot be represented in type 'int'
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20240221051634.work.287-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Add str_plural() helper to replace existing open implementations
used by many drivers and help improve future user facing messages.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20240214165015.1656-1-michal.wajdeczko@intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
This allows replacements of the idioms "var += offset" and "var -=
offset" with the wrapping_assign_add() and wrapping_assign_sub() helpers
respectively. They will avoid wrap-around sanitizer instrumentation.
Add to the selftests to validate behavior and lack of side-effects.
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Provide helpers that will perform wrapping addition, subtraction, or
multiplication without tripping the arithmetic wrap-around sanitizers. The
first argument is the type under which the wrap-around should happen
with. In other words, these two calls will get very different results:
wrapping_mul(int, 50, 50) == 2500
wrapping_mul(u8, 50, 50) == 196
Add to the selftests to validate behavior and lack of side-effects.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The check_*_overflow() helpers will return results with potentially
wrapped-around values. These values have always been checked by the
selftests, so avoid the confusing language in the kern-doc. The idea of
"safe for use" was relative to the expectation of whether or not the
caller wants a wrapped value -- the calculation itself will always follow
arithmetic wrapping rules.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The wordpart.h header is collecting APIs related to the handling
parts of the word (usually in byte granularity). The upper_*_bits()
and lower_*_bits() are good candidates to be moved to there.
This helps to clean up header dependency hell with regard to kernel.h
as the latter gathers completely unrelated stuff together and slows
down compilation (especially when it's included into other header).
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240214172752.3605073-1-andriy.shevchenko@linux.intel.com
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may be a
LOCKDEP false positive, not a blocker.
Current release - regressions:
- netfilter: nf_tables: re-allow NFPROTO_INET in
nft_(match/target)_validate()
- eth: ionic: fix error handling in PCI reset code
Current release - new code bugs:
- eth: stmmac: complete meta data only when enabled, fix null-deref
- kunit: fix again checksum tests on big endian CPUs
Previous releases - regressions:
- veth: try harder when allocating queue memory
- Bluetooth:
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
Previous releases - always broken:
- info leak in __skb_datagram_iter() on netlink socket
- mptcp:
- map v4 address to v6 when destroying subflow
- fix potential wake-up event loss due to sndbuf auto-tuning
- fix double-free on socket dismantle
- wifi: nl80211: reject iftype change with mesh ID change
- fix small out-of-bound read when validating netlink be16/32 types
- rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
- ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
- ip_tunnel: prevent perpetual headroom growth with huge number of
tunnels on top of each other
- mctp: fix skb leaks on error paths of mctp_local_output()
- eth: ice: fixes for DPLL state reporting
- dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
- eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
tree"
* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
dpll: fix build failure due to rcu_dereference_check() on unknown type
kunit: Fix again checksum tests on big endian CPUs
tls: fix use-after-free on failed backlog decryption
tls: separate no-async decryption request handling from async
tls: fix peeking with sync+async decryption
tls: decrement decrypt_pending if no async completion will be called
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
igb: extend PTP timestamp adjustments to i211
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
tools: ynl: fix handling of multiple mcast groups
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
...
|
|
The SLAB allocator has been removed sine 6.8-rc1 [1], so there is no user
with SLAB_MEM_SPREAD and cpuset_do_slab_mem_spread(). Then SLAB_MEM_SPREAD
is marked as unused by [2]. Here we can remove
cpuset_do_slab_mem_spread(). For more details, please check [3].
[1] https://lore.kernel.org/linux-mm/20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz/
[2] https://lore.kernel.org/linux-kernel/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/T/
[3] https://lore.kernel.org/lkml/32bc1403-49da-445a-8c00-9686a3b0d6a3@redhat.com/T/#mf14b838c5e0e77f4756d436bac3d8c0447ea4350
Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Tasmiya reports that their compiler complains that we deref
a pointer to unknown type with rcu_dereference_rtnl():
include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’
Unclear what compiler it is, at the moment, and we can't report
but since DPLL can't be a module - move the code from the header
into the source file.
Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()")
Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com>
Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Resolving a frequency to an efficient one should not transgress
policy->max (which can be set for thermal reason) and policy->min.
Currently, there is possibility where scaling_cur_freq can exceed
scaling_max_freq when scaling_max_freq is an inefficient frequency.
Add a check to ensure that resolving a frequency will respect
policy->min/max.
Cc: All applicable <stable@vger.kernel.org>
Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E")
Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
[ rjw: Whitespace adjustment, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is low probability that an out-of-bounds segment will be got
on a small-capacity device. In order to prevent subsequent write requests
allocating block address from this invalid segment, which may cause
unexpected issue, stop checkpoint should be performed.
Also introduce a new stop cp reason: STOP_CP_REASON_NO_SEGMENT.
Note, f2fs_stop_checkpoint(, false) is complex and it may sleep, so we should
move it outside segmap_lock spinlock coverage in get_new_segment().
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.
There is a day 0 bug in br_netfilter when used with connection tracking.
Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.
For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.
This patch disables NAT and conntrack helpers for multicast packets.
Patch #3 adds a selftest to cover for the br_netfilter bug.
netfilter pull request 24-02-29
* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================
Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We create a custom compatible for the STA2X11 IP block as integrated
into the Mobileye EyeQ5 platform. Its wake and alternate functions have
been disabled, we want to avoid touching those registers.
We both do: (1) early return in functions that do not support the
platform, but with warnings, and (2) avoid calling those functions in
the first place.
We ensure that pinctrl-nomadik is not used with this STA2X11 variant.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://lore.kernel.org/r/20240228-mbly-gpio-v2-24-3ba757474006@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|