summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-03-01block: add a queue_limits_stack_bdev helperChristoph Hellwig
Add a small wrapper around blk_stack_limits that allows passing a bdev for the bottom device and prints an error in case of misaligned device. The name fits into the new queue limits API and the intent is to eventually replace disk_stack_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228225653.947152-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01block: add a queue_limits_set helperChristoph Hellwig
Add a small wrapper around queue_limits_commit_update for stacking drivers that don't want to update existing limits, but set an entirely new set. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228225653.947152-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01svcrdma: Add Write chunk WRs to the RPC's Send WR chainChuck Lever
Chain RDMA Writes that convey Write chunks onto the local Send chain. This means all WRs for an RPC Reply are now posted with a single ib_post_send() call, and there is a single Send completion when all of these are done. That reduces both the per-transport doorbell rate and completion rate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01svcrdma: Post WRs for Write chunks in svc_rdma_sendto()Chuck Lever
Refactor to eventually enable svcrdma to post the Write WRs for each RPC response using the same ib_post_send() as the Send WR (ie, as a single WR chain). svc_rdma_result_payload (originally svc_rdma_read_payload) was added so that the upper layer XDR encoder could identify a range of bytes to be possibly conveyed by RDMA (if a Write chunk was provided by the client). The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while XDR encoding replies") was to post as much of the result payload outside of svc_rdma_sendto() as possible because svc_rdma_sendto() used to be called with the xpt_mutex held. However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into socket xpo_sendto methods"), the xpt_mutex is no longer held when calling svc_rdma_sendto(). Thus, that benefit is no longer an issue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01svcrdma: Post the Reply chunk and Send WR togetherChuck Lever
Reduce the doorbell and Send completion rates when sending RPC/RDMA replies that have Reply chunks. NFS READDIR procedures typically return their result in a Reply chunk, for example. Instead of calling ib_post_send() to post the Write WRs for the Reply chunk, and then calling it again to post the Send WR that conveys the transport header, chain the Write WRs to the Send WR and call ib_post_send() only once. Thanks to the Send Queue completion ordering rules, when the Send WR completes, that guarantees that Write WRs posted before it have also completed successfully. Thus all Write WRs for the Reply chunk can remain unsignaled. Instead of handling a Write completion and then a Send completion, only the Send completion is seen, and it handles clean up for both the Writes and the Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxtChuck Lever
Since the RPC transaction's svc_rdma_send_ctxt will stay around for the duration of the RDMA Write operation, the write_info structure for the Reply chunk can reside in the request's svc_rdma_send_ctxt instead of being allocated separately. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01svcrdma: Post Send WR chainChuck Lever
Eventually I'd like the server to post the reply's Send WR along with any Write WRs using only a single call to ib_post_send(), in order to reduce the NIC's doorbell rate. To do this, add an anchor for a WR chain to svc_rdma_send_ctxt, and refactor svc_rdma_send() to post this WR chain to the Send Queue. For the moment, the posted chain will continue to contain a single Send WR. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Add callback operation lifetime trace pointsChuck Lever
Help observe the flow of callback operations. bc_shutdown() records exactly when the backchannel RPC client is destroyed and cl_cb_client is replaced with NULL. Examples include: nfsd-955 [004] 650.013997: nfsd_cb_queue: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try) kworker/u21:4-497 [004] 650.014050: nfsd_cb_seq_status: task:00000001@00000001 sessionid=65b3c5b8:f541f749:00000001:00000000 tk_status=-107 seq_status=1 kworker/u21:4-497 [004] 650.014051: nfsd_cb_restart: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff88810e39f400 (first try) kworker/u21:4-497 [004] 650.014066: nfsd_cb_queue: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff88810e39f400 (need restart) kworker/u16:0-10 [006] 650.065750: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN kworker/u16:0-10 [006] 650.065752: nfsd_cb_bc_update: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try) kworker/u16:0-10 [006] 650.065754: nfsd_cb_bc_shutdown: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try) kworker/u16:0-10 [006] 650.065810: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01sunrpc: remove ->pg_stats from svc_programJosef Bacik
Now that this isn't used anywhere, remove it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01sunrpc: pass in the sv_stats struct through svc_create_pooledJosef Bacik
Since only one service actually reports the rpc stats there's not much of a reason to have a pointer to it in the svc_program struct. Adjust the svc_create_pooled function to take the sv_stats as an argument and pass the struct through there as desired instead of getting it from the svc_program->pg_stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01iommu: constify fwnode in iommu_ops_from_fwnode()Krzysztof Kozlowski
Make pointer to fwnode_handle a pointer to const for code safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-3-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu: constify of_phandle_args in xlateKrzysztof Kozlowski
The xlate callbacks are supposed to translate of_phandle_args to proper provider without modifying the of_phandle_args. Make the argument pointer to const for code safety and readability. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-2-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01pidfs: convert to path_from_stashed() helperChristian Brauner
Moving pidfds from the anonymous inode infrastructure to a separate tiny in-kernel filesystem similar to sockfs, pipefs, and anon_inodefs causes selinux denials and thus various userspace components that make heavy use of pidfds to fail as pidfds used anon_inode_getfile() which aren't subject to any LSM hooks. But dentry_open() is and that would cause regressions. The failures that are seen are selinux denials. But the core failure is dbus-broker. That cascades into other services failing that depend on dbus-broker. For example, when dbus-broker fails to start polkit and all the others won't be able to work because they depend on dbus-broker. The reason for dbus-broker failing is because it doesn't handle failures for SO_PEERPIDFD correctly. Last kernel release we introduced SO_PEERPIDFD (and SCM_PIDFD). SO_PEERPIDFD allows dbus-broker and polkit and others to receive a pidfd for the peer of an AF_UNIX socket. This is the first time in the history of Linux that we can safely authenticate clients in a race-free manner. dbus-broker immediately made use of this but messed up the error checking. It only allowed EINVAL as a valid failure for SO_PEERPIDFD. That's obviously problematic not just because of LSM denials but because of seccomp denials that would prevent SO_PEERPIDFD from working; or any other new error code from there. So this is catching a flawed implementation in dbus-broker as well. It has to fallback to the old pid-based authentication when SO_PEERPIDFD doesn't work no matter the reasons otherwise it'll always risk such failures. So overall that LSM denial should not have caused dbus-broker to fail. It can never assume that a feature released one kernel ago like SO_PEERPIDFD can be assumed to be available. So, the next fix separate from the selinux policy update is to try and fix dbus-broker at [3]. That should make it into Fedora as well. In addition the selinux reference policy should also be updated. See [4] for that. If Selinux is in enforcing mode in userspace and it encounters anything that it doesn't know about it will deny it by default. And the policy is entirely in userspace including declaring new types for stuff like nsfs or pidfs to allow it. For now we continue to raise S_PRIVATE on the inode if it's a pidfs inode which means things behave exactly like before. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2265630 Link: https://github.com/fedora-selinux/selinux-policy/pull/2050 Link: https://github.com/bus1/dbus-broker/pull/343 [3] Link: https://github.com/SELinuxProject/refpolicy/pull/762 [4] Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240222190334.GA412503@dev-arch.thelio-3990X Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01nsfs: convert to path_from_stashed() helperChristian Brauner
Use the newly added path_from_stashed() helper for nsfs. Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01pidfd: add pidfsChristian Brauner
This moves pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This has been on my todo for quite a while as it will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows: * statx() on pidfds becomes useful for the first time. * pidfds can be compared simply via statx() and then comparing inode numbers. * pidfds have unique inode numbers for the system lifetime. * struct pid is now stashed in inode->i_private instead of file->private_data. This means it is now possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. * file->private_data is freed up for per-file options for pidfds. * Each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. Even if we were to move to anon_inode_create_getfile() which creates new inodes we'd still be associating the same struct pid with multiple different inodes. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode for each struct pid and we reuse that inode for all pidfds. We use iget_locked() to find that inode again based on the inode number which isn't recycled. We allocate a new dentry for each pidfd that uses the same inode. That is similar to anonymous inodes which reuse the same inode for thousands of dentries. For pidfds we're talking way less than that. There usually won't be a lot of concurrent openers of the same struct pid. They can probably often be counted on two hands. I know that systemd does use separate pidfd for the same struct pid for various complex process tracking issues. So I think with that things actually become way simpler. Especially because we don't have to care about lookup. Dentries and inodes continue to be always deleted. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs which uses a similar stashing mechanism just for namespaces. Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01crypto: remove CONFIG_CRYPTO_STATSEric Biggers
Remove support for the "Crypto usage statistics" feature (CONFIG_CRYPTO_STATS). This feature does not appear to have ever been used, and it is harmful because it significantly reduces performance and is a large maintenance burden. Covering each of these points in detail: 1. Feature is not being used Since these generic crypto statistics are only readable using netlink, it's fairly straightforward to look for programs that use them. I'm unable to find any evidence that any such programs exist. For example, Debian Code Search returns no hits except the kernel header and kernel code itself and translations of the kernel header: https://codesearch.debian.net/search?q=CRYPTOCFGA_STAT&literal=1&perpkg=1 The patch series that added this feature in 2018 (https://lore.kernel.org/linux-crypto/1537351855-16618-1-git-send-email-clabbe@baylibre.com/) said "The goal is to have an ifconfig for crypto device." This doesn't appear to have happened. It's not clear that there is real demand for crypto statistics. Just because the kernel provides other types of statistics such as I/O and networking statistics and some people find those useful does not mean that crypto statistics are useful too. Further evidence that programs are not using CONFIG_CRYPTO_STATS is that it was able to be disabled in RHEL and Fedora as a bug fix (https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2947). Even further evidence comes from the fact that there are and have been bugs in how the stats work, but they were never reported. For example, before Linux v6.7 hash stats were double-counted in most cases. There has also never been any documentation for this feature, so it might be hard to use even if someone wanted to. 2. CONFIG_CRYPTO_STATS significantly reduces performance Enabling CONFIG_CRYPTO_STATS significantly reduces the performance of the crypto API, even if no program ever retrieves the statistics. This primarily affects systems with large number of CPUs. For example, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039576 reported that Lustre client encryption performance improved from 21.7GB/s to 48.2GB/s by disabling CONFIG_CRYPTO_STATS. It can be argued that this means that CONFIG_CRYPTO_STATS should be optimized with per-cpu counters similar to many of the networking counters. But no one has done this in 5+ years. This is consistent with the fact that the feature appears to be unused, so there seems to be little interest in improving it as opposed to just disabling it. It can be argued that because CONFIG_CRYPTO_STATS is off by default, performance doesn't matter. But Linux distros tend to error on the side of enabling options. The option is enabled in Ubuntu and Arch Linux, and until recently was enabled in RHEL and Fedora (see above). So, even just having the option available is harmful to users. 3. CONFIG_CRYPTO_STATS is a large maintenance burden There are over 1000 lines of code associated with CONFIG_CRYPTO_STATS, spread among 32 files. It significantly complicates much of the implementation of the crypto API. After the initial submission, many fixes and refactorings have consumed effort of multiple people to keep this feature "working". We should be spending this effort elsewhere. Cc: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-01Merge tag 'qcom-arm64-for-6.9' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt Qualcomm ARM64 DeviceTree updates for v6.9 Four variants of Samsung Galaxy Core Prime and Grand Prime, built on MSM8916, and the Hardware Development Kit (HDK) for SM8550, are introduced. On X Elite audio and compute remoteprocs, IPCC, PCIe, AOSS QMP, SMP2P, TCSR, USB, display, audio, and soundwire support is introduced, and enabled across the CRD and QCP devices. For SM8650 PCIe controllers are moved to GIC-ITS and msi-map-mask is defined. Missing qlink-logging reserved-memory region is added for the modem remoteproc. FastRPC compute contexts are marked dma-coherent. Audio, USB Type-C and PM8010 support is introduced across MTP and QRD devices. GPU cooling devices are hooked up across MSM8916, MSM8939, SC8180X, SDM630, SDM845, SM6115, SM8150, SM8250, SM8350, and SM8550. UFS PHY clocks are corrected across MSM8996, MSM8998, SC8180X, SC8280XP, SDM845, SM6115, SM6125, SM8150, SM8250, SM8350, SM8550, and SM8650. PCI MSI interrupts are wired up across SM8150, SM8250, SM8350, SM8450, SM8550, SM8650, SC7280, and SC8180X On IPQ6018 QUP5 I2C, tsens sand thermal zones are defined. The Inline Crypto Engine (ICE) is enabled for IPQ9574. On MSM8953 the GPU and its IOMMU is introduced, the reset for the display subsystem is also wired up. VLS CLAMP registers are specified for USB3 PHYs on MSM8998, QCM2290, and SM6115. USB Type-C port management is enabled on QRB4210 RB2. On the SA8295P ADP the MAX20411 regulator powering the GPU rails is introduced and the GPU is enabled. The first PCI instance on SA8540P Ride is disabled for now, as a fix for the interrupt storm produced here has not been presented. On SA8775P the firmware memory map has changed and is updated. Safety IRQ is added to the Ethernet controller. On SC7180 UFS support is introduced and the cros-ec-spi is marked as wakeup source. For SC7280 capacity and DPC properties are added, cryptobam definition is improved to work in more firmware environments, more Chrome-specific properties are moved out from main dtsi, and cros-ec-spi is maked as a wakeup source. Slimbus definition is added to the platform. A missing reserved-memory range is added to Fairphone FP5, PMIC GLINK and Venus are enabled. LEDs are introduced and voltage settings corrected on the QCM6490 IDP, and RB3gen2 sees the same voltage changes and GCC protected clocks are introduced to make the board boot properly. RPMh sleep stats and a variety of cleanups and fixes are introduced for SC8180X. On SC8280XP the additional tsens instances are introduced. Camera Subsystem and Camera Control Interface (CCI) are added. PMIC die-temp vadc channels are introduced on the CRD, to allow ADC channels to be tied to the shared PMIC temp-alarms, to actually report temperature. On SDM630 USB QMP PHY support is introduced and enabled on the Inforce IFC6560 board. On the various Sony Xperia XA2 variants WLED is enabled and configured. On SM6350 display subsystem interconnects and tsens-based thermal zones are added. On SM7125 UFS support is added. On Fairphone FP4, on SM7225, display and GPU are enabled, and firmware paths are corrected. SM8150 PCIe controller definitions are corrected. As with SM8650, the SM8550 the fastrpc compute contexts are marked dm-coherent, and PCIe controllers are moved to use GIC-ITS. The UFS controller frequency definition is moved to the generic opp-table. Touchscreen is enabled on the QRD device. As usual, a variety of smaller cleanups and corrections to match DeviceTree bindings and style guidelines are introduced across the various files. * tag 'qcom-arm64-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (176 commits) arm64: dts: qcom: sm6115: fix USB PHY configuration arm64: dts: sm8650: Add msi-map-mask for PCIe nodes arm64: dts: qcom: replace underscores in node names dt-bindings: arm: qcom: Add Samsung Galaxy Tab 4 10.1 LTE arm64: dts: qcom: pm4125: define USB-C related blocks arm64: dts: qcom: sa8540p-ride: disable pcie2a node arm64: dts: qcom: sc7280: add slimbus DT node arm64: dts: qcom: sc7280: Add capacity and DPC properties arm64: dts: qcom: pmi632: Add PBS client and use in LPG node arm64: dts: qcom: sm8550: Use GIC-ITS for PCIe0 and PCIe1 arm64: dts: qcom: sm8150: correct PCIe wake-gpios arm64: dts: qcom: sdm845-db845c: correct PCIe wake-gpios arm64: dts: qcom: sm7225-fairphone-fp4: Enable display and GPU arm64: dts: qcom: sm6350: Remove "disabled" state of GMU arm64: dts: qcom: msm8916-samsung-fortuna/rossa: Add fuel gauge arm64: dts: qcom: sm6350: Add interconnect for MDSS arm64: dts: qcom: msm8916-samsung-fortuna/rossa: Add initial device trees arm64: dts: qcom: sm8550: Switch UFS from opp-table-hz to opp-v2 arm64: dts: qcom: sc8180x: describe all PCI MSI interrupts arm64: dts: qcom: minor whitespace cleanup ... Link: https://lore.kernel.org/r/20240225050146.484422-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-01net: bql: fix building with BQL disabledArnd Bergmann
It is now possible to disable BQL, but that causes the cpsw driver to break: drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue' 297 | dql_avail(&netif_txq->dql), There is already a helper function in net/sch_generic.h that could be used to help here. Move its implementation into the common linux/netdevice.h along with the other bql interfaces and change both users over to the new interface. Fixes: ea7f3cfaa588 ("net: bql: allow the config to be disabled") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Simplify net_dbg_ratelimited() dummyGeert Uytterhoeven
There is no need to wrap calls to the no_printk() helper inside an always-false check, as no_printk() already does that internally. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around idev->cnf.ignore_routes_with_linkdownEric Dumazet
idev->cnf.ignore_routes_with_linkdown can be used without any locks, add appropriate annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.forwardingEric Dumazet
idev->cnf.forwarding and net->ipv6.devconf_all->forwarding might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.mtu6Eric Dumazet
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: add ipv6_devconf_read_txrx cacheline_groupEric Dumazet
IPv6 TX and RX fast path use the following fields: - disable_ipv6 - hop_limit - mtu6 - forwarding - disable_policy - proxy_ndp Place them in a group to increase data locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memoryMichael Kelley
The VMBUS_RING_SIZE macro adds space for a ring buffer header to the requested ring buffer size. The header size is always 1 page, and so its size varies based on the PAGE_SIZE for which the kernel is built. If the requested ring buffer size is a large power-of-2 size and the header size is small, the resulting size is inefficient in its use of memory. For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory allocator, and wastes 508 Kbytes of memory. In such situations, the exact size of the ring buffer isn't that important, and it's OK to allocate the 4 Kbyte header at the beginning of the 512 Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted. Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer sizes. "Large" is somewhat arbitrarily defined as 8 times the size of the ring buffer header (which is of size PAGE_SIZE). For example, for 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first 4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer header. In both cases, smaller sizes add space for the header so the ring size isn't reduced too much by using part of the space for the header. For example, with a 64 Kbyte page size, we don't want a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half of the space for the header. In such a case, the memory allocation is less efficient, but it's the best that can be done. While the new algorithm slightly changes the amount of space allocated for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't known to be sensitive to small changes in ring buffer size, so there shouldn't be any effect. Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL") Fixes: 6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502 Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Tested-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Link: https://lore.kernel.org/r/20240229004533.313662-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240229004533.313662-1-mhklinux@outlook.com>
2024-02-29lib/string_helpers: Add flags param to string_get_size()Andy Shevchenko
The new flags parameter allows controlling - Whether or not the units suffix is separated by a space, for compatibility with sort -h - Whether or not to append a B suffix - we're not always printing bytes. Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240229205345.93902-1-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01Merge tag 'drm-misc-next-2024-02-29' of ↵Dave Airlie
https://anongit.freedesktop.org/git/drm/drm-misc into drm-next drm-misc-next for v6.9: UAPI Changes: Cross-subsystem Changes: backlight: - corgi: include backlight header fbdev: - Cleanup includes in public header file - fbtft: Include backlight header Core Changes: edid: - Remove built-in EDID data dp: - Avoid AUX transfers on powered-down displays - Add VSC SDP helpers modesetting: - Add sanity checks for polling - Cleanups scheduler: - Cleanups tests: - Add helpers for mode-setting tests Driver Changes: i915: - Use shared VSC SDP helper mgag200: - Work around PCI write bursts mxsfb: - Use managed mode config nouveau: - Include backlight header where necessary qiac: - Cleanups sun4: - HDMI: updates to atomic mode setting tegra: - Fix GEM refounting in error paths tidss: - Fix multi display - Fix initial Z position v3d: - Support display MMU page size Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240229084806.GA21616@localhost.localdomain
2024-03-01Merge tag 'drm-xe-fixes-2024-02-29' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - A couple of tracepoint updates from Priyanka and Lucas. - Make sure BINDs are completed before accepting UNBINDs on LR vms. - Don't arbitrarily restrict max number of batched binds. - Add uapi for dumpable bos (agreed on IRC). - Remove unused uapi flags and a leftover comment. Driver Changes: - A couple of fixes related to the execlist backend. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZeCBg4MA2hd1oggN@fedora
2024-03-01Merge tag 'drm-misc-fixes-2024-02-29' of ↵Dave Airlie
https://anongit.freedesktop.org/git/drm/drm-misc into drm-fixes A reset fix for host1x, a resource leak fix and a probe fix for aux-hpd, a use-after-free fix and a boot fix for a pmic_glink qcom driver in drivers/soc, a fix for the simpledrm/tegra transition, a kunit fix for the TTM tests, a font handling fix for fbcon, two allocation fixes and a kunit test to cover them for drm/buddy Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240229-angelic-adorable-teal-fbfabb@houat
2024-03-01Merge tag 'drm-intel-gt-next-2024-02-28' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: Fixes: - Add some boring kerneldoc (Tvrtko Ursulin) - Check before removing mm notifier (Nirmoy Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zd889Wvu/ZKZSK4/@tursulin-desk
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29bpf: Replace bpf_lpm_trie_key 0-length array with flexible arrayKees Cook
Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13: ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=] 207 | *(__be16 *)&key->data[i]); | ^~~~~~~~~~~~~ ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16' 102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) | ^ ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu' 97 | #define be16_to_cpu __be16_to_cpu | ^~~~~~~~~~~~~ ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu' 206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i] ^ | ^~~~~~~~~~~ In file included from ../include/linux/bpf.h:7: ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data' 82 | __u8 data[0]; /* Arbitrary size */ | ^~~~ And found at run-time under CONFIG_FORTIFY_SOURCE: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49 index 0 is out of range for type '__u8 [*]' Changing struct bpf_lpm_trie_key is difficult since has been used by userspace. For example, in Cilium: struct egress_gw_policy_key { struct bpf_lpm_trie_key lpm_key; __u32 saddr; __u32 daddr; }; While direct references to the "data" member haven't been found, there are static initializers what include the final member. For example, the "{}" here: struct egress_gw_policy_key in_key = { .lpm_key = { 32 + 24, {} }, .saddr = CLIENT_IP, .daddr = EXTERNAL_SVC_IP & 0Xffffff, }; To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpr_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr. Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly. Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Closes: https://paste.debian.net/hidden/ca500597/ Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1] Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-02-29workqueue: Drain BH work items on hot-unplugged CPUsTejun Heo
Boqun pointed out that workqueues aren't handling BH work items on offlined CPUs. Unlike tasklet which transfers out the pending tasks from CPUHP_SOFTIRQ_DEAD, BH workqueue would just leave them pending which is problematic. Note that this behavior is specific to BH workqueues as the non-BH per-CPU workers just become unbound when the CPU goes offline. This patch fixes the issue by draining the pending BH work items from an offlined CPU from CPUHP_SOFTIRQ_DEAD. Because work items carry more context, it's not as easy to transfer the pending work items from one pool to another. Instead, run BH work items which execute the offlined pools on an online CPU. Note that this assumes that no further BH work items will be queued on the offlined CPUs. This assumption is shared with tasklet and should be fine for conversions. However, this issue also exists for per-CPU workqueues which will just keep executing work items queued after CPU offline on unbound workers and workqueue should reject per-CPU and BH work items queued on offline CPUs. This will be addressed separately later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: http://lkml.kernel.org/r/Zdvw0HdSXcU3JZ4g@boqun-archlinux
2024-02-29overflow: Use POD in check_shl_overflow()Andy Shevchenko
The check_shl_overflow() uses u64 type that is defined in types.h. Instead of including that header, just switch to use POD type directly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240228204919.3680786-2-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29kernel.h: Move lib/cmdline.c prototypes to string.hAndy Shevchenko
The lib/cmdline.c is basically a set of some small string parsers which are wide used in the kernel. Their prototypes belong to the string.h rather then kernel.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231003130142.2936503-1-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29fortify: Improve buffer overflow reportingKees Cook
Improve the reporting of buffer overflows under CONFIG_FORTIFY_SOURCE to help accelerate debugging efforts. The calculations are all just sitting in registers anyway, so pass them along to the function to be reported. For example, before: detected buffer overflow in memcpy and after: memcpy: detected buffer overflow: 4096 byte read of buffer size 1 Link: https://lore.kernel.org/r/20230407192717.636137-10-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29fortify: Provide KUnit counters for failure testingKees Cook
The standard C string APIs were not designed to have a failure mode; they were expected to always succeed without memory safety issues. Normally, CONFIG_FORTIFY_SOURCE will use fortify_panic() to stop processing, as truncating a read or write may provide an even worse system state. However, this creates a problem for testing under things like KUnit, which needs a way to survive failures. When building with CONFIG_KUNIT, provide a failure path for all users of fortify_panic, and track whether the failure was a read overflow or a write overflow, for KUnit tests to examine. Inspired by similar logic in the slab tests. Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29fortify: Split reporting and avoid passing string pointerKees Cook
In preparation for KUnit testing and further improvements in fortify failure reporting, split out the report and encode the function and access failure (read or write overflow) into a single u8 argument. This mainly ends up saving a tiny bit of space in the data segment. For a defconfig with FORTIFY_SOURCE enabled: $ size gcc/vmlinux.before gcc/vmlinux.after text data bss dec hex filename 26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before 26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29refcount: Annotated intentional signed integer wrap-aroundKees Cook
Mark the various refcount_t functions with __signed_wrap, as we depend on the wrapping behavior to detect the overflow and perform saturation. Silences warnings seen with the LKDTM REFCOUNT_* tests: UBSAN: signed-integer-overflow in ../include/linux/refcount.h:189:11 2147483647 + 1 cannot be represented in type 'int' Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20240221051634.work.287-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29lib/string_choices: Add str_plural() helperMichal Wajdeczko
Add str_plural() helper to replace existing open implementations used by many drivers and help improve future user facing messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20240214165015.1656-1-michal.wajdeczko@intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29overflow: Introduce wrapping_assign_add() and wrapping_assign_sub()Kees Cook
This allows replacements of the idioms "var += offset" and "var -= offset" with the wrapping_assign_add() and wrapping_assign_sub() helpers respectively. They will avoid wrap-around sanitizer instrumentation. Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29overflow: Introduce wrapping_add(), wrapping_sub(), and wrapping_mul()Kees Cook
Provide helpers that will perform wrapping addition, subtraction, or multiplication without tripping the arithmetic wrap-around sanitizers. The first argument is the type under which the wrap-around should happen with. In other words, these two calls will get very different results: wrapping_mul(int, 50, 50) == 2500 wrapping_mul(u8, 50, 50) == 196 Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29overflow: Adjust check_*_overflow() kern-doc to reflect resultsKees Cook
The check_*_overflow() helpers will return results with potentially wrapped-around values. These values have always been checked by the selftests, so avoid the confusing language in the kern-doc. The idea of "safe for use" was relative to the expectation of whether or not the caller wants a wrapped value -- the calculation itself will always follow arithmetic wrapping rules. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29kernel.h: Move upper_*_bits() and lower_*_bits() to wordpart.hAndy Shevchenko
The wordpart.h header is collecting APIs related to the handling parts of the word (usually in byte granularity). The upper_*_bits() and lower_*_bits() are good candidates to be moved to there. This helps to clean up header dependency hell with regard to kernel.h as the latter gathers completely unrelated stuff together and slows down compilation (especially when it's included into other header). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240214172752.3605073-1-andriy.shevchenko@linux.intel.com Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29Merge branch 'icc-sm7150' into icc-nextGeorgi Djakov
Add dt-bindings and interconnect driver support for the Qualcomm SM7150 SoC. * icc-sm7150 dt-bindings: interconnect: Add Qualcomm SM7150 DT bindings interconnect: qcom: Add SM7150 driver support Link: https://lore.kernel.org/r/20240222174250.80493-1-danila@jiaxyga.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
2024-02-29dt-bindings: interconnect: Add Qualcomm SM7150 DT bindingsDanila Tikhonov
The Qualcomm SM7150 platform has several bus fabrics that could be controlled and tuned dynamically according to the bandwidth demand. Signed-off-by: Danila Tikhonov <danila@jiaxyga.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240222174250.80493-2-danila@jiaxyga.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29cgroup/cpuset: Remove cpuset_do_slab_mem_spread()Xiongwei Song
The SLAB allocator has been removed sine 6.8-rc1 [1], so there is no user with SLAB_MEM_SPREAD and cpuset_do_slab_mem_spread(). Then SLAB_MEM_SPREAD is marked as unused by [2]. Here we can remove cpuset_do_slab_mem_spread(). For more details, please check [3]. [1] https://lore.kernel.org/linux-mm/20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz/ [2] https://lore.kernel.org/linux-kernel/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/T/ [3] https://lore.kernel.org/lkml/32bc1403-49da-445a-8c00-9686a3b0d6a3@redhat.com/T/#mf14b838c5e0e77f4756d436bac3d8c0447ea4350 Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-02-29dpll: fix build failure due to rcu_dereference_check() on unknown typeEric Dumazet
Tasmiya reports that their compiler complains that we deref a pointer to unknown type with rcu_dereference_rtnl(): include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’ Unclear what compiler it is, at the moment, and we can't report but since DPLL can't be a module - move the code from the header into the source file. Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29cpufreq: Limit resolving a frequency to policy min/maxShivnandan Kumar
Resolving a frequency to an efficient one should not transgress policy->max (which can be set for thermal reason) and policy->min. Currently, there is possibility where scaling_cur_freq can exceed scaling_max_freq when scaling_max_freq is an inefficient frequency. Add a check to ensure that resolving a frequency will respect policy->min/max. Cc: All applicable <stable@vger.kernel.org> Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E") Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com> [ rjw: Whitespace adjustment, changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-29f2fs: stop checkpoint when get a out-of-bounds segmentZhiguo Niu
There is low probability that an out-of-bounds segment will be got on a small-capacity device. In order to prevent subsequent write requests allocating block address from this invalid segment, which may cause unexpected issue, stop checkpoint should be performed. Also introduce a new stop cp reason: STOP_CP_REASON_NO_SEGMENT. Note, f2fs_stop_checkpoint(, false) is complex and it may sleep, so we should move it outside segmap_lock spinlock coverage in get_new_segment(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>