summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-05-14Merge tag 'thermal-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "The most significant part of this is a rework of thermal governors, including a redesign of the thermal governor interface and changes to make some of them take trip point hysteresis into account properly, as well as some related cleanups of the thermal governors and thermal core. The above is based on preliminary changes refactoring thermal data structures and moving the definitions of some of them into the thermal core which also ensure that trip point crossing notifications will be sent to user space via netlink and recorded in the debug statistics in temperature order. In addition, netlink bind/unbind notifications are added to the thermal core and the Intel HFI driver is modified to use them to avoid sending netlink messages until there are subscribers. Apart from that, multiple thermal drivers are updated which includes new hardware support (MediaTek MT8188 and MT8186, Amlogic A1 thermal sensor, Loongson-2K2000, Lmh QCM2290), fixes, cleanups and documentation updates, and the recently added thermal debug code is fixed and cleaned up. Specifics: - Redesign the thermal governor interface to allow the governors to work in a more straightforward way (Rafael Wysocki) - Make thermal governors take the current trip point thresholds into account in their computations which allows trip hysteresis to be observed more accurately (Rafael Wysocki) - Make the thermal core manage passive polling for thermal zones and remove passive polling management from thermal governors (Rafael Wysocki) - Refactor trip point representation and move the definition of thermal governor and thermal zone device structures to the thermal core (Rafael Wysocki) - Sort trip point crossing notifications and debug recording of trip point crossing events by temperature (Rafael Wysocki) - Improve the handling of cooling device states and thermal mitigation episodes in progress in the thermal debug code (Rafael Wysocki) - Avoid excessive updates of trip point statistics and clean up the printing of thermal mitigation episode information (Rafael Wysocki) - Clean up thermal governors and thermal core (Rafael Wysocki) - Allow thermal drivers to register notifiers that will be invoked on netlink events like BIND and UNBIND, so that they can adjust their activity depending on whether or not there are any subscribers of netlink messages coming from them, and make the Intel HFI driver use this mechanism (Stanislaw Gruszka) - Adjust the update delay and capabilities-per-event values in the Intel HFI thermal driver to prevent it from missing events and allow it to process more data in one go (Ricardo Neri) - Add missing MODULE_DESCRIPTION() to multiple files in the int340x_thermal and intel_soc_dts_iosf drivers (Srinivas Pandruvada) - Replace deprecated strncpy() with strscpy() in the int340x_thermal driver (Justin Stitt) - Add QCM2290 compatible DT bindings for Lmh and fix a NULL pointer dereference in the lmh driver when the SCM is not present (Konrad Dybcio) - Use the strreplace() function instead of doing it manually in the Armada driver (Rasmus Villemoes) - Convert st,stih407-thermal to DT schema and fix up missing properties (Raphael Gallais-Pou) - Add suspend/resume by restoring the context of the tsens sensor (Priyansh Jain) - Support A1 SoC family Thermal Sensor controller and add the DT bindings (Dmitry Rokosov) - Improve the temperature approximation calculation and consolidate the Tj constant into a shared area of the structure instead of duplicating it on the Rcar Gen3 (Niklas Söderlund) - Fix the Mediatek LVTS sensor coefficient for the MT8192 in order to support it correctly (Hsin-Te Yuan) - Fix a NULL pointer dereference in the tsens driver when the function compute_intercept_slope() is called with a NULL parameter (Aleksandr Mishin) - Remove some unused fields in struct qpnp_tm_chip and k3_bandgap (Christophe Jaillet) - Fix up calibration efuse data decoding, consolidate the code by checking boundaries and refactor some part of the LVTS Mediatek driver. After setting the scene, add MT8186 and MT8188 along with the DT bindings (Nicolas Pitre) - Add Loongson-2K2000 support after some minor code adjustements and providing the DT bindings definition (Binbin Zhou)" * tag 'thermal-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits) thermal: intel: hfi: Increase the number of CPU capabilities per netlink event thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL thermal: intel: Add missing module description thermal: core: Move passive polling management to the core thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid thermal: trip: Add missing empty code line thermal/debugfs: Avoid printing zero duration for mitigation events in progress thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add() thermal/debugfs: Create records for cdev states as they get used thermal: core: Introduce thermal_governor_trip_crossed() thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats() thermal/debugfs: Clean up thermal_debug_update_temp() thermal/debugfs: Avoid excessive updates of trip point statistics thermal: core: Relocate critical and hot trip handling thermal: core: Drop the .throttle() governor callback thermal: gov_user_space: Use .trip_crossed() instead of .throttle() thermal: gov_fair_share: Eliminate unnecessary integer divisions ...
2024-05-14cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepointsJesper Dangaard Brouer
This closely resembles helpers added for the global cgroup_rstat_lock in commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock. Based on production workloads, we observe the fast-path "update" function cgroup_rstat_updated() is invoked around 3 million times per sec, while the "flush" function cgroup_rstat_flush_locked(), walking each possible CPU, can see periodic spikes of 700 invocations/sec. For this reason, the tracepoints are split into normal and fastpath versions for this per-CPU lock. Making it feasible for production to continuously monitor the non-fastpath tracepoint to detect lock contention issues. The reason for monitoring is that lock disables IRQs which can disturb e.g. softirq processing on the local CPUs involved. When the global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex), this per CPU lock becomes the next bottleneck that can introduce latency variations. A practical bpftrace script for monitoring contention latency: bpftrace -e ' tracepoint:cgroup:cgroup_rstat_cpu_lock_contended { @start[tid]=nsecs; @cnt[probe]=count()} tracepoint:cgroup:cgroup_rstat_cpu_locked { if (args->contended) { @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);} @cnt[probe]=count()} interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}' Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-14Merge tag 'linux_kselftest-kunit-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit updates from Shuah Khan: - fix race condition in try-catch completion - change __kunit_test_suites_init() to exit early if there is nothing to test - change string-stream-test to use KUNIT_DEFINE_ACTION_WRAPPER - move fault tests behind KUNIT_FAULT_TEST Kconfig option - kthread test fixes and improvements - iov_iter test fixes * tag 'linux_kselftest-kunit-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: bail out early in __kunit_test_suites_init() if there are no suites to test kunit: string-stream-test: use KUNIT_DEFINE_ACTION_WRAPPER kunit: test: Move fault tests behind KUNIT_FAULT_TEST Kconfig option kunit: unregister the device on error kunit: Fix race condition in try-catch completion kunit: Add tests for fault kunit: Print last test location on fault kunit: Fix KUNIT_SUCCESS() calls in iov_iter tests kunit: Handle test faults kunit: Fix timeout message kunit: Fix kthread reference kunit: Handle thread creation error
2024-05-14s390/iucv: Unexport iucv_rootHeiko Carstens
There is no user of iucv_root outside of the core IUCV code left. Therefore remove the EXPORT_SYMBOL. Acked-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20240506194454.1160315-7-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14s390/iucv: Provide iucv_alloc_device() / iucv_release_device()Heiko Carstens
Provide iucv_alloc_device() and iucv_release_device() helper functions, which can be used to deduplicate more or less identical IUCV device allocation and release code in four different drivers. Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20240506194454.1160315-2-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-14Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The most interesting parts are probably the mm changes from Ryan which optimise the creation of the linear mapping at boot and (separately) implement write-protect support for userfaultfd. Outside of our usual directories, the Kbuild-related changes under scripts/ have been acked by Masahiro whilst the drivers/acpi/ parts have been acked by Rafael and the addition of cpumask_any_and_but() has been acked by Yury. ACPI: - Support for the Firmware ACPI Control Structure (FACS) signature feature which is used to reboot out of hibernation on some systems Kbuild: - Support for building Flat Image Tree (FIT) images, where the kernel Image is compressed alongside a set of devicetree blobs Memory management: - Optimisation of our early page-table manipulation for creation of the linear mapping - Support for userfaultfd write protection, which brings along some nice cleanups to our handling of invalid but present ptes - Extend our use of range TLBI invalidation at EL1 Perf and PMUs: - Ensure that the 'pmu->parent' pointer is correctly initialised by PMU drivers - Avoid allocating 'cpumask_t' types on the stack in some PMU drivers - Fix parsing of the CPU PMU "version" field in assembly code, as it doesn't follow the usual architectural rules - Add best-effort unwinding support for USER_STACKTRACE - Minor driver fixes and cleanups Selftests: - Minor cleanups to the arm64 selftests (missing NULL check, unused variable) Miscellaneous: - Add a command-line alias for disabling 32-bit application support - Add part number for Neoverse-V2 CPUs - Minor fixes and cleanups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits) arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2 arm64/mm: Add uffd write-protect support arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG arm64/mm: Remove PTE_PROT_NONE bit arm64/mm: generalize PMD_PRESENT_INVALID for all levels arm64: simplify arch_static_branch/_jump function arm64: Add USER_STACKTRACE support arm64: Add the arm64.no32bit_el0 command line option drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset() drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group kselftest: arm64: Add a null pointer check arm64: defer clearing DAIF.D arm64: assembler: update stale comment for disable_step_tsk arm64/sysreg: Update PIE permission encodings kselftest/arm64: Remove unused parameters in abi test perf/arm-spe: Assign parents for event_source device perf/arm-smmuv3: Assign parents for event_source device perf/arm-dsu: Assign parents for event_source device perf/arm-dmc620: Assign parents for event_source device ...
2024-05-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.10 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-14net: gro: fix napi_gro_cb zeroed alignmentRichard Gobert
Add 2 byte padding to napi_gro_cb struct to ensure zeroed member is aligned after flush_id member was removed in the original commit. Fixes: 4b0ebbca3e16 ("net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment") Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Link: https://lore.kernel.org/r/fca08735-c245-49e5-af72-82900634f144@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-14Merge tag 'irq-core-2024-05-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: "Core code: - Interrupt storm detection for the lockup watchdog: Lockups which are caused by interrupt storms are not easy to debug because there is no information about the events which make the lockup detector trigger. To make this more user friendly, provide an extenstion to interrupt statistics which allows to take snapshots and an interface to retrieve the delta to the snapshot. Use this new mechanism in the watchdog code to do a two stage lockup analysis by taking the snapshot and printing the deltas for the topmost active interrupts on the second trigger. Note: This contains both the interrupt and the watchdog changes as the latter depend on the former obviously. - Avoid summation loops in the /proc/interrupts output and use the global counter when possible - Skip suspended interrupts on CPU hotplug operations to ensure that they are not delivered before the system resumes the device drivers when coming out of suspend. - On CPU hot-unplug interrupts which are affine to the outgoing CPU are migrated to a different CPU in the affinity mask. This can fail when the CPUs have no vectors left. Instead of giving up try to migrate it to any online CPU and thereby breaking the affinity setting in order to prevent a stale device interrupt which targets an offline CPU - The usual small cleanups Driver code: - Support for the RISCV AIA MSI controller - Make the interrupt allocation for the Loongson PCH controller more flexible to prevent vector exhaustion - The usual set of cleanups and fixes all over the place" * tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc cpuidle: Avoid explicit cpumask allocation on stack irqchip/sifive-plic: Avoid explicit cpumask allocation on stack irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack cpumask: Introduce cpumask_first_and_and() irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown genirq: Reuse irq_is_nmi() genirq/cpuhotplug: Retry with cpu_online_mask when migration fails genirq/cpuhotplug: Skip suspended interrupts when restoring affinity arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251 arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251 ARM: dts: stm32: List exti parent interrupts on stm32mp131 ARM: dts: stm32: List exti parent interrupts on stm32mp151 arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32 irqchip/stm32-exti: Mark events reserved with RIF configuration check irqchip/stm32-exti: Skip secure events irqchip/stm32-exti: Convert driver to standard PM ...
2024-05-14Merge tag 'timers-core-2024-05-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers and timekeeping updates from Thomas Gleixner: "Core code: - Make timekeeping and VDSO time readouts resilent against math overflow: In guest context the kernel is prone to math overflow when the host defers the timer interrupt due to overload, malfunction or malice. This can be mitigated by checking the clocksource delta for the maximum deferrement which is readily available. If that value is exceeded then the code uses a slowpath function which can handle the multiplication overflow. This functionality is enabled unconditionally in the kernel, but made conditional in the VDSO code. The latter is conditional because it allows architectures to optimize the check so it is not causing performance regressions. On X86 this is achieved by reworking the existing check for negative TSC deltas as a negative delta obviously exceeds the maximum deferrement when it is evaluated as an unsigned value. That avoids two conditionals in the hotpath and allows to hide both the negative delta and the large delta handling in the same slow path. - Add an initial minimal ktime_t abstraction for Rust - The usual boring cleanups and enhancements Drivers: - Boring updates to device trees and trivial enhancements in various drivers" * tag 'timers-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) clocksource/drivers/arm_arch_timer: Mark hisi_161010101_oem_info const clocksource/drivers/timer-ti-dm: Remove an unused field in struct dmtimer clocksource/drivers/renesas-ostm: Avoid reprobe after successful early probe clocksource/drivers/renesas-ostm: Allow OSTM driver to reprobe for RZ/V2H(P) SoC dt-bindings: timer: renesas: ostm: Document Renesas RZ/V2H(P) SoC rust: time: doc: Add missing C header links clocksource: Make the int help prompt unit readable in ncurses hrtimer: Rename __hrtimer_hres_active() to hrtimer_hres_active() timerqueue: Remove never used function timerqueue_node_expires() rust: time: Add Ktime vdso: Fix powerpc build U64_MAX undeclared error clockevents: Convert s[n]printf() to sysfs_emit() clocksource: Convert s[n]printf() to sysfs_emit() clocksource: Make watchdog and suspend-timing multiplication overflow safe timekeeping: Let timekeeping_cycles_to_ns() handle both under and overflow timekeeping: Make delta calculation overflow safe timekeeping: Prepare timekeeping_cycles_to_ns() for overflow safety timekeeping: Fold in timekeeping_delta_to_ns() timekeeping: Consolidate timekeeping helpers timekeeping: Refactor timekeeping helpers ...
2024-05-14Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1Luiz Augusto von Dentz
If hdev->le_num_of_adv_sets is set to 1 it means that only handle 0x00 can be used, but since the MGMT interface instances start from 1 (instance 0 means all instances in case of MGMT_OP_REMOVE_ADVERTISING) the code needs to map the instance to handle otherwise users will not be able to advertise as instance 1 would attempt to use handle 0x01. Fixes: 1d0fac2c38ed ("Bluetooth: Use controller sets when available") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: HCI: Remove HCI_AMP supportLuiz Augusto von Dentz
Since BT_HS has been remove HCI_AMP controllers no longer has any use so remove it along with the capability of creating AMP controllers. Since we no longer need to differentiate between AMP and Primary controllers, as only HCI_PRIMARY is left, this also remove hdev->dev_type altogether. Fixes: e7b02296fb40 ("Bluetooth: Remove BT_HS") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()Sungwoo Kim
l2cap_le_flowctl_init() can cause both div-by-zero and an integer overflow since hdev->le_mtu may not fall in the valid range. Move MTU from hci_dev to hci_conn to validate MTU and stop the connection process earlier if MTU is invalid. Also, add a missing validation in read_buffer_size() and make it return an error value if the validation fails. Now hci_conn_add() returns ERR_PTR() as it can fail due to the both a kzalloc failure and invalid MTU value. divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 PID: 67 Comm: kworker/u5:0 Tainted: G W 6.9.0-rc5+ #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci0 hci_rx_work RIP: 0010:l2cap_le_flowctl_init+0x19e/0x3f0 net/bluetooth/l2cap_core.c:547 Code: e8 17 17 0c 00 66 41 89 9f 84 00 00 00 bf 01 00 00 00 41 b8 02 00 00 00 4c 89 fe 4c 89 e2 89 d9 e8 27 17 0c 00 44 89 f0 31 d2 <66> f7 f3 89 c3 ff c3 4d 8d b7 88 00 00 00 4c 89 f0 48 c1 e8 03 42 RSP: 0018:ffff88810bc0f858 EFLAGS: 00010246 RAX: 00000000000002a0 RBX: 0000000000000000 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: ffff88810bc0f7c0 RDI: ffffc90002dcb66f RBP: ffff88810bc0f880 R08: aa69db2dda70ff01 R09: 0000ffaaaaaaaaaa R10: 0084000000ffaaaa R11: 0000000000000000 R12: ffff88810d65a084 R13: dffffc0000000000 R14: 00000000000002a0 R15: ffff88810d65a000 FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000100 CR3: 0000000103268003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> l2cap_le_connect_req net/bluetooth/l2cap_core.c:4902 [inline] l2cap_le_sig_cmd net/bluetooth/l2cap_core.c:5420 [inline] l2cap_le_sig_channel net/bluetooth/l2cap_core.c:5486 [inline] l2cap_recv_frame+0xe59d/0x11710 net/bluetooth/l2cap_core.c:6809 l2cap_recv_acldata+0x544/0x10a0 net/bluetooth/l2cap_core.c:7506 hci_acldata_packet net/bluetooth/hci_core.c:3939 [inline] hci_rx_work+0x5e5/0xb20 net/bluetooth/hci_core.c:4176 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0x90f/0x1530 kernel/workqueue.c:3335 worker_thread+0x926/0xe70 kernel/workqueue.c:3416 kthread+0x2e3/0x380 kernel/kthread.c:388 ret_from_fork+0x5c/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Fixes: 6ed58ec520ad ("Bluetooth: Use LE buffers for LE traffic") Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warningGustavo A. R. Silva
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. So, use the `DEFINE_FLEX()` helper for an on-stack definition of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. With these changes, fix the following warning: net/bluetooth/hci_conn.c:669:41: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14LE Create Connection command timeout increased to 20 secsMahesh Talewad
On our DUT, we can see that the host issues create connection cancel command after 4-sec if there is no connection complete event for LE create connection cmd. As per core spec v5.3 section 7.8.5, advertisement interval range is- Advertising_Interval_Min Default : 0x0800(1.28s) Time Range: 20ms to 10.24s Advertising_Interval_Max Default : 0x0800(1.28s) Time Range: 20ms to 10.24s If the remote device is using adv interval of > 4 sec, it is difficult to make a connection with the current timeout value. Also, with the default interval of 1.28 sec, we will get only 3 chances to capture the adv packets with the 4 sec window. Hence we want to increase this timeout to 20sec. Signed-off-by: Mahesh Talewad <mahesh.talewad@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: compute LE flow credits based on recvbuf spaceSebastian Urban
Previously LE flow credits were returned to the sender even if the socket's receive buffer was full. This meant that no back-pressure was applied to the sender, thus it continued to send data, resulting in data loss without any error being reported. Furthermore, the amount of credits was essentially fixed to a small amount, leading to reduced performance. This is fixed by computing the number of returned LE flow credits based on the estimated available space in the receive buffer of an L2CAP socket. Consequently, if the receive buffer is full, no credits are returned until the buffer is read and thus cleared by user-space. Since the computation of available receive buffer space can only be performed approximately (due to sk_buff overhead) and the receive buffer size may be changed by user-space after flow credits have been sent, superfluous received data is temporary stored within l2cap_pinfo. This is necessary because Bluetooth LE provides no retransmission mechanism once the data has been acked by the physical layer. If receive buffer space estimation is not possible at the moment, we fall back to providing credits for one full packet as before. This is currently the case during connection setup, when MPS is not yet available. Fixes: b1c325c23d75 ("Bluetooth: Implement returning of LE L2CAP credits") Signed-off-by: Sebastian Urban <surban@surban.net> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: hci_conn: Use __counted_by() to avoid -Wfamnae warningGustavo A. R. Silva
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. So, use the `DEFINE_FLEX()` helper for an on-stack definition of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. With these changes, fix the following warning: net/bluetooth/hci_conn.c:2116:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: hci_conn, hci_sync: Use __counted_by() to avoid -Wfamnae warningsGustavo A. R. Silva
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. So, use the `DEFINE_FLEX()` helper for multiple on-stack definitions of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. Notice that, due to the use of `__counted_by()` in `struct hci_cp_le_create_cis`, the for loop in function `hci_cs_le_create_cis()` had to be modified. Once the index `i`, through which `cp->cis[i]` is accessed, falls in the interval [0, cp->num_cis), `cp->num_cis` cannot be decremented all the way down to zero while accessing `cp->cis[]`: net/bluetooth/hci_event.c:4310: 4310 for (i = 0; cp->num_cis; cp->num_cis--, i++) { ... 4314 handle = __le16_to_cpu(cp->cis[i].cis_handle); otherwise, only half (one iteration before `cp->num_cis == i`) or half plus one (one iteration before `cp->num_cis < i`) of the items in the array will be accessed before running into an out-of-bounds issue. So, in order to avoid this, set `cp->num_cis` to zero just after the for loop. Also, make use of `aux_num_cis` variable to update `cmd->num_cis` after a `list_for_each_entry_rcu()` loop. With these changes, fix the following warnings: net/bluetooth/hci_sync.c:1239:56: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] net/bluetooth/hci_sync.c:1415:51: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] net/bluetooth/hci_sync.c:1731:51: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] net/bluetooth/hci_sync.c:6497:45: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: L2CAP: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. There are currently a couple of objects (`req` and `rsp`), in a couple of structures, that contain flexible structures (`struct l2cap_ecred_conn_req` and `struct l2cap_ecred_conn_rsp`), for example: struct l2cap_ecred_rsp_data { struct { struct l2cap_ecred_conn_rsp rsp; __le16 scid[L2CAP_ECRED_MAX_CID]; } __packed pdu; int count; }; in the struct above, `struct l2cap_ecred_conn_rsp` is a flexible structure: struct l2cap_ecred_conn_rsp { __le16 mtu; __le16 mps; __le16 credits; __le16 result; __le16 dcid[]; }; So, in order to avoid ending up with a flexible-array member in the middle of another structure, we use the `struct_group_tagged()` (and `__struct_group()` when the flexible structure is `__packed`) helper to separate the flexible array from the rest of the members in the flexible structure: struct l2cap_ecred_conn_rsp { struct_group_tagged(l2cap_ecred_conn_rsp_hdr, hdr, ... the rest of members ); __le16 dcid[]; }; With the change described above, we now declare objects of the type of the tagged struct, in this example `struct l2cap_ecred_conn_rsp_hdr`, without embedding flexible arrays in the middle of other structures: struct l2cap_ecred_rsp_data { struct { struct l2cap_ecred_conn_rsp_hdr rsp; __le16 scid[L2CAP_ECRED_MAX_CID]; } __packed pdu; int count; }; Also, when the flexible-array member needs to be accessed, we use `container_of()` to retrieve a pointer to the flexible structure. We also use the `DEFINE_RAW_FLEX()` helper for a couple of on-stack definitions of a flexible structure where the size of the flexible-array member is known at compile-time. So, with these changes, fix the following warnings: net/bluetooth/l2cap_core.c:1260:45: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] net/bluetooth/l2cap_core.c:3740:45: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] net/bluetooth/l2cap_core.c:4999:45: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] net/bluetooth/l2cap_core.c:7116:47: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: ISO: Handle PA sync when no BIGInfo reports are generatedIulia Tanasescu
In case of a Broadcast Source that has PA enabled but no active BIG, a Broadcast Sink needs to establish PA sync and parse BASE from PA reports. This commit moves the allocation of a PA sync hcon from the BIGInfo advertising report event to the PA sync established event. After the first complete PA report, the hcon is notified to the ISO layer. A child socket is allocated and enqueued in the parent's accept queue. BIGInfo reports also need to be processed, to extract the encryption field and inform userspace. After the first BIGInfo report is received, the PA sync hcon is notified again to the ISO layer. Since a socket will be found this time, the socket state will transition to BT_CONNECTED and the userspace will be woken up using sk_state_change. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: ISO: Make iso_get_sock_listen genericIulia Tanasescu
This makes iso_get_sock_listen more generic, to return matching socket in the state provided as argument. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14Bluetooth: Add proper definitions for scan interval and windowLuiz Augusto von Dentz
This adds proper definitions for scan interval and window and then make use of them instead their values. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-14nbd: Use NULL to represent a pointerBart Van Assche
This patch fixes the following sparse warnings: drivers/block/nbd.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nbd.h): ./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer drivers/block/nbd.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/nbd.h): ./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510202313.25209-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-14devm-helpers: Fix a misspelled cancellation in the commentsAndy Shevchenko
Fix a misspelled cancellation in the comments. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240503173843.2922111-1-andriy.shevchenko@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-05-14kprobes: remove dependency on CONFIG_MODULESMike Rapoport (IBM)
kprobes depended on CONFIG_MODULES because it has to allocate memory for code. Since code allocations are now implemented with execmem, kprobes can be enabled in non-modular kernels. Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the dependency of CONFIG_KPROBES on CONFIG_MODULES. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> [mcgrof: rebase in light of NEED_TASKS_RCU ] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14mm/execmem, arch: convert remaining overrides of module_alloc to execmemMike Rapoport (IBM)
Extend execmem parameters to accommodate more complex overrides of module_alloc() by architectures. This includes specification of a fallback range required by arm, arm64 and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for allocation of KASAN shadow required by s390 and x86 and support for late initialization of execmem required by arm64. The core implementation of execmem_alloc() takes care of suppressing warnings when the initial allocation fails but there is a fallback range defined. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Song Liu <song@kernel.org> Tested-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14mm/execmem, arch: convert simple overrides of module_alloc to execmemMike Rapoport (IBM)
Several architectures override module_alloc() only to define address range for code allocations different than VMALLOC address space. Provide a generic implementation in execmem that uses the parameters for address space ranges, required alignment and page protections provided by architectures. The architectures must fill execmem_info structure and implement execmem_arch_setup() that returns a pointer to that structure. This way the execmem initialization won't be called from every architecture, but rather from a central place, namely a core_initcall() in execmem. The execmem provides execmem_alloc() API that wraps __vmalloc_node_range() with the parameters defined by the architectures. If an architecture does not implement execmem_arch_setup(), execmem_alloc() will fall back to module_alloc(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14mm: introduce execmem_alloc() and execmem_free()Mike Rapoport (IBM)
module_alloc() is used everywhere as a mean to allocate memory for code. Beside being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF to modules and puts the burden of code allocation to the modules code. Several architectures override module_alloc() because of various constraints where the executable memory can be located and this causes additional obstacles for improvements of code allocation. Start splitting code allocation from modules by introducing execmem_alloc() and execmem_free() APIs. Initially, execmem_alloc() is a wrapper for module_alloc() and execmem_free() is a replacement of module_memfree() to allow updating all call sites to use the new APIs. Since architectures define different restrictions on placement, permissions, alignment and other parameters for memory that can be used by different subsystems that allocate executable memory, execmem_alloc() takes a type argument, that will be used to identify the calling subsystem and to allow architectures define parameters for ranges suitable for that subsystem. No functional changes. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-14ftrace: Remove unused global 'ftrace_direct_func_count'Dr. David Alan Gilbert
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API") stopped setting the 'ftrace_direct_func_count' variable, but left it around. Clean it up. Link: https://lore.kernel.org/linux-trace-kernel/20240506233305.215735-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-14ftrace: Remove unused list 'ftrace_direct_funcs'Dr. David Alan Gilbert
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API") stopped using 'ftrace_direct_funcs' (and the associated struct ftrace_direct_func). Remove them. Build tested only (on x86-64 with FTRACE and DYNAMIC_FTRACE enabled) Link: https://lore.kernel.org/linux-trace-kernel/20240504132303.67538-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-13Merge tag 'idxd-for-linus-may2024' of git bundle from ArjanLinus Torvalds
Pull DSA and IAA accelerator mis-alignment fix from Arjan van de Ven: "The DSA (memory copy/zero/etc) and IAA (compression) accelerators in the Sapphire Rapids and Emerald Rapids SOCs turn out to have a bug that has security implications. Both of these accelerators work by the application submitting a 64 byte command to the device; this command contains an opcode as well as the virtual address of the return value that the device will update on completion... and a set of opcode specific values. In a typical scenario a ring 3 application mmaps the device file and uses the ENQCMD or MOVDIR64 instructions (which are variations of a 64 byte atomic write) on this mmap'd memory region to directly submit commands to a device hardware. The return value as specified in the command, is supposed to be 32 (or 64) bytes aligned in memory, and generally the hardware checks and enforces this alignment. However in testing it has been found that there are conditions (controlled by the submitter) where this enforcement does not happen... which makes it possible for the return value to span a page boundary. And this is where it goes wrong - the accelerators will perform the virtual to physical address lookup on the first of the two pages, but end up continue writing to the next consecutive physical (host) page rather than the consecutive virtual page. In addition, the device will end up in a hung state on such unaligned write of the return value. This patch series has the proposed software side solution consisting of three parts: - Don't allow these two PCI devices to be assigned to VM guests (we cannot trust a VM guest to behave correctly and not cause this condition) - Don't allow ring 3 applications to set up the mmap unless they have CAP_SYS_RAWIO permissions. This makes it no longer possible for non-root applications to directly submit commands to the accelerator - Add a write() method to the device so that an application can submit its commands to the kernel driver, which performs the needed sanity checks before submitting it to the hardware. This switch from mmap to write is an incompatible interface change to non-root userspace, but we have not found a way to avoid this. All software we know of uses a small set of accessor libraries for these accelerators, for which libqpl and libdml (on github) are the most common. As part of the security release, updated versions of these libraries will be released that transparently fall back to write(). Intel has assigned CVE-2024-21823 to this hardware issue" * tag 'idxd-for-linus-may2024' of git bundle from Arjan: dmaengine: idxd: add a write() method for applications to submit work dmaengine: idxd: add a new security check to deal with a hardware erratum VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
2024-05-13Merge tag 'x86-cpu-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: - Rework the x86 CPU vendor/family/model code: introduce the 'VFM' value that is an 8+8+8 bit concatenation of the vendor/family/model value, and add macros that work on VFM values. This simplifies the addition of new Intel models & families, and simplifies existing enumeration & quirk code. - Add support for the AMD 0x80000026 leaf, to better parse topology information - Optimize the NUMA allocation layout of more per-CPU data structures - Improve the workaround for AMD erratum 1386 - Clear TME from /proc/cpuinfo as well, when disabled by the firmware - Improve x86 self-tests - Extend the mce_record tracepoint with the ::ppin and ::microcode fields - Implement recovery for MCE errors in TDX/SEAM non-root mode - Misc cleanups and fixes * tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Switch to new Intel CPU model defines x86/tsc_msr: Switch to new Intel CPU model defines x86/tsc: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/resctrl: Switch to new Intel CPU model defines x86/microcode/intel: Switch to new Intel CPU model defines x86/mce: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/cpu/intel_epb: Switch to new Intel CPU model defines x86/aperfmperf: Switch to new Intel CPU model defines x86/apic: Switch to new Intel CPU model defines perf/x86/msr: Switch to new Intel CPU model defines perf/x86/intel/uncore: Switch to new Intel CPU model defines perf/x86/intel/pt: Switch to new Intel CPU model defines perf/x86/lbr: Switch to new Intel CPU model defines perf/x86/intel/cstate: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h x86/cpu/vfm: Add new macros to work with (vendor/family/model) values ...
2024-05-13net: revert partially applied PHY topology seriesJakub Kicinski
The series is causing issues with PHY drivers built as modules. Since it was only partially applied and the merge window has opened let's revert and try again for v6.11. Revert 6916e461e793 ("net: phy: Introduce ethernet link topology representation") Revert 0ec5ed6c130e ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY") Revert e75e4e074c44 ("net: phy: add helpers to handle sfp phy connect/disconnect") Revert fdd353965b52 ("net: sfp: Add helper to return the SFP bus name") Revert 841942bc6212 ("net: ethtool: Allow passing a phy index for some commands") Link: https://lore.kernel.org/all/171242462917.4000.9759453824684907063.git-patchwork-notify@kernel.org/ Link: https://lore.kernel.org/all/20240507102822.2023826-1-maxime.chevallier@bootlin.com/ Link: https://lore.kernel.org/r/20240513154156.104281-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13net: stmmac: move the EST structure to struct stmmac_privXiaolei Wang
Move the EST structure to struct stmmac_priv, because the EST configs don't look like platform config, but EST is enabled in runtime with the settings retrieved for the TC TAPRIO feature also in runtime. So it's better to have the EST-data preserved in the driver private data instead of the platform data storage. Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Link: https://lore.kernel.org/r/20240513014346.1718740-3-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13net: stmmac: move the EST lock to struct stmmac_privXiaolei Wang
Reinitialize the whole EST structure would also reset the mutex lock which is embedded in the EST structure, and then trigger the following warning. To address this, move the lock to struct stmmac_priv. We also need to reacquire the mutex lock when doing this initialization. DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 505 at kernel/locking/mutex.c:587 __mutex_lock+0xd84/0x1068 Modules linked in: CPU: 3 PID: 505 Comm: tc Not tainted 6.9.0-rc6-00053-g0106679839f7-dirty #29 Hardware name: NXP i.MX8MPlus EVK board (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __mutex_lock+0xd84/0x1068 lr : __mutex_lock+0xd84/0x1068 sp : ffffffc0864e3570 x29: ffffffc0864e3570 x28: ffffffc0817bdc78 x27: 0000000000000003 x26: ffffff80c54f1808 x25: ffffff80c9164080 x24: ffffffc080d723ac x23: 0000000000000000 x22: 0000000000000002 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffc083bc3000 x18: ffffffffffffffff x17: ffffffc08117b080 x16: 0000000000000002 x15: ffffff80d2d40000 x14: 00000000000002da x13: ffffff80d2d404b8 x12: ffffffc082b5a5c8 x11: ffffffc082bca680 x10: ffffffc082bb2640 x9 : ffffffc082bb2698 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001 x5 : ffffff8178fe0d48 x4 : 0000000000000000 x3 : 0000000000000027 x2 : ffffff8178fe0d50 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: __mutex_lock+0xd84/0x1068 mutex_lock_nested+0x28/0x34 tc_setup_taprio+0x118/0x68c stmmac_setup_tc+0x50/0xf0 taprio_change+0x868/0xc9c Fixes: b2aae654a479 ("net: stmmac: add mutex lock to protect est parameters") Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Link: https://lore.kernel.org/r/20240513014346.1718740-2-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13mptcp: add net.mptcp.available_schedulersGregory Detal
The sysctl lists the available schedulers that can be set using net.mptcp.scheduler similarly to net.ipv4.tcp_available_congestion_control. Signed-off-by: Gregory Detal <gregory.detal@gmail.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20240514011335.176158-5-martineau@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13Merge tag 'x86-build-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: - Use -fpic to build the kexec 'purgatory' (the self-contained code that runs between two kernels) - Clean up vmlinux.lds.S generation - Simplify the X86_EXTENDED_PLATFORM section of the x86 Kconfig - Misc cleanups & fixes * tag 'x86-build-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Merge the two CONFIG_X86_EXTENDED_PLATFORM entries x86/purgatory: Switch to the position-independent small code model x86/boot: Replace __PHYSICAL_START with LOAD_PHYSICAL_ADDR x86/vmlinux.lds.S: Take __START_KERNEL out conditional definition x86/vmlinux.lds.S: Remove conditional definition of LOAD_OFFSET vmlinux.lds.h: Fix a typo in comment
2024-05-13Merge tag 'x86-boot-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - Move the kernel cmdline setup earlier in the boot process (again), to address a split_lock_detect= boot parameter bug - Ignore relocations in .notes sections - Simplify boot stack setup - Re-introduce a bootloader quirk wrt CR4 handling - Miscellaneous cleanups & fixes * tag 'x86-boot-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57 x86/boot: Move kernel cmdline setup earlier in the boot process (again) x86/build: Clean up arch/x86/tools/relocs.c a bit x86/boot: Ignore relocations in .notes sections in walk_relocs() too x86: Rename __{start,end}_init_task to __{start,end}_init_stack x86/boot: Simplify boot stack setup
2024-05-13tcp: rstreason: fully support in tcp_check_req()Jason Xing
We're going to send an RST due to invalid syn packet which is already checked whether 1) it is in sequence, 2) it is a retransmitted skb. As RFC 793 says, if the state of socket is not CLOSED/LISTEN/SYN-SENT, then we should send an RST when receiving bad syn packet: "fourth, check the SYN bit,...If the SYN is in the window it is an error, send a reset" Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-6-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13tcp: rstreason: handle timewait cases in the receive pathJason Xing
There are two possible cases where TCP layer can send an RST. Since they happen in the same place, I think using one independent reason is enough to identify this special situation. Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-5-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13tcp: rstreason: fully support in tcp_rcv_state_process()Jason Xing
Like the previous patch does in this series, finish the conversion map is enough to let rstreason mechanism work in this function. Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-4-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13tcp: rstreason: fully support in tcp_ack()Jason Xing
Based on the existing skb drop reason, updating the rstreason map can help us finish the rstreason job in this function. Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13tcp: rstreason: fully support in tcp_rcv_synsent_state_process()Jason Xing
In this function, only updating the map can finish the job for socket reset reason because the corresponding drop reasons are ready. Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13net: stmmac: introduce pcs_init/pcs_exit stmmac operationsRussell King (Oracle)
Introduce a mechanism whereby platforms can create their PCS instances prior to the network device being published to userspace, but after some of the core stmmac initialisation has been completed. This means that the data structures that platforms need will be available. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Co-developed-by: Romain Gantois <romain.gantois@bootlin.com> Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-4-6acf58b5440d@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13net: pass back whether socket was empty post acceptJens Axboe
This adds an 'is_empty' argument to struct proto_accept_arg, which can be used to pass back information on whether or not the given socket has more connections to accept post the one just accepted. To utilize this information, the caller should initialize the 'is_empty' field to, eg, -1 and then check for 0/1 after the accept. If the field has been set, the caller knows whether there are more pending connections or not. If the field remains -1 after the accept call, the protocol doesn't support passing back this information. This patch wires it up for ipv4/6 TCP. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-13net: have do_accept() take a struct proto_accept_arg argumentJens Axboe
In preparation for passing in more information via this API, change do_accept() to take a proto_accept_arg struct pointer rather than just the file flags separately. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-13net: change proto and proto_ops accept typeJens Axboe
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-13Merge tag 'sched-core-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Add cpufreq pressure feedback for the scheduler - Rework misfit load-balancing wrt affinity restrictions - Clean up and simplify the code around ::overutilized and ::overload access. - Simplify sched_balance_newidle() - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES handling that changed the output. - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch() - Reorganize, clean up and unify most of the higher level scheduler balancing function names around the sched_balance_*() prefix - Simplify the balancing flag code (sched_balance_running) - Miscellaneous cleanups & fixes * tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) sched/pelt: Remove shift of thermal clock sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure() thermal/cpufreq: Remove arch_update_thermal_pressure() sched/cpufreq: Take cpufreq feedback into account cpufreq: Add a cpufreq pressure feedback for the scheduler sched/fair: Fix update of rd->sg_overutilized sched/vtime: Do not include <asm/vtime.h> header s390/irq,nmi: Include <asm/vtime.h> header directly s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover sched/vtime: Get rid of generic vtime_task_switch() implementation sched/vtime: Remove confusing arch_vtime_task_switch() declaration sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized() sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded() sched/fair: Rename root_domain::overload to ::overloaded sched/fair: Use helper functions to access root_domain::overload sched/fair: Check root_domain::overload value before update sched/fair: Combine EAS check with root_domain::overutilized access sched/fair: Simplify the continue_balancing logic in sched_balance_newidle() ...
2024-05-13Merge tag 'perf-core-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Combine perf and BPF for fast evalution of HW breakpoint conditions - Add LBR capture support outside of hardware events - Trigger IO signals for watermark_wakeup - Add RAPL support for Intel Arrow Lake and Lunar Lake - Optimize frequency-throttling - Miscellaneous cleanups & fixes * tag 'perf-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf/bpf: Mark perf_event_set_bpf_handler() and perf_event_free_bpf_handler() as inline too selftests/perf_events: Test FASYNC with watermark wakeups perf/ring_buffer: Trigger IO signals for watermark_wakeup perf: Move perf_event_fasync() to perf_event.h perf/bpf: Change the !CONFIG_BPF_SYSCALL stubs to static inlines selftest/bpf: Test a perf BPF program that suppresses side effects perf/bpf: Allow a BPF program to suppress all sample side effects perf/bpf: Remove unneeded uses_default_overflow_handler() perf/bpf: Call BPF handler directly, not through overflow machinery perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members perf/bpf: Create bpf_overflow_handler() stub for !CONFIG_BPF_SYSCALL perf/bpf: Reorder bpf_overflow_handler() ahead of __perf_event_overflow() perf/x86/rapl: Add support for Intel Lunar Lake perf/x86/rapl: Add support for Intel Arrow Lake perf/core: Reduce PMU access to adjust sample freq perf/core: Optimize perf_adjust_freq_unthr_context() perf/x86/amd: Don't reject non-sampling events with configured LBR perf/x86/amd: Support capturing LBR from software events perf/x86/amd: Avoid taking branches before disabling LBR perf/x86/amd: Ensure amd_pmu_core_disable_all() is always inlined ...
2024-05-13Merge tag 'locking-core-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Over a dozen code generation micro-optimizations for the atomic and spinlock code - Add more __ro_after_init attributes - Robustify the lockdevent_*() macros * tag 'locking-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/pvqspinlock/x86: Use _Q_LOCKED_VAL in PV_UNLOCK_ASM macro locking/qspinlock/x86: Micro-optimize virt_spin_lock() locking/atomic/x86: Merge __arch{,_try}_cmpxchg64_emu_local() with __arch{,_try}_cmpxchg64_emu() locking/atomic/x86: Introduce arch_try_cmpxchg64_local() locking/pvqspinlock/x86: Remove redundant CMP after CMPXCHG in __raw_callee_save___pv_queued_spin_unlock() locking/pvqspinlock: Use try_cmpxchg() in qspinlock_paravirt.h locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending() locking/qspinlock: Use atomic_try_cmpxchg_relaxed() in xchg_tail() locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions locking/atomic/x86: Introduce arch_atomic64_read_nonatomic() to x86_32 locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg() to x86_32 locking/atomic/x86: Introduce arch_try_cmpxchg64() for !CONFIG_X86_CMPXCHG64 locking/atomic/x86: Modernize x86_32 arch_{,try_}_cmpxchg64{,_local}() locking/atomic/x86: Correct the definition of __arch_try_cmpxchg128() x86/tsc: Make __use_tsc __ro_after_init x86/kvm: Make kvm_async_pf_enabled __ro_after_init context_tracking: Make context_tracking_key __ro_after_init jump_label,module: Don't alloc static_key_mod for __ro_after_init keys locking/qspinlock: Always evaluate lockevent* non-event parameter once