Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"The most significant part of this is a rework of thermal governors,
including a redesign of the thermal governor interface and changes to
make some of them take trip point hysteresis into account properly, as
well as some related cleanups of the thermal governors and thermal
core.
The above is based on preliminary changes refactoring thermal data
structures and moving the definitions of some of them into the thermal
core which also ensure that trip point crossing notifications will be
sent to user space via netlink and recorded in the debug statistics in
temperature order.
In addition, netlink bind/unbind notifications are added to the
thermal core and the Intel HFI driver is modified to use them to avoid
sending netlink messages until there are subscribers.
Apart from that, multiple thermal drivers are updated which includes
new hardware support (MediaTek MT8188 and MT8186, Amlogic A1 thermal
sensor, Loongson-2K2000, Lmh QCM2290), fixes, cleanups and
documentation updates, and the recently added thermal debug code is
fixed and cleaned up.
Specifics:
- Redesign the thermal governor interface to allow the governors to
work in a more straightforward way (Rafael Wysocki)
- Make thermal governors take the current trip point thresholds into
account in their computations which allows trip hysteresis to be
observed more accurately (Rafael Wysocki)
- Make the thermal core manage passive polling for thermal zones and
remove passive polling management from thermal governors (Rafael
Wysocki)
- Refactor trip point representation and move the definition of
thermal governor and thermal zone device structures to the thermal
core (Rafael Wysocki)
- Sort trip point crossing notifications and debug recording of trip
point crossing events by temperature (Rafael Wysocki)
- Improve the handling of cooling device states and thermal
mitigation episodes in progress in the thermal debug code (Rafael
Wysocki)
- Avoid excessive updates of trip point statistics and clean up the
printing of thermal mitigation episode information (Rafael Wysocki)
- Clean up thermal governors and thermal core (Rafael Wysocki)
- Allow thermal drivers to register notifiers that will be invoked on
netlink events like BIND and UNBIND, so that they can adjust their
activity depending on whether or not there are any subscribers of
netlink messages coming from them, and make the Intel HFI driver
use this mechanism (Stanislaw Gruszka)
- Adjust the update delay and capabilities-per-event values in the
Intel HFI thermal driver to prevent it from missing events and
allow it to process more data in one go (Ricardo Neri)
- Add missing MODULE_DESCRIPTION() to multiple files in the
int340x_thermal and intel_soc_dts_iosf drivers (Srinivas
Pandruvada)
- Replace deprecated strncpy() with strscpy() in the int340x_thermal
driver (Justin Stitt)
- Add QCM2290 compatible DT bindings for Lmh and fix a NULL pointer
dereference in the lmh driver when the SCM is not present (Konrad
Dybcio)
- Use the strreplace() function instead of doing it manually in the
Armada driver (Rasmus Villemoes)
- Convert st,stih407-thermal to DT schema and fix up missing
properties (Raphael Gallais-Pou)
- Add suspend/resume by restoring the context of the tsens sensor
(Priyansh Jain)
- Support A1 SoC family Thermal Sensor controller and add the DT
bindings (Dmitry Rokosov)
- Improve the temperature approximation calculation and consolidate
the Tj constant into a shared area of the structure instead of
duplicating it on the Rcar Gen3 (Niklas Söderlund)
- Fix the Mediatek LVTS sensor coefficient for the MT8192 in order to
support it correctly (Hsin-Te Yuan)
- Fix a NULL pointer dereference in the tsens driver when the
function compute_intercept_slope() is called with a NULL parameter
(Aleksandr Mishin)
- Remove some unused fields in struct qpnp_tm_chip and k3_bandgap
(Christophe Jaillet)
- Fix up calibration efuse data decoding, consolidate the code by
checking boundaries and refactor some part of the LVTS Mediatek
driver. After setting the scene, add MT8186 and MT8188 along with
the DT bindings (Nicolas Pitre)
- Add Loongson-2K2000 support after some minor code adjustements and
providing the DT bindings definition (Binbin Zhou)"
* tag 'thermal-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
thermal: intel: hfi: Increase the number of CPU capabilities per netlink event
thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms
thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
thermal: intel: Add missing module description
thermal: core: Move passive polling management to the core
thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
thermal: trip: Add missing empty code line
thermal/debugfs: Avoid printing zero duration for mitigation events in progress
thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
thermal/debugfs: Create records for cdev states as they get used
thermal: core: Introduce thermal_governor_trip_crossed()
thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats
thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
thermal/debugfs: Clean up thermal_debug_update_temp()
thermal/debugfs: Avoid excessive updates of trip point statistics
thermal: core: Relocate critical and hot trip handling
thermal: core: Drop the .throttle() governor callback
thermal: gov_user_space: Use .trip_crossed() instead of .throttle()
thermal: gov_fair_share: Eliminate unnecessary integer divisions
...
|
|
This closely resembles helpers added for the global cgroup_rstat_lock in
commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and
tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock.
Based on production workloads, we observe the fast-path "update" function
cgroup_rstat_updated() is invoked around 3 million times per sec, while the
"flush" function cgroup_rstat_flush_locked(), walking each possible CPU,
can see periodic spikes of 700 invocations/sec.
For this reason, the tracepoints are split into normal and fastpath
versions for this per-CPU lock. Making it feasible for production to
continuously monitor the non-fastpath tracepoint to detect lock contention
issues. The reason for monitoring is that lock disables IRQs which can
disturb e.g. softirq processing on the local CPUs involved. When the
global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex),
this per CPU lock becomes the next bottleneck that can introduce latency
variations.
A practical bpftrace script for monitoring contention latency:
bpftrace -e '
tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
@start[tid]=nsecs; @cnt[probe]=count()}
tracepoint:cgroup:cgroup_rstat_cpu_locked {
if (args->contended) {
@wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}
@cnt[probe]=count()}
interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}'
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- fix race condition in try-catch completion
- change __kunit_test_suites_init() to exit early if there is
nothing to test
- change string-stream-test to use KUNIT_DEFINE_ACTION_WRAPPER
- move fault tests behind KUNIT_FAULT_TEST Kconfig option
- kthread test fixes and improvements
- iov_iter test fixes
* tag 'linux_kselftest-kunit-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: bail out early in __kunit_test_suites_init() if there are no suites to test
kunit: string-stream-test: use KUNIT_DEFINE_ACTION_WRAPPER
kunit: test: Move fault tests behind KUNIT_FAULT_TEST Kconfig option
kunit: unregister the device on error
kunit: Fix race condition in try-catch completion
kunit: Add tests for fault
kunit: Print last test location on fault
kunit: Fix KUNIT_SUCCESS() calls in iov_iter tests
kunit: Handle test faults
kunit: Fix timeout message
kunit: Fix kthread reference
kunit: Handle thread creation error
|
|
There is no user of iucv_root outside of the core IUCV code left.
Therefore remove the EXPORT_SYMBOL.
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240506194454.1160315-7-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Provide iucv_alloc_device() and iucv_release_device() helper functions,
which can be used to deduplicate more or less identical IUCV device
allocation and release code in four different drivers.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240506194454.1160315-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The most interesting parts are probably the mm changes from Ryan which
optimise the creation of the linear mapping at boot and (separately)
implement write-protect support for userfaultfd.
Outside of our usual directories, the Kbuild-related changes under
scripts/ have been acked by Masahiro whilst the drivers/acpi/ parts
have been acked by Rafael and the addition of cpumask_any_and_but()
has been acked by Yury.
ACPI:
- Support for the Firmware ACPI Control Structure (FACS) signature
feature which is used to reboot out of hibernation on some systems
Kbuild:
- Support for building Flat Image Tree (FIT) images, where the kernel
Image is compressed alongside a set of devicetree blobs
Memory management:
- Optimisation of our early page-table manipulation for creation of
the linear mapping
- Support for userfaultfd write protection, which brings along some
nice cleanups to our handling of invalid but present ptes
- Extend our use of range TLBI invalidation at EL1
Perf and PMUs:
- Ensure that the 'pmu->parent' pointer is correctly initialised by
PMU drivers
- Avoid allocating 'cpumask_t' types on the stack in some PMU drivers
- Fix parsing of the CPU PMU "version" field in assembly code, as it
doesn't follow the usual architectural rules
- Add best-effort unwinding support for USER_STACKTRACE
- Minor driver fixes and cleanups
Selftests:
- Minor cleanups to the arm64 selftests (missing NULL check, unused
variable)
Miscellaneous:
- Add a command-line alias for disabling 32-bit application support
- Add part number for Neoverse-V2 CPUs
- Minor fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
arm64/mm: Add uffd write-protect support
arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
arm64/mm: Remove PTE_PROT_NONE bit
arm64/mm: generalize PMD_PRESENT_INVALID for all levels
arm64: simplify arch_static_branch/_jump function
arm64: Add USER_STACKTRACE support
arm64: Add the arm64.no32bit_el0 command line option
drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
kselftest: arm64: Add a null pointer check
arm64: defer clearing DAIF.D
arm64: assembler: update stale comment for disable_step_tsk
arm64/sysreg: Update PIE permission encodings
kselftest/arm64: Remove unused parameters in abi test
perf/arm-spe: Assign parents for event_source device
perf/arm-smmuv3: Assign parents for event_source device
perf/arm-dsu: Assign parents for event_source device
perf/arm-dmc620: Assign parents for event_source device
...
|
|
Merge in late fixes to prepare for the 6.10 net-next PR.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add 2 byte padding to napi_gro_cb struct to ensure zeroed member is
aligned after flush_id member was removed in the original commit.
Fixes: 4b0ebbca3e16 ("net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment")
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Link: https://lore.kernel.org/r/fca08735-c245-49e5-af72-82900634f144@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the
lockup detector trigger.
To make this more user friendly, provide an extenstion to interrupt
statistics which allows to take snapshots and an interface to
retrieve the delta to the snapshot. Use this new mechanism in the
watchdog code to do a two stage lockup analysis by taking the
snapshot and printing the deltas for the topmost active interrupts
on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the
global counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that
they are not delivered before the system resumes the device drivers
when coming out of suspend.
- On CPU hot-unplug interrupts which are affine to the outgoing CPU
are migrated to a different CPU in the affinity mask. This can fail
when the CPUs have no vectors left. Instead of giving up try to
migrate it to any online CPU and thereby breaking the affinity
setting in order to prevent a stale device interrupt which targets
an offline CPU
- The usual small cleanups
Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place"
* tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
cpuidle: Avoid explicit cpumask allocation on stack
irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
cpumask: Introduce cpumask_first_and_and()
irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown
genirq: Reuse irq_is_nmi()
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251
arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251
ARM: dts: stm32: List exti parent interrupts on stm32mp131
ARM: dts: stm32: List exti parent interrupts on stm32mp151
arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32
irqchip/stm32-exti: Mark events reserved with RIF configuration check
irqchip/stm32-exti: Skip secure events
irqchip/stm32-exti: Convert driver to standard PM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
"Core code:
- Make timekeeping and VDSO time readouts resilent against math
overflow:
In guest context the kernel is prone to math overflow when the host
defers the timer interrupt due to overload, malfunction or malice.
This can be mitigated by checking the clocksource delta for the
maximum deferrement which is readily available. If that value is
exceeded then the code uses a slowpath function which can handle
the multiplication overflow.
This functionality is enabled unconditionally in the kernel, but
made conditional in the VDSO code. The latter is conditional
because it allows architectures to optimize the check so it is not
causing performance regressions.
On X86 this is achieved by reworking the existing check for
negative TSC deltas as a negative delta obviously exceeds the
maximum deferrement when it is evaluated as an unsigned value. That
avoids two conditionals in the hotpath and allows to hide both the
negative delta and the large delta handling in the same slow path.
- Add an initial minimal ktime_t abstraction for Rust
- The usual boring cleanups and enhancements
Drivers:
- Boring updates to device trees and trivial enhancements in various
drivers"
* tag 'timers-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
clocksource/drivers/arm_arch_timer: Mark hisi_161010101_oem_info const
clocksource/drivers/timer-ti-dm: Remove an unused field in struct dmtimer
clocksource/drivers/renesas-ostm: Avoid reprobe after successful early probe
clocksource/drivers/renesas-ostm: Allow OSTM driver to reprobe for RZ/V2H(P) SoC
dt-bindings: timer: renesas: ostm: Document Renesas RZ/V2H(P) SoC
rust: time: doc: Add missing C header links
clocksource: Make the int help prompt unit readable in ncurses
hrtimer: Rename __hrtimer_hres_active() to hrtimer_hres_active()
timerqueue: Remove never used function timerqueue_node_expires()
rust: time: Add Ktime
vdso: Fix powerpc build U64_MAX undeclared error
clockevents: Convert s[n]printf() to sysfs_emit()
clocksource: Convert s[n]printf() to sysfs_emit()
clocksource: Make watchdog and suspend-timing multiplication overflow safe
timekeeping: Let timekeeping_cycles_to_ns() handle both under and overflow
timekeeping: Make delta calculation overflow safe
timekeeping: Prepare timekeeping_cycles_to_ns() for overflow safety
timekeeping: Fold in timekeeping_delta_to_ns()
timekeeping: Consolidate timekeeping helpers
timekeeping: Refactor timekeeping helpers
...
|
|
If hdev->le_num_of_adv_sets is set to 1 it means that only handle 0x00
can be used, but since the MGMT interface instances start from 1
(instance 0 means all instances in case of MGMT_OP_REMOVE_ADVERTISING)
the code needs to map the instance to handle otherwise users will not be
able to advertise as instance 1 would attempt to use handle 0x01.
Fixes: 1d0fac2c38ed ("Bluetooth: Use controller sets when available")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Since BT_HS has been remove HCI_AMP controllers no longer has any use so
remove it along with the capability of creating AMP controllers.
Since we no longer need to differentiate between AMP and Primary
controllers, as only HCI_PRIMARY is left, this also remove
hdev->dev_type altogether.
Fixes: e7b02296fb40 ("Bluetooth: Remove BT_HS")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
l2cap_le_flowctl_init() can cause both div-by-zero and an integer
overflow since hdev->le_mtu may not fall in the valid range.
Move MTU from hci_dev to hci_conn to validate MTU and stop the connection
process earlier if MTU is invalid.
Also, add a missing validation in read_buffer_size() and make it return
an error value if the validation fails.
Now hci_conn_add() returns ERR_PTR() as it can fail due to the both a
kzalloc failure and invalid MTU value.
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 67 Comm: kworker/u5:0 Tainted: G W 6.9.0-rc5+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci0 hci_rx_work
RIP: 0010:l2cap_le_flowctl_init+0x19e/0x3f0 net/bluetooth/l2cap_core.c:547
Code: e8 17 17 0c 00 66 41 89 9f 84 00 00 00 bf 01 00 00 00 41 b8 02 00 00 00 4c
89 fe 4c 89 e2 89 d9 e8 27 17 0c 00 44 89 f0 31 d2 <66> f7 f3 89 c3 ff c3 4d 8d
b7 88 00 00 00 4c 89 f0 48 c1 e8 03 42
RSP: 0018:ffff88810bc0f858 EFLAGS: 00010246
RAX: 00000000000002a0 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff88810bc0f7c0 RDI: ffffc90002dcb66f
RBP: ffff88810bc0f880 R08: aa69db2dda70ff01 R09: 0000ffaaaaaaaaaa
R10: 0084000000ffaaaa R11: 0000000000000000 R12: ffff88810d65a084
R13: dffffc0000000000 R14: 00000000000002a0 R15: ffff88810d65a000
FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000103268003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
l2cap_le_connect_req net/bluetooth/l2cap_core.c:4902 [inline]
l2cap_le_sig_cmd net/bluetooth/l2cap_core.c:5420 [inline]
l2cap_le_sig_channel net/bluetooth/l2cap_core.c:5486 [inline]
l2cap_recv_frame+0xe59d/0x11710 net/bluetooth/l2cap_core.c:6809
l2cap_recv_acldata+0x544/0x10a0 net/bluetooth/l2cap_core.c:7506
hci_acldata_packet net/bluetooth/hci_core.c:3939 [inline]
hci_rx_work+0x5e5/0xb20 net/bluetooth/hci_core.c:4176
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0x90f/0x1530 kernel/workqueue.c:3335
worker_thread+0x926/0xe70 kernel/workqueue.c:3416
kthread+0x2e3/0x380 kernel/kthread.c:388
ret_from_fork+0x5c/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
Fixes: 6ed58ec520ad ("Bluetooth: Use LE buffers for LE traffic")
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Prepare for the coming implementation by GCC and Clang of the
__counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time
via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE
(for strcpy/memcpy-family functions).
Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are
getting ready to enable it globally.
So, use the `DEFINE_FLEX()` helper for an on-stack definition of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
With these changes, fix the following warning:
net/bluetooth/hci_conn.c:669:41: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
On our DUT, we can see that the host issues create connection cancel
command after 4-sec if there is no connection complete event for
LE create connection cmd.
As per core spec v5.3 section 7.8.5, advertisement interval range is-
Advertising_Interval_Min
Default : 0x0800(1.28s)
Time Range: 20ms to 10.24s
Advertising_Interval_Max
Default : 0x0800(1.28s)
Time Range: 20ms to 10.24s
If the remote device is using adv interval of > 4 sec, it is
difficult to make a connection with the current timeout value.
Also, with the default interval of 1.28 sec, we will get only
3 chances to capture the adv packets with the 4 sec window.
Hence we want to increase this timeout to 20sec.
Signed-off-by: Mahesh Talewad <mahesh.talewad@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Previously LE flow credits were returned to the
sender even if the socket's receive buffer was
full. This meant that no back-pressure
was applied to the sender, thus it continued to
send data, resulting in data loss without any
error being reported. Furthermore, the amount
of credits was essentially fixed to a small
amount, leading to reduced performance.
This is fixed by computing the number of returned
LE flow credits based on the estimated available
space in the receive buffer of an L2CAP socket.
Consequently, if the receive buffer is full, no
credits are returned until the buffer is read and
thus cleared by user-space.
Since the computation of available receive buffer
space can only be performed approximately (due to
sk_buff overhead) and the receive buffer size may
be changed by user-space after flow credits have
been sent, superfluous received data is temporary
stored within l2cap_pinfo. This is necessary
because Bluetooth LE provides no retransmission
mechanism once the data has been acked by the
physical layer.
If receive buffer space estimation is not possible
at the moment, we fall back to providing credits
for one full packet as before. This is currently
the case during connection setup, when MPS is not
yet available.
Fixes: b1c325c23d75 ("Bluetooth: Implement returning of LE L2CAP credits")
Signed-off-by: Sebastian Urban <surban@surban.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Prepare for the coming implementation by GCC and Clang of the
__counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time
via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE
(for strcpy/memcpy-family functions).
Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are
getting ready to enable it globally.
So, use the `DEFINE_FLEX()` helper for an on-stack definition of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
With these changes, fix the following warning:
net/bluetooth/hci_conn.c:2116:50: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Prepare for the coming implementation by GCC and Clang of the
__counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time
via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE
(for strcpy/memcpy-family functions).
Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are
getting ready to enable it globally.
So, use the `DEFINE_FLEX()` helper for multiple on-stack definitions
of a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
Notice that, due to the use of `__counted_by()` in `struct
hci_cp_le_create_cis`, the for loop in function `hci_cs_le_create_cis()`
had to be modified. Once the index `i`, through which `cp->cis[i]` is
accessed, falls in the interval [0, cp->num_cis), `cp->num_cis` cannot
be decremented all the way down to zero while accessing `cp->cis[]`:
net/bluetooth/hci_event.c:4310:
4310 for (i = 0; cp->num_cis; cp->num_cis--, i++) {
...
4314 handle = __le16_to_cpu(cp->cis[i].cis_handle);
otherwise, only half (one iteration before `cp->num_cis == i`) or half
plus one (one iteration before `cp->num_cis < i`) of the items in the
array will be accessed before running into an out-of-bounds issue. So,
in order to avoid this, set `cp->num_cis` to zero just after the for
loop.
Also, make use of `aux_num_cis` variable to update `cmd->num_cis` after
a `list_for_each_entry_rcu()` loop.
With these changes, fix the following warnings:
net/bluetooth/hci_sync.c:1239:56: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/hci_sync.c:1415:51: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/hci_sync.c:1731:51: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/hci_sync.c:6497:45: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.
There are currently a couple of objects (`req` and `rsp`), in a couple
of structures, that contain flexible structures (`struct l2cap_ecred_conn_req`
and `struct l2cap_ecred_conn_rsp`), for example:
struct l2cap_ecred_rsp_data {
struct {
struct l2cap_ecred_conn_rsp rsp;
__le16 scid[L2CAP_ECRED_MAX_CID];
} __packed pdu;
int count;
};
in the struct above, `struct l2cap_ecred_conn_rsp` is a flexible
structure:
struct l2cap_ecred_conn_rsp {
__le16 mtu;
__le16 mps;
__le16 credits;
__le16 result;
__le16 dcid[];
};
So, in order to avoid ending up with a flexible-array member in the
middle of another structure, we use the `struct_group_tagged()` (and
`__struct_group()` when the flexible structure is `__packed`) helper
to separate the flexible array from the rest of the members in the
flexible structure:
struct l2cap_ecred_conn_rsp {
struct_group_tagged(l2cap_ecred_conn_rsp_hdr, hdr,
... the rest of members
);
__le16 dcid[];
};
With the change described above, we now declare objects of the type of
the tagged struct, in this example `struct l2cap_ecred_conn_rsp_hdr`,
without embedding flexible arrays in the middle of other structures:
struct l2cap_ecred_rsp_data {
struct {
struct l2cap_ecred_conn_rsp_hdr rsp;
__le16 scid[L2CAP_ECRED_MAX_CID];
} __packed pdu;
int count;
};
Also, when the flexible-array member needs to be accessed, we use
`container_of()` to retrieve a pointer to the flexible structure.
We also use the `DEFINE_RAW_FLEX()` helper for a couple of on-stack
definitions of a flexible structure where the size of the flexible-array
member is known at compile-time.
So, with these changes, fix the following warnings:
net/bluetooth/l2cap_core.c:1260:45: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/l2cap_core.c:3740:45: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/l2cap_core.c:4999:45: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/l2cap_core.c:7116:47: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
In case of a Broadcast Source that has PA enabled but no active BIG,
a Broadcast Sink needs to establish PA sync and parse BASE from PA
reports.
This commit moves the allocation of a PA sync hcon from the BIGInfo
advertising report event to the PA sync established event. After the
first complete PA report, the hcon is notified to the ISO layer. A
child socket is allocated and enqueued in the parent's accept queue.
BIGInfo reports also need to be processed, to extract the encryption
field and inform userspace. After the first BIGInfo report is received,
the PA sync hcon is notified again to the ISO layer. Since a socket will
be found this time, the socket state will transition to BT_CONNECTED and
the userspace will be woken up using sk_state_change.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This makes iso_get_sock_listen more generic, to return matching socket
in the state provided as argument.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This adds proper definitions for scan interval and window and then make
use of them instead their values.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This patch fixes the following sparse warnings:
drivers/block/nbd.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nbd.h):
./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer
drivers/block/nbd.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/nbd.h):
./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer
Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240510202313.25209-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix a misspelled cancellation in the comments.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240503173843.2922111-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.
Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.
Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
[mcgrof: rebase in light of NEED_TASKS_RCU ]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.
This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
late initialization of execmem required by arm64.
The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Tested-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.
Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.
The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.
The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures. If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
module_alloc() is used everywhere as a mean to allocate memory for code.
Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.
Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.
Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.
Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.
Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.
No functional changes.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API")
stopped setting the 'ftrace_direct_func_count' variable, but left
it around. Clean it up.
Link: https://lore.kernel.org/linux-trace-kernel/20240506233305.215735-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API")
stopped using 'ftrace_direct_funcs' (and the associated
struct ftrace_direct_func). Remove them.
Build tested only (on x86-64 with FTRACE and DYNAMIC_FTRACE
enabled)
Link: https://lore.kernel.org/linux-trace-kernel/20240504132303.67538-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pull DSA and IAA accelerator mis-alignment fix from Arjan van de Ven:
"The DSA (memory copy/zero/etc) and IAA (compression) accelerators in
the Sapphire Rapids and Emerald Rapids SOCs turn out to have a bug
that has security implications.
Both of these accelerators work by the application submitting a 64
byte command to the device; this command contains an opcode as well as
the virtual address of the return value that the device will update on
completion... and a set of opcode specific values.
In a typical scenario a ring 3 application mmaps the device file and
uses the ENQCMD or MOVDIR64 instructions (which are variations of a 64
byte atomic write) on this mmap'd memory region to directly submit
commands to a device hardware.
The return value as specified in the command, is supposed to be 32 (or
64) bytes aligned in memory, and generally the hardware checks and
enforces this alignment.
However in testing it has been found that there are conditions
(controlled by the submitter) where this enforcement does not
happen... which makes it possible for the return value to span a page
boundary. And this is where it goes wrong - the accelerators will
perform the virtual to physical address lookup on the first of the two
pages, but end up continue writing to the next consecutive physical
(host) page rather than the consecutive virtual page. In addition, the
device will end up in a hung state on such unaligned write of the
return value.
This patch series has the proposed software side solution consisting
of three parts:
- Don't allow these two PCI devices to be assigned to VM guests (we
cannot trust a VM guest to behave correctly and not cause this
condition)
- Don't allow ring 3 applications to set up the mmap unless they have
CAP_SYS_RAWIO permissions. This makes it no longer possible for
non-root applications to directly submit commands to the
accelerator
- Add a write() method to the device so that an application can
submit its commands to the kernel driver, which performs the needed
sanity checks before submitting it to the hardware.
This switch from mmap to write is an incompatible interface change to
non-root userspace, but we have not found a way to avoid this. All
software we know of uses a small set of accessor libraries for these
accelerators, for which libqpl and libdml (on github) are the most
common. As part of the security release, updated versions of these
libraries will be released that transparently fall back to write().
Intel has assigned CVE-2024-21823 to this hardware issue"
* tag 'idxd-for-linus-may2024' of git bundle from Arjan:
dmaengine: idxd: add a write() method for applications to submit work
dmaengine: idxd: add a new security check to deal with a hardware erratum
VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
- Rework the x86 CPU vendor/family/model code: introduce the 'VFM'
value that is an 8+8+8 bit concatenation of the vendor/family/model
value, and add macros that work on VFM values. This simplifies the
addition of new Intel models & families, and simplifies existing
enumeration & quirk code.
- Add support for the AMD 0x80000026 leaf, to better parse topology
information
- Optimize the NUMA allocation layout of more per-CPU data structures
- Improve the workaround for AMD erratum 1386
- Clear TME from /proc/cpuinfo as well, when disabled by the firmware
- Improve x86 self-tests
- Extend the mce_record tracepoint with the ::ppin and ::microcode fields
- Implement recovery for MCE errors in TDX/SEAM non-root mode
- Misc cleanups and fixes
* tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Switch to new Intel CPU model defines
x86/tsc_msr: Switch to new Intel CPU model defines
x86/tsc: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
x86/resctrl: Switch to new Intel CPU model defines
x86/microcode/intel: Switch to new Intel CPU model defines
x86/mce: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
x86/cpu/intel_epb: Switch to new Intel CPU model defines
x86/aperfmperf: Switch to new Intel CPU model defines
x86/apic: Switch to new Intel CPU model defines
perf/x86/msr: Switch to new Intel CPU model defines
perf/x86/intel/uncore: Switch to new Intel CPU model defines
perf/x86/intel/pt: Switch to new Intel CPU model defines
perf/x86/lbr: Switch to new Intel CPU model defines
perf/x86/intel/cstate: Switch to new Intel CPU model defines
x86/bugs: Switch to new Intel CPU model defines
x86/bugs: Switch to new Intel CPU model defines
x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h
x86/cpu/vfm: Add new macros to work with (vendor/family/model) values
...
|
|
The series is causing issues with PHY drivers built as modules.
Since it was only partially applied and the merge window has
opened let's revert and try again for v6.11.
Revert 6916e461e793 ("net: phy: Introduce ethernet link topology representation")
Revert 0ec5ed6c130e ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY")
Revert e75e4e074c44 ("net: phy: add helpers to handle sfp phy connect/disconnect")
Revert fdd353965b52 ("net: sfp: Add helper to return the SFP bus name")
Revert 841942bc6212 ("net: ethtool: Allow passing a phy index for some commands")
Link: https://lore.kernel.org/all/171242462917.4000.9759453824684907063.git-patchwork-notify@kernel.org/
Link: https://lore.kernel.org/all/20240507102822.2023826-1-maxime.chevallier@bootlin.com/
Link: https://lore.kernel.org/r/20240513154156.104281-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the EST structure to struct stmmac_priv, because the
EST configs don't look like platform config, but EST is
enabled in runtime with the settings retrieved for the TC
TAPRIO feature also in runtime. So it's better to have the
EST-data preserved in the driver private data instead of
the platform data storage.
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240513014346.1718740-3-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reinitialize the whole EST structure would also reset the mutex
lock which is embedded in the EST structure, and then trigger
the following warning. To address this, move the lock to struct
stmmac_priv. We also need to reacquire the mutex lock when doing
this initialization.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 505 at kernel/locking/mutex.c:587 __mutex_lock+0xd84/0x1068
Modules linked in:
CPU: 3 PID: 505 Comm: tc Not tainted 6.9.0-rc6-00053-g0106679839f7-dirty #29
Hardware name: NXP i.MX8MPlus EVK board (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mutex_lock+0xd84/0x1068
lr : __mutex_lock+0xd84/0x1068
sp : ffffffc0864e3570
x29: ffffffc0864e3570 x28: ffffffc0817bdc78 x27: 0000000000000003
x26: ffffff80c54f1808 x25: ffffff80c9164080 x24: ffffffc080d723ac
x23: 0000000000000000 x22: 0000000000000002 x21: 0000000000000000
x20: 0000000000000000 x19: ffffffc083bc3000 x18: ffffffffffffffff
x17: ffffffc08117b080 x16: 0000000000000002 x15: ffffff80d2d40000
x14: 00000000000002da x13: ffffff80d2d404b8 x12: ffffffc082b5a5c8
x11: ffffffc082bca680 x10: ffffffc082bb2640 x9 : ffffffc082bb2698
x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
x5 : ffffff8178fe0d48 x4 : 0000000000000000 x3 : 0000000000000027
x2 : ffffff8178fe0d50 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
__mutex_lock+0xd84/0x1068
mutex_lock_nested+0x28/0x34
tc_setup_taprio+0x118/0x68c
stmmac_setup_tc+0x50/0xf0
taprio_change+0x868/0xc9c
Fixes: b2aae654a479 ("net: stmmac: add mutex lock to protect est parameters")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240513014346.1718740-2-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sysctl lists the available schedulers that can be set using
net.mptcp.scheduler similarly to net.ipv4.tcp_available_congestion_control.
Signed-off-by: Gregory Detal <gregory.detal@gmail.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-5-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
- Use -fpic to build the kexec 'purgatory' (the self-contained
code that runs between two kernels)
- Clean up vmlinux.lds.S generation
- Simplify the X86_EXTENDED_PLATFORM section of the x86 Kconfig
- Misc cleanups & fixes
* tag 'x86-build-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Merge the two CONFIG_X86_EXTENDED_PLATFORM entries
x86/purgatory: Switch to the position-independent small code model
x86/boot: Replace __PHYSICAL_START with LOAD_PHYSICAL_ADDR
x86/vmlinux.lds.S: Take __START_KERNEL out conditional definition
x86/vmlinux.lds.S: Remove conditional definition of LOAD_OFFSET
vmlinux.lds.h: Fix a typo in comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- Move the kernel cmdline setup earlier in the boot process (again),
to address a split_lock_detect= boot parameter bug
- Ignore relocations in .notes sections
- Simplify boot stack setup
- Re-introduce a bootloader quirk wrt CR4 handling
- Miscellaneous cleanups & fixes
* tag 'x86-boot-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57
x86/boot: Move kernel cmdline setup earlier in the boot process (again)
x86/build: Clean up arch/x86/tools/relocs.c a bit
x86/boot: Ignore relocations in .notes sections in walk_relocs() too
x86: Rename __{start,end}_init_task to __{start,end}_init_stack
x86/boot: Simplify boot stack setup
|
|
We're going to send an RST due to invalid syn packet which is already
checked whether 1) it is in sequence, 2) it is a retransmitted skb.
As RFC 793 says, if the state of socket is not CLOSED/LISTEN/SYN-SENT,
then we should send an RST when receiving bad syn packet:
"fourth, check the SYN bit,...If the SYN is in the window it is an
error, send a reset"
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are two possible cases where TCP layer can send an RST. Since they
happen in the same place, I think using one independent reason is enough
to identify this special situation.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Like the previous patch does in this series, finish the conversion map is
enough to let rstreason mechanism work in this function.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Based on the existing skb drop reason, updating the rstreason map can
help us finish the rstreason job in this function.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In this function, only updating the map can finish the job for socket
reset reason because the corresponding drop reasons are ready.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce a mechanism whereby platforms can create their PCS instances
prior to the network device being published to userspace, but after
some of the core stmmac initialisation has been completed. This means
that the data structures that platforms need will be available.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Co-developed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-4-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds an 'is_empty' argument to struct proto_accept_arg, which can
be used to pass back information on whether or not the given socket has
more connections to accept post the one just accepted.
To utilize this information, the caller should initialize the 'is_empty'
field to, eg, -1 and then check for 0/1 after the accept. If the field
has been set, the caller knows whether there are more pending connections
or not. If the field remains -1 after the accept call, the protocol
doesn't support passing back this information.
This patch wires it up for ipv4/6 TCP.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for passing in more information via this API, change
do_accept() to take a proto_accept_arg struct pointer rather than just
the file flags separately.
No functional changes in this patch.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Rather than pass in flags, error pointer, and whether this is a kernel
invocation or not, add a struct proto_accept_arg struct as the argument.
This then holds all of these arguments, and prepares accept for being
able to pass back more information.
No functional changes in this patch.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Add cpufreq pressure feedback for the scheduler
- Rework misfit load-balancing wrt affinity restrictions
- Clean up and simplify the code around ::overutilized and
::overload access.
- Simplify sched_balance_newidle()
- Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
handling that changed the output.
- Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()
- Reorganize, clean up and unify most of the higher level
scheduler balancing function names around the sched_balance_*()
prefix
- Simplify the balancing flag code (sched_balance_running)
- Miscellaneous cleanups & fixes
* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/pelt: Remove shift of thermal clock
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
thermal/cpufreq: Remove arch_update_thermal_pressure()
sched/cpufreq: Take cpufreq feedback into account
cpufreq: Add a cpufreq pressure feedback for the scheduler
sched/fair: Fix update of rd->sg_overutilized
sched/vtime: Do not include <asm/vtime.h> header
s390/irq,nmi: Include <asm/vtime.h> header directly
s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
sched/vtime: Get rid of generic vtime_task_switch() implementation
sched/vtime: Remove confusing arch_vtime_task_switch() declaration
sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
sched/fair: Rename root_domain::overload to ::overloaded
sched/fair: Use helper functions to access root_domain::overload
sched/fair: Check root_domain::overload value before update
sched/fair: Combine EAS check with root_domain::overutilized access
sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Combine perf and BPF for fast evalution of HW breakpoint
conditions
- Add LBR capture support outside of hardware events
- Trigger IO signals for watermark_wakeup
- Add RAPL support for Intel Arrow Lake and Lunar Lake
- Optimize frequency-throttling
- Miscellaneous cleanups & fixes
* tag 'perf-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf/bpf: Mark perf_event_set_bpf_handler() and perf_event_free_bpf_handler() as inline too
selftests/perf_events: Test FASYNC with watermark wakeups
perf/ring_buffer: Trigger IO signals for watermark_wakeup
perf: Move perf_event_fasync() to perf_event.h
perf/bpf: Change the !CONFIG_BPF_SYSCALL stubs to static inlines
selftest/bpf: Test a perf BPF program that suppresses side effects
perf/bpf: Allow a BPF program to suppress all sample side effects
perf/bpf: Remove unneeded uses_default_overflow_handler()
perf/bpf: Call BPF handler directly, not through overflow machinery
perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members
perf/bpf: Create bpf_overflow_handler() stub for !CONFIG_BPF_SYSCALL
perf/bpf: Reorder bpf_overflow_handler() ahead of __perf_event_overflow()
perf/x86/rapl: Add support for Intel Lunar Lake
perf/x86/rapl: Add support for Intel Arrow Lake
perf/core: Reduce PMU access to adjust sample freq
perf/core: Optimize perf_adjust_freq_unthr_context()
perf/x86/amd: Don't reject non-sampling events with configured LBR
perf/x86/amd: Support capturing LBR from software events
perf/x86/amd: Avoid taking branches before disabling LBR
perf/x86/amd: Ensure amd_pmu_core_disable_all() is always inlined
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Over a dozen code generation micro-optimizations for the atomic
and spinlock code
- Add more __ro_after_init attributes
- Robustify the lockdevent_*() macros
* tag 'locking-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock/x86: Use _Q_LOCKED_VAL in PV_UNLOCK_ASM macro
locking/qspinlock/x86: Micro-optimize virt_spin_lock()
locking/atomic/x86: Merge __arch{,_try}_cmpxchg64_emu_local() with __arch{,_try}_cmpxchg64_emu()
locking/atomic/x86: Introduce arch_try_cmpxchg64_local()
locking/pvqspinlock/x86: Remove redundant CMP after CMPXCHG in __raw_callee_save___pv_queued_spin_unlock()
locking/pvqspinlock: Use try_cmpxchg() in qspinlock_paravirt.h
locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending()
locking/qspinlock: Use atomic_try_cmpxchg_relaxed() in xchg_tail()
locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions
locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
locking/atomic/x86: Introduce arch_atomic64_read_nonatomic() to x86_32
locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg() to x86_32
locking/atomic/x86: Introduce arch_try_cmpxchg64() for !CONFIG_X86_CMPXCHG64
locking/atomic/x86: Modernize x86_32 arch_{,try_}_cmpxchg64{,_local}()
locking/atomic/x86: Correct the definition of __arch_try_cmpxchg128()
x86/tsc: Make __use_tsc __ro_after_init
x86/kvm: Make kvm_async_pf_enabled __ro_after_init
context_tracking: Make context_tracking_key __ro_after_init
jump_label,module: Don't alloc static_key_mod for __ro_after_init keys
locking/qspinlock: Always evaluate lockevent* non-event parameter once
|