summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-06Bluetooth: hci_sync: Attempt to dequeue connection attemptLuiz Augusto von Dentz
If connection is still queued/pending in the cmd_sync queue it means no command has been generated and it should be safe to just dequeue the callback when it is being aborted. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queueLuiz Augusto von Dentz
This adds functions to queue, dequeue and lookup into the cmd_sync list. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_conn: Fix UAF Write in __hci_acl_create_connection_syncLuiz Augusto von Dentz
This fixes the UAF on __hci_acl_create_connection_sync caused by connection abortion, it uses the same logic as to LE_LINK which uses hci_cmd_sync_cancel to prevent the callback to run if the connection is abort prematurely. Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com Fixes: 45340097ce6e ("Bluetooth: hci_conn: Only do ACL connections sequentially") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_conn: Always use sk_timeo as conn_timeoutLuiz Augusto von Dentz
This aligns the use socket sk_timeo as conn_timeout when initiating a connection and then use it when scheduling the resulting HCI command, that way the command is actually aborted synchronously thus not blocking commands generated by hci_abort_conn_sync to inform the controller the connection is to be aborted. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_event: Remove code to removed CONFIG_BT_HSLukas Bulwahn
Commit cec9f3c5561d ("Bluetooth: Remove BT_HS") removes config BT_HS, but misses two "ifdef BT_HS" blocks in hci_event.c. Remove this dead code from this removed config option. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Remove pending ACL connection attemptsJonas Dreßler
With the last commit we moved to using the hci_sync queue for "Create Connection" requests, removing the need for retrying the paging after finished/failed "Create Connection" requests and after the end of inquiries. hci_conn_check_pending() was used to trigger this retry, we can remove it now. Note that we can also remove the special handling for COMMAND_DISALLOWED errors in the completion handler of "Create Connection", because "Create Connection" requests are now always serialized. This is somewhat reverting commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect requests"). With this, the BT_CONNECT2 state of ACL hci_conn objects should now be back to meaning only one thing: That we received a "Connection Request" from another device (see hci_conn_request_evt), but the response to that is going to be deferred. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_conn: Only do ACL connections sequentiallyJonas Dreßler
Pretty much all bluetooth chipsets only support paging a single device at a time, and if they don't reject a secondary "Create Connection" request while another is still ongoing, they'll most likely serialize those requests in the firware. With commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect requests") we started adding some serialization of our own in case the adapter returns "Command Disallowed" HCI error. This commit was using the BT_CONNECT2 state for the serialization, this state is also used for a few more things (most notably to indicate we're waiting for an inquiry to cancel) and therefore a bit unreliable. Also not all BT firwares would respond with "Command Disallowed" on too many connection requests, some will also respond with "Hardware Failure" (BCM4378), and others will error out later and send a "Connect Complete" event with error "Rejected Limited Resources" (Marvell 88W8897). We can clean things up a bit and also make the serialization more reliable by using our hci_sync machinery to always do "Create Connection" requests in a sequential manner. This is very similar to what we're already doing for establishing LE connections, and it works well there. Note that this causes a test failure in mgmt-tester (test "Pair Device - Power off 1") because the hci_abort_conn_sync() changes the error we return on timeout of the "Create Connection". We'll fix this on the mgmt-tester side by adjusting the expected error for the test. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_event: Fix not indicating new connection for BIG SyncLuiz Augusto von Dentz
BIG Sync (aka. Broadcast sink) requires to inform that the device is connected when a data path is active otherwise userspace could attempt to free resources allocated to the device object while scanning. Fixes: 1d11d70d1f6b ("Bluetooth: ISO: Pass BIG encryption info through QoS") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Remove BT_HSLuiz Augusto von Dentz
High Speed, Alternate MAC and PHY (AMP) extension, has been removed from Bluetooth Core specification on 5.3: https://www.bluetooth.com/blog/new-core-specification-v5-3-feature-enhancements/ Fixes: 244bc377591c ("Bluetooth: Add BT_HS config option") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: btintel: Fix null ptr deref in btintel_read_versionEdward Adam Davis
If hci_cmd_sync_complete() is triggered and skb is NULL, then hdev->req_skb is NULL, which will cause this issue. Reported-and-tested-by: syzbot+830d9e3fa61968246abd@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_max() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: btusb: Add new VID/PID 13d3/3602 for MT7925Ulrik Strid
Add VID 13d3 & PID 3602 for MediaTek MT7925 USB Bluetooth chip. The information in /sys/kernel/debug/usb/devices about the Bluetooth device is listed as the below. T: Bus=07 Lev=01 Prnt=01 Port=10 Cnt=02 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=13d3 ProdID=3602 Rev= 1.00 S: Manufacturer=MediaTek Inc. S: Product=Wireless_Device S: SerialNumber=000000000 C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=100mA A: FirstIf#= 0 IfCount= 3 Cls=e0(wlcon) Sub=01 Prot=01 I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=125us E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms I: If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 63 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 63 Ivl=1ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none) E: Ad=8a(I) Atr=03(Int.) MxPS= 64 Ivl=125us E: Ad=0a(O) Atr=03(Int.) MxPS= 64 Ivl=125us I: If#= 2 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none) E: Ad=8a(I) Atr=03(Int.) MxPS= 512 Ivl=125us E: Ad=0a(O) Atr=03(Int.) MxPS= 512 Ivl=125us Signed-off-by: Ulrik Strid <ulrik@strid.tech> Signed-off-by: Deren Wu <deren.wu@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_core: Cancel request on command timeoutLuiz Augusto von Dentz
If command has timed out call __hci_cmd_sync_cancel to notify the hci_req since it will inevitably cause a timeout. This also rework the code around __hci_cmd_sync_cancel since it was wrongly assuming it needs to cancel timer as well, but sometimes the timers have not been started or in fact they already had timed out in which case they don't need to be cancel yet again. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: hci_event: Use HCI error defines instead of magic valuesJonas Dreßler
We have error defines already, so let's use them. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Remove superfluous call to hci_conn_check_pending()Jonas Dreßler
The "pending connections" feature was originally introduced with commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect requests") and 6bd57416127e ("[Bluetooth] Handling pending connect attempts after inquiry") to handle controllers supporting only a single connection request at a time. Later things were extended to also cancel ongoing inquiries on connect() with commit 89e65975fea5 ("Bluetooth: Cancel Inquiry before Create Connection"). With commit a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes"), hci_conn_check_pending() was introduced as a helper to consolidate a few places where we check for pending connections (indicated by the BT_CONNECT2 flag) and then try to connect. This refactoring commit also snuck in two more calls to hci_conn_check_pending(): - One is in the failure callback of hci_cs_inquiry(), this one probably makes sense: If we send an "HCI Inquiry" command and then immediately after a "Create Connection" command, the "Create Connection" command might fail before the "HCI Inquiry" command, and then we want to retry the "Create Connection" on failure of the "HCI Inquiry". - The other added call to hci_conn_check_pending() is in the event handler for the "Remote Name" event, this seems unrelated and is possibly a copy-paste error, so remove that one. Fixes: a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes") Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Disconnect connected devices before rfkilling adapterJonas Dreßler
On a lot of platforms (at least the MS Surface devices, M1 macbooks, and a few ThinkPads) firmware doesn't do its job when rfkilling a device and the bluetooth adapter is not actually shut down properly on rfkill. This leads to connected devices remaining in connected state and the bluetooth connection eventually timing out after rfkilling an adapter. Use the rfkill hook in the HCI driver to go through the full power-off sequence (including stopping scans and disconnecting devices) before rfkilling it, just like MGMT_OP_SET_POWERED would do. In case anything during the larger power-off sequence fails, make sure the device is still closed and the rfkill ends up being effective in the end. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Add new state HCI_POWERING_DOWNJonas Dreßler
Add a new state HCI_POWERING_DOWN that indicates that the device is currently powering down, this will be useful for the next commit. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: mgmt: Remove leftover queuing of power_off workJonas Dreßler
Queuing of power_off work was introduced in these functions with commits 8b064a3ad377 ("Bluetooth: Clean up HCI state when doing power off") and c9910d0fb4fc ("Bluetooth: Fix disconnecting connections in non-connected states") in an effort to clean up state and do things like disconnecting devices before actually powering off the device. After that, commit a3172b7eb4a2 ("Bluetooth: Add timer to force power off") introduced a timeout to ensure that the device actually got powered off, even if some of the cleanup work would never complete. This code later got refactored with commit cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED"), which made powering off the device synchronous and removed the need for initiating the power_off work from other places. The timeout mentioned above got removed too, because we now also made use of the command timeout during power on/off. These days the power_off work still exists, but it only seems to only be used for HCI_AUTO_OFF functionality, which is why we never noticed those two leftover places where we queue power_off work. So let's remove that code. Fixes: cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED") Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: Remove HCI_POWER_OFF_TIMEOUTJonas Dreßler
With commit cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED"), the power off sequence got refactored so that this timeout was no longer necessary, let's remove the leftover define from the header too. Fixes: cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED") Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: btnxpuart: Resolve TX timeout error in power save stress testNeeraj Sanjay Kale
This fixes the tx timeout issue seen while running a stress test on btnxpuart for couple of hours, such that the interval between two HCI commands coincide with the power save timeout value of 2 seconds. Test procedure using bash script: <load btnxpuart.ko> hciconfig hci0 up //Enable Power Save feature hcitool -i hci0 cmd 3f 23 02 00 00 while (true) do hciconfig hci0 leadv sleep 2 hciconfig hci0 noleadv sleep 2 done Error log, after adding few more debug prints: Bluetooth: btnxpuart_queue_skb(): 01 0A 20 01 00 Bluetooth: hci0: Set UART break: on, status=0 Bluetooth: hci0: btnxpuart_tx_wakeup() tx_work scheduled Bluetooth: hci0: btnxpuart_tx_work() dequeue: 01 0A 20 01 00 Can't set advertise mode on hci0: Connection timed out (110) Bluetooth: hci0: command 0x200a tx timeout When the power save mechanism turns on UART break, and btnxpuart_tx_work() is scheduled simultaneously, psdata->ps_state is read as PS_STATE_AWAKE, which prevents the psdata->work from being scheduled, which is responsible to turn OFF UART break. This issue is fixed by adding a ps_lock mutex around UART break on/off as well as around ps_state read/write. btnxpuart_tx_wakeup() will now read updated ps_state value. If ps_state is PS_STATE_SLEEP, it will first schedule psdata->work, and then it will reschedule itself once UART break has been turned off and ps_state is PS_STATE_AWAKE. Tested above script for 50,000 iterations and TX timeout error was not observed anymore. Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06Bluetooth: btrtl: Add the support for RTL8852BT/RTL8852BE-VTMax Chou
Add the support for RTL8852BT/RTL8852BE-VT BT controller on USB interface. The necessary firmware will be submitted to linux-firmware project. The device info from /sys/kernel/debug/usb/devices as below. T: Bus=02 Lev=02 Prnt=02 Port=05 Cnt=01 Dev#= 8 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0bda ProdID=8520 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms Signed-off-by: Max Chou <max.chou@realtek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06mmc: core: Fix switch on gp3 partitionDominique Martinet
Commit e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.") added a mask check for 'part_type', but the mask used was wrong leading to the code intended for rpmb also being executed for GP3. On some MMCs (but not all) this would make gp3 partition inaccessible: armadillo:~# head -c 1 < /dev/mmcblk2gp3 head: standard input: I/O error armadillo:~# dmesg -c [ 422.976583] mmc2: running CQE recovery [ 423.058182] mmc2: running CQE recovery [ 423.137607] mmc2: running CQE recovery [ 423.137802] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0 [ 423.237125] mmc2: running CQE recovery [ 423.318206] mmc2: running CQE recovery [ 423.397680] mmc2: running CQE recovery [ 423.397837] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 [ 423.408287] Buffer I/O error on dev mmcblk2gp3, logical block 0, async page read the part_type values of interest here are defined as follow: main 0 boot0 1 boot1 2 rpmb 3 gp0 4 gp1 5 gp2 6 gp3 7 so mask with EXT_CSD_PART_CONFIG_ACC_MASK (7) to correctly identify rpmb Fixes: e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.") Cc: stable@vger.kernel.org Cc: Jorge Ramirez-Ortiz <jorge@foundries.io> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240306-mmc-partswitch-v1-1-bf116985d950@codewreck.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-03-06kexec: copy only happens before uchunk goes to zeroyang.zhang
When loading segments, ubytes is <= mbytes. When ubytes is exhausted, there could be remaining mbytes. Then in the while loop, the buf pointer advancing with mchunk will causing meaningless reading even though it doesn't harm. So let's change to make sure that all of the copying and the rest only happens before uchunk goes to zero. Link: https://lkml.kernel.org/r/20240222092119.5602-1-gaoshanliukou@163.com Signed-off-by: yang.zhang <yang.zhang@hexintek.com> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_taskOleg Nesterov
This initialization is incomplete and unnecessary, neither do_group_exit() nor PF_USER_WORKER need ksig->info. Link: https://lkml.kernel.org/r/20240226165653.GA20834@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksigOleg Nesterov
ksig->ka and ksig->info are not initialized if get_signal() returns 0 or if the caller is PF_USER_WORKER. Check signr != 0 before SA_EXPOSE_TAGBITS and move the "out" label down. The latter means that ksig->sig won't be initialized if a PF_USER_WORKER thread gets a fatal signal but this is fine, PF_USER_WORKER's don't use ksig. And there is nothing new, in this case ksig->ka and ksig-info are not initialized anyway. Add a comment. Link: https://lkml.kernel.org/r/20240226165650.GA20829@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06get_signal: don't abuse ksig->info.si_signo and ksig->sigOleg Nesterov
Patch series "get_signal: minor cleanups and fix". Lets remove this clear_siginfo() right now. It is incomplete (and thus looks confusing) and unnecessary. Also, PF_USER_WORKER's already don't get a fully initialized ksig anyway. This patch (of 3): Cleanup and preparation for the next changes. get_signal() uses signr or ksig->info.si_signo or ksig->sig in a chaotic way, this looks confusing. Change it to always use signr. Link: https://lkml.kernel.org/r/20240226165612.GA20787@redhat.com Link: https://lkml.kernel.org/r/20240226165647.GA20826@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06const_structs.checkpatch: add device_typeRicardo B. Marliere
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Make sure that new usages of the struct already enter the tree as const. Link: https://lkml.kernel.org/r/20240218-device_cleanup-checkpatch-v1-1-8b0b89c4f6b1@marliere.net Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"Ahelenia Ziemiańska
Found with git grep 'MODULE_AUTHOR(".*([^)]*@' Fixed with sed -i '/MODULE_AUTHOR(".*([^)]*@/{s/ (/ </g;s/)"/>"/;s/)and/> and/}' \ $(git grep -l 'MODULE_AUTHOR(".*([^)]*@') Also: in drivers/media/usb/siano/smsusb.c normalise ", INC" to ", Inc"; this is what every other MODULE_AUTHOR for this company says, and it's what the header says in drivers/sbus/char/openprom.c normalise a double-spaced separator; this is clearly copied from the copyright header, where the names are aligned on consecutive lines thusly: * Linux/SPARC PROM Configuration Driver * Copyright (C) 1996 Thomas K. Dyas (tdyas@noc.rutgers.edu) * Copyright (C) 1996 Eddie C. Dost (ecd@skynet.be) but the authorship branding is single-line Link: https://lkml.kernel.org/r/mk3geln4azm5binjjlfsgjepow4o73domjv6ajybws3tz22vb3@tarta.nabijaczleweli.xyz Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()Andy Shevchenko
Replace open coded functionalify of kstrdup_and_replace() with a call. Link: https://lkml.kernel.org/r/20240213162741.3102810-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06filemap: avoid unnecessary major faults in filemap_fault()ZhangPeng
A major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE) in application, which leading to an unexpected issue[1]. This is caused by temporarily cleared PTE during a read+clear/modify/write update of the PTE, eg, do_numa_page()/change_pte_range(). For the data segment of the user-mode program, the global variable area is a private mapping. After the pagecache is loaded, the private anonymous page is generated after the COW is triggered. Mlockall can lock COW pages (anonymous pages), but the original file pages cannot be locked and may be reclaimed. If the global variable (private anon page) is accessed when vmf->pte is zeroed in numa fault, a file page fault will be triggered. At this time, the original private file page may have been reclaimed. If the page cache is not available at this time, a major fault will be triggered and the file will be read, causing additional overhead. This issue affects our traffic analysis service. The inbound traffic is heavy. If a major fault occurs, the I/O schedule is triggered and the original I/O is suspended. Generally, the I/O schedule is 0.7 ms. If other applications are operating disks, the system needs to wait for more than 10 ms. However, the inbound traffic is heavy and the NIC buffer is small. As a result, packet loss occurs. But the traffic analysis service can't tolerate packet loss. Fix this by holding PTL and rechecking the PTE in filemap_fault() before triggering a major fault. We do this check only if vma is VM_LOCKED to reduce the performance impact in common scenarios. In our product environment, there were 7 major faults every 12 hours. After the patch is applied, no major fault have been triggered. Testing file page read and write page fault performance in ext4 and ramdisk using will-it-scale[2] on a x86 physical machine. The data is the average change compared with the mainline after the patch is applied. The test results are within the range of fluctuation. We do this check only if vma is VM_LOCKED, therefore, no performance regressions is caused for most common cases. The test results are as follows: processes processes_idle threads threads_idle ext4 private file write: 0.22% 0.26% 1.21% -0.15% ext4 private file read: 0.03% 1.00% 1.39% 0.34% ext4 shared file write: -0.50% -0.02% -0.14% -0.02% ramdisk private file write: 0.07% 0.02% 0.53% 0.04% ramdisk private file read: 0.01% 1.60% -0.32% -0.02% [1] https://lore.kernel.org/linux-mm/9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com/ [2] https://github.com/antonblanchard/will-it-scale/ Link: https://lkml.kernel.org/r/20240306083809.1236634-1-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm,page_owner: drop unnecessary checkOscar Salvador
stackdepot only saves stack_records which size is greather than 0, so we cannot possibly have empty stack_records. Drop the check. Link: https://lkml.kernel.org/r/20240306123217.29774-3-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Marco Elver <elver@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm,page_owner: check for null stack_record before bumping its refcountOscar Salvador
Patch series "page_owner: Fixup and cleanup". This patchset consists of a fixup by an error that was reported by intel robot, where it seems to be that by the time page_owner gets initialized, stackdepot has already depleted its allocation space and returns 0-handles, turning that into null stack_records when trying to retrieve the stack_record. I was not able to reproduce that from the config because it booted fine for me, but when setting e.g: dummy_handle to 0 artificially, I could see the same error that was reported. The second patch is a cleanup that can also lead to a compilation warning. This patch (of 2): Although the retrieval of the stack_records for {dummy,failure}_handle happen when page_owner gets initialized, there seems to be some situations where stackdepot space has been already depleted by then, so we get 0-handles which make stack_records being NULL for those cases. Be careful to 1) only bump stack_records refcount and 2) only access stack_record fields if we actually have a non-null stack_record between hands. Link: https://lkml.kernel.org/r/20240306123217.29774-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20240306123217.29774-2-osalvador@suse.de Fixes: 4bedfb314bdd ("mm,page_owner: implement the tracking of the stacks count") Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403051032.e2f865a-lkp@intel.com Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm: swap: fix race between free_swap_and_cache() and swapoff()Ryan Roberts
There was previously a theoretical window where swapoff() could run and teardown a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map. This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below). Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was not free. This isn't present in get_swap_device() because it doesn't make sense in general due to the race between getting the reference and swapoff. So I've added an equivalent check directly in free_swap_and_cache(). Details of how to provoke one possible issue (thanks to David Hildenbrand for deriving this): --8<----- __swap_entry_free() might be the last user and result in "count == SWAP_HAS_CACHE". swapoff->try_to_unuse() will stop as soon as soon as si->inuse_pages==0. So the question is: could someone reclaim the folio and turn si->inuse_pages==0, before we completed swap_page_trans_huge_swapped(). Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are still references by swap entries. Process 1 still references subpage 0 via swap entry. Process 2 still references subpage 1 via swap entry. Process 1 quits. Calls free_swap_and_cache(). -> count == SWAP_HAS_CACHE [then, preempted in the hypervisor etc.] Process 2 quits. Calls free_swap_and_cache(). -> count == SWAP_HAS_CACHE Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap(). __try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()-> put_swap_folio()->free_swap_slot()->swapcache_free_entries()-> swap_entry_free()->swap_range_free()-> ... WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); What stops swapoff to succeed after process 2 reclaimed the swap cache but before process1 finished its call to swap_page_trans_huge_swapped()? --8<----- Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch") Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/treewide: align up pXd_leaf() retval across archsPeter Xu
Even if pXd_leaf() API is defined globally, it's not clear on the retval, and there are three types used (bool, int, unsigned log). Always return a boolean for pXd_leaf() APIs. Link: https://lkml.kernel.org/r/20240305043750.93762-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/treewide: drop pXd_large()Peter Xu
They're not used anymore, drop all of them. Link: https://lkml.kernel.org/r/20240305043750.93762-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/treewide: replace pud_large() with pud_leaf()Peter Xu
pud_large() is always defined as pud_leaf(). Merge their usages. Chose pud_leaf() because pud_leaf() is a global API, while pud_large() is not. Link: https://lkml.kernel.org/r/20240305043750.93762-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/treewide: replace pmd_large() with pmd_leaf()Peter Xu
pmd_large() is always defined as pmd_leaf(). Merge their usages. Chose pmd_leaf() because pmd_leaf() is a global API, while pmd_large() is not. Link: https://lkml.kernel.org/r/20240305043750.93762-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/kasan: use pXd_leaf() in shadow_mapped()Peter Xu
There is an old trick in shadow_mapped() to use pXd_bad() to detect huge pages. After commit 93fab1b22ef7 ("mm: add generic p?d_leaf() macros") we have a global API for huge mappings. Use that to replace the trick. Link: https://lkml.kernel.org/r/20240305043750.93762-7-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/x86: drop two unnecessary pud_leaf() definitionsPeter Xu
pud_leaf() has a fallback macro defined in include/linux/pgtable.h already. Drop the extra two for x86. Link: https://lkml.kernel.org/r/20240305043750.93762-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/x86: replace pgd_large() with pgd_leaf()Peter Xu
pgd_leaf() is a global API while pgd_large() is not. Always use the global pgd_leaf(), then drop pgd_large(). Link: https://lkml.kernel.org/r/20240305043750.93762-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/x86: replace p4d_large() with p4d_leaf()Peter Xu
p4d_large() is always defined as p4d_leaf(). Merge their usages. Chose p4d_leaf() because p4d_leaf() is a global API, while p4d_large() is not. Only x86 has p4d_leaf() defined as of now. So it also means after this patch we removed all p4d_large() usages. Link: https://lkml.kernel.org/r/20240305043750.93762-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/powerpc: replace pXd_is_leaf() with pXd_leaf()Peter Xu
They're the same macros underneath. Drop pXd_is_leaf(), instead always use pXd_leaf(). At the meantime, instead of renames, drop the pXd_is_leaf() fallback definitions directly in arch/powerpc/include/asm/pgtable.h. because similar fallback macros for pXd_leaf() are already defined in include/linux/pgtable.h. Link: https://lkml.kernel.org/r/20240305043750.93762-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Muchun Song <muchun.song@linux.dev> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/powerpc: define pXd_large() with pXd_leaf()Peter Xu
Patch series "mm/treewide: Replace pXd_large() with pXd_leaf()", v3. These two APIs are mostly always the same. It's confusing to have both of them. Merge them into one. Here I used pXd_leaf() only because pXd_leaf() is a global API which is always defined, while pXd_large() is not. We have yet one more API that is similar which is pXd_huge(), but that's even trickier, so let's do it step by step. Some special cares are taken for ppc and x86, they're done as separate cleanups first. This patch (of 10): The two definitions are the same. The only difference is that pXd_large() is only defined with THP selected, and only on book3s 64bits. Instead of implementing it twice, make pXd_large() a macro to pXd_leaf(). Define it unconditionally just like pXd_leaf(). This helps to prepare merging the two APIs. Link: https://lkml.kernel.org/r/20240305043750.93762-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20240305043750.93762-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yang Shi <shy828301@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/zswap: global lru and shrinker shared by all zswap_pools fixChengming Zhou
Commit bf9b7df23cb3 ("mm/zswap: global lru and shrinker shared by all zswap_pools") introduced a new lock to protect zswap_next_shrink, instead of reusing zswap_pools_lock. But the problem is that it's initialized only when zswap enabled, which causes bug if zswap_memcg_offline_cleanup() called without zswap enabled. Fix it by using DEFINE_SPINLOCK() to statically initialize them and define them as multiple static variables to keep in consistent with the existing global variables in zswap. Link: https://lkml.kernel.org/r/20240305075345.1493214-1-chengming.zhou@linux.dev Fixes: bf9b7df23cb3 ("mm/zswap: global lru and shrinker shared by all zswap_pools") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403051008.a8cf8a94-lkp@intel.com Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm: memory: fix shift-out-of-bounds in fault_around_bytes_setKefeng Wang
The rounddown_pow_of_two(0) is undefined, so val = 0 is not allowed in the fault_around_bytes_set(), and leads to shift-out-of-bounds, UBSAN: shift-out-of-bounds in include/linux/log2.h:67:13 shift exponent 4294967295 is too large for 64-bit type 'long unsigned int' CPU: 7 PID: 107 Comm: sh Not tainted 6.8.0-rc6-next-20240301 #294 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x78/0x90 dump_stack+0x18/0x24 ubsan_epilogue+0x10/0x44 __ubsan_handle_shift_out_of_bounds+0x98/0x134 fault_around_bytes_set+0xa4/0xb0 simple_attr_write_xsigned.isra.0+0xe4/0x1ac simple_attr_write+0x18/0x24 debugfs_attr_write+0x4c/0x98 vfs_write+0xd0/0x4b0 ksys_write+0x6c/0xfc __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xdc el0t_64_sync_handler+0xc0/0xc4 el0t_64_sync+0x190/0x194 ---[ end trace ]--- Fix it by setting the minimum val to PAGE_SIZE. Link: https://lkml.kernel.org/r/20240302064312.2358924-1-wangkefeng.wang@huawei.com Fixes: 53d36a56d8c4 ("mm: prefer fault_around_pages to fault_around_bytes") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Yue Sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/all/CAEkJfYPim6DQqW1GqCiHLdh2-eweqk1fGyXqs3JM+8e1qGge8w@mail.gmail.com/ Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06s390: supplement for ptdesc conversionQi Zheng
After commit 6326c26c1514 ("s390: convert various pgalloc functions to use ptdescs"), there are still some positions that use page->{lru, index} instead of ptdesc->{pt_list, pt_index}. In order to make the use of ptdesc->{pt_list, pt_index} clearer, it would be better to convert them as well. [zhengqi.arch@bytedance.com: fix build failure] Link: https://lkml.kernel.org/r/20240305072154.26168-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/04beaf3255056ffe131a5ea595736066c1e84756.1709541697.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm: pgtable: add missing pt_index to struct ptdescQi Zheng
In s390, the page->index field is used for gmap (see gmap_shadow_pgt()), so add the corresponding pt_index to struct ptdesc and add a comment to clarify this. Link: https://lkml.kernel.org/r/283624c2af45fb2090b41a6b1b5481bb0a45bad7.1709541697.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm: pgtable: correct the wrong comment about ptdesc->__page_flagsQi Zheng
Patch series "minor fixes and supplement for ptdesc". In this series, the [PATCH 1/3] and [PATCH 2/3] are fixes for some issues discovered during code inspection. The [PATCH 3/3] is a supplement to ptdesc conversion in s390, I don't know why this is not done in the commit 6326c26c1514 ("s390: convert various pgalloc functions to use ptdescs"), maybe I missed something. And since I don't have an s390 environment, I hope kernel test robot can help compile and test, and this is why I did not fold [PATCH 2/3] and [PATCH 3/3] into one patch. This patch (of 3): The commit 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page") introduced the use of PageActive flag to page table fragments tracking, so the ptdesc->__page_flags is not unused, so correct the wrong comment. Link: https://lkml.kernel.org/r/cover.1709541697.git.zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/cc42d5915fd98fd802f920de243f535efcfe01db.1709541697.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm: page_alloc: use div64_ul() instead of do_div()Thorsten Blum
Fixes Coccinelle/coccicheck warning reported by do_div.cocci. Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder. Link: https://lkml.kernel.org/r/20240228224911.1164-2-thorsten.blum@toblux.com Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06mm/mempolicy: use a folio in do_mbind()Matthew Wilcox (Oracle)
We actually add folios to the pagelist already, but then work with them as pages. Removes a call to compound_head() in PageKsm() and removes a reference to page->index. Link: https://lkml.kernel.org/r/20240229153015.1996829-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>