path: root/drivers/android/binder_alloc.c
Age  Commit message  Author
2024-01-17  Merge tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc  (Linus Torvalds)
Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other driver subsystem changes for 6.8-rc1. Other than lots of binder driver changes (as you can see by the merge conflicts) included in here are:
   - lots of iio driver updates and additions
   - spmi driver updates
   - eeprom driver updates
   - firmware driver updates
   - ocxl driver updates
   - mhi driver updates
   - w1 driver updates
   - nvmem driver updates
   - coresight driver updates
   - platform driver remove callback api changes
   - tags.sh script updates
   - bus_type constant marking cleanups
   - lots of other small driver updates
  All of these have been in linux-next for a while with no reported issues"
* tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (341 commits)
  android: removed duplicate linux/errno
  uio: Fix use-after-free in uio_open
  drivers: soc: xilinx: add check for platform
  firmware: xilinx: Export function to use in other module
  scripts/tags.sh: remove find_sources
  scripts/tags.sh: use -n to test archinclude
  scripts/tags.sh: add local annotation
  scripts/tags.sh: use more portable -path instead of -wholename
  scripts/tags.sh: Update comment (addition of gtags)
  firmware: zynqmp: Convert to platform remove callback returning void
  firmware: turris-mox-rwtm: Convert to platform remove callback returning void
  firmware: stratix10-svc: Convert to platform remove callback returning void
  firmware: stratix10-rsu: Convert to platform remove callback returning void
  firmware: raspberrypi: Convert to platform remove callback returning void
  firmware: qemu_fw_cfg: Convert to platform remove callback returning void
  firmware: mtk-adsp-ipc: Convert to platform remove callback returning void
  firmware: imx-dsp: Convert to platform remove callback returning void
  firmware: coreboot_table: Convert to platform remove callback returning void
  firmware: arm_scpi: Convert to platform remove callback returning void
  firmware: arm_scmi: Convert to platform remove callback returning void
  ...
2023-12-12  list_lru: allow explicit memcg and NUMA node selection  (Nhat Pham)
Patch series "workload-specific and memory pressure-driven zswap writeback", v8.

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to perform workload-specific shrinking - a memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking: https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u But this solution leaves a lot to be desired, as we still do not have an avenue for a memcg to free up its own memory locked up in the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages).

This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improve the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds.

This patch (of 6):

The interface of list_lru is based on the assumption that the list node and the data it represents belong to the same object, allocated on the correct node/memcg. While this assumption is valid for existing slab object LRUs such as dentries and inodes, it is undocumented, and rather inflexible for certain potential list_lru users (such as the upcoming zswap shrinker and the THP shrinker). It has caused us a lot of issues during our development.

This patch changes the list_lru interface so that the caller must explicitly specify the NUMA node and memcg when adding and removing objects. The old list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and list_lru_del_obj(), respectively. It also extends the list_lru API with a new function, list_lru_putback, which undoes a previous list_lru_isolate call. Unlike list_lru_add, it does not increment the LRU node count (as list_lru_isolate does not decrement the node count). list_lru_putback also allows for explicit memcg and NUMA node selection.
Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
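The reworked interface, sketched from the description above (prototypes paraphrased, not quoted from include/linux/list_lru.h):

    /* Old implicit behavior, kept under the new _obj names: */
    bool list_lru_add_obj(struct list_lru *lru, struct list_head *item);
    bool list_lru_del_obj(struct list_lru *lru, struct list_head *item);

    /* New interface: the caller names the NUMA node and memcg. */
    bool list_lru_add(struct list_lru *lru, struct list_head *item,
                      int nid, struct mem_cgroup *memcg);
    bool list_lru_del(struct list_lru *lru, struct list_head *item,
                      int nid, struct mem_cgroup *memcg);

    /* Undo a list_lru_isolate() without touching the node count. */
    void list_lru_putback(struct list_lru *lru, struct list_head *item,
                          int nid, struct mem_cgroup *memcg);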
2023-12-05  binder: switch alloc->mutex to spinlock_t  (Carlos Llamas)
The alloc->mutex is a highly contended lock that causes performance issues on Android devices. When a low-priority task is given this lock and it sleeps, it becomes difficult for the task to wake up and complete its work. This delays other tasks that are also waiting on the mutex. The problem gets worse when there is memory pressure in the system, because this increases the contention on the alloc->mutex while the shrinker reclaims binder pages.

Switching to a spinlock helps to keep the waiters running and avoids the overhead of waking up tasks. This significantly improves the transaction latency when the problematic scenario occurs.

The performance impact of this patchset was measured by stress-testing the binder alloc contention. In this test, several clients of different priorities send thousands of transactions of different sizes to a single server. In parallel, pages get reclaimed using the shrinker's debugfs. The test was run on a Pixel 8, Pixel 6 and qemu machine. The results were similar on all three devices:

after:
| sched | prio | average | max       | min     |
|-------+------+---------+-----------+---------|
| fifo  |   99 | 0.135ms |   1.197ms | 0.022ms |
| fifo  |   01 | 0.136ms |   5.232ms | 0.018ms |
| other |  -20 | 0.180ms |   7.403ms | 0.019ms |
| other |   19 | 0.241ms |  58.094ms | 0.018ms |

before:
| sched | prio | average | max       | min     |
|-------+------+---------+-----------+---------|
| fifo  |   99 | 0.350ms | 248.730ms | 0.020ms |
| fifo  |   01 | 0.357ms | 248.817ms | 0.024ms |
| other |  -20 | 0.399ms | 249.906ms | 0.020ms |
| other |   19 | 0.477ms | 297.756ms | 0.022ms |

The key metrics above are the average and max latencies (wall time). These improvements should roughly translate to p95-p99 latencies on real workloads. The response time is up to 200x faster in these scenarios and there is no penalty in the regular path.

Note that it is only possible to convert this lock after a series of changes made by previous patches. These mainly include refactoring the sections that might_sleep() and changing the locking order with the mmap_lock amongst others. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-29-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
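A minimal sketch of the conversion, assuming the field is renamed from 'mutex' to 'lock' along the way:

    /* before: waiters sleep; a preempted low-priority owner stalls
     * every task queued behind it */
    mutex_lock(&alloc->mutex);
    list_add(&buffer->entry, &alloc->buffers);
    mutex_unlock(&alloc->mutex);

    /* after: critical sections are short and never sleep, so
     * spinning waiters make forward progress immediately */
    spin_lock(&alloc->lock);
    list_add(&buffer->entry, &alloc->buffers);
    spin_unlock(&alloc->lock);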
2023-12-05  binder: reverse locking order in shrinker callback  (Carlos Llamas)
The locking order currently requires the alloc->mutex to be acquired first followed by the mmap lock. However, the alloc->mutex is converted into a spinlock in subsequent commits so the order needs to be reversed to avoid nesting the sleeping mmap lock under the spinlock. The shrinker's callback binder_alloc_free_page() is the only place that needs to be reordered since other functions have been refactored and no longer nest these locks. Some minor cosmetic changes are also included in this patch. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-28-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
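An illustrative sketch of the reordered trylock sequence in binder_alloc_free_page() (the error labels are hypothetical):

    /* take the sleeping mmap lock first and the alloc lock second,
     * so the later spinlock conversion never nests a sleeping lock */
    if (!mmap_read_trylock(mm))
        goto err_mmap_read_lock_failed;
    if (!mutex_trylock(&alloc->mutex))
        goto err_get_alloc_mutex_failed;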
2023-12-05  binder: avoid user addresses in debug logs  (Carlos Llamas)
Prefer logging vma offsets instead of addresses or simply drop the debug log altogether if not useful. Note this covers the instances affected by the switch to store addresses as unsigned long. However, there are other sections in the driver that could do the same. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-27-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: refactor binder_delete_free_buffer()  (Carlos Llamas)
Skip the freelist call right away when it isn't needed, instead of continuing through pointless checks. Also, drop the debug logs that we don't really need. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-26-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: collapse print_binder_buffer() into caller  (Carlos Llamas)
The code in print_binder_buffer() is quite small so it can be collapsed into its single caller binder_alloc_print_allocated(). No functional change in this patch. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-25-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: document the final page calculation  (Carlos Llamas)
The code to determine the page range for binder_lru_freelist_del() is quite obscure. It leverages the buffer_size calculated before doing an oversized buffer split. This is used to figure out if the last page is being shared with another active buffer. If so, the page gets trimmed out of the range as it has been previously removed from the freelist. This would be equivalent to getting the start page of the next in-use buffer explicitly. However, the code for this is much larger as we can see in the binder_free_buf_locked() routine. Instead, let's settle on documenting the tricky step and using better names for now. I believe an ideal solution would be to count the binder_page->users to determine when a page should be added or removed from the freelist. However, this is a much bigger change than what I'm willing to risk at this time. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-24-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: rename lru shrinker utilities  (Carlos Llamas)
Now that the page allocation step is done separately we should rename the binder_free_page_range() and binder_allocate_page_range() functions to provide a more accurate description of what they do. Let's borrow the freelist concept used in other parts of the kernel for this. No functional change here. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-23-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: make oversized buffer code more readable  (Carlos Llamas)
The sections in binder_alloc_new_buf_locked() dealing with oversized buffers are scattered which makes them difficult to read. Instead, consolidate this code into a single block to improve readability. No functional change here. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-22-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: remove redundant debug log  (Carlos Llamas)
The debug information in this statement is already logged earlier in the same function. We can get rid of this duplicate log. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-21-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: perform page installation outside of locks  (Carlos Llamas)
Split out the insertion of pages to be outside of the alloc->mutex in a separate binder_install_buffer_pages() routine. Since this is no longer serialized, we must look at the full range of pages used by the buffers. The installation is protected with the mmap_lock in write mode since multiple tasks might race to install the same page. Besides avoiding unnecessary nested locking this helps in preparation for switching the alloc->mutex into a spinlock_t in subsequent patches. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-20-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: initialize lru pages in mmap callback  (Carlos Llamas)
Rather than repeatedly initializing some of the binder_lru_page members during binder_alloc_new_buf(), perform this initialization just once in binder_alloc_mmap_handler(), after the pages have been created. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-19-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: malloc new_buffer outside of locks  (Carlos Llamas)
Preallocate new_buffer before acquiring the alloc->mutex and hand it down to binder_alloc_new_buf_locked(). The new buffer will be used in the vast majority of requests (measured at 98.2% in field data). The buffer is discarded otherwise. This change is required in preparation for transitioning alloc->mutex into a spinlock in subsequent commits. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-18-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
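A rough sketch of the reworked entry point (the signature is approximated, not verbatim driver code):

    struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
                                               size_t size, int is_async)
    {
        struct binder_buffer *buffer, *new_buffer;

        /* kzalloc() may sleep, so allocate before taking the lock */
        new_buffer = kzalloc(sizeof(*new_buffer), GFP_KERNEL);
        if (!new_buffer)
            return ERR_PTR(-ENOMEM);

        mutex_lock(&alloc->mutex);
        /* consumes new_buffer in the ~98.2% common case and is
         * expected to free it in the remaining cases */
        buffer = binder_alloc_new_buf_locked(alloc, new_buffer,
                                             size, is_async);
        mutex_unlock(&alloc->mutex);

        return buffer;
    }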
2023-12-05  binder: refactor page range allocation  (Carlos Llamas)
Instead of looping through the page range twice to first determine if the mmap lock is required, simply do it per-page as needed. Split out all this logic into a separate binder_install_single_page() function. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-17-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: relocate binder_alloc_clear_buf()  (Carlos Llamas)
Move this function up along with binder_alloc_get_page() so that their prototypes aren't necessary. No functional change in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-16-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: relocate low space calculation  (Carlos Llamas)
Move the low async space calculation to debug_low_async_space_locked(). This logic not only fits better here but also offloads some of the many tasks currently done in binder_alloc_new_buf_locked(). No functional change in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-15-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: separate the no-space debugging logic  (Carlos Llamas)
Move the no-space debugging logic into a separate function. Lets also mark this branch as unlikely in binder_alloc_new_buf_locked() as most requests will fit without issue. Also add a few cosmetic changes and suggestions from checkpatch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-14-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: remove pid param in binder_alloc_new_buf()  (Carlos Llamas)
Binder attributes the buffer allocation to the current->tgid every time. There is no need to pass this as a parameter so drop it. Also add a few touchups to follow the coding guidelines. No functional changes are introduced in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-13-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: do unlocked work in binder_alloc_new_buf()  (Carlos Llamas)
Extract non-critical sections from binder_alloc_new_buf_locked() that don't require holding the alloc->mutex. While we are here, consolidate the checks for size overflow and zero-sized padding into a separate sanitized_size() helper function. Also add a few touchups to follow the coding guidelines. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-12-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
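The consolidated size check could look roughly like this (a sketch; the actual helper may differ in detail):

    static inline size_t sanitized_size(size_t data_size,
                                        size_t offsets_size,
                                        size_t extra_buffers_size)
    {
        size_t total, tmp;

        /* Align each component to pointer size, checking overflow */
        tmp = ALIGN(data_size, sizeof(void *)) +
              ALIGN(offsets_size, sizeof(void *));
        if (tmp < data_size || tmp < offsets_size)
            return 0;

        total = tmp + ALIGN(extra_buffers_size, sizeof(void *));
        if (total < tmp || total < extra_buffers_size)
            return 0;

        /* Pad 0-sized buffers so they get a unique address */
        total = max(total, sizeof(void *));

        return total;
    }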
2023-12-05  binder: split up binder_update_page_range()  (Carlos Llamas)
The binder_update_page_range() function performs both allocation and freeing of binder pages. However, these two operations are unrelated and have no common logic. In fact, when a free operation is requested, the allocation logic is skipped entirely. This behavior makes the error path unnecessarily complex. To improve readability of the code, this patch splits the allocation and freeing operations into separate functions. No functional changes are introduced by this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-11-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
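The effect on callers, sketched with illustrative arguments:

    /* before: one function, direction selected by a flag */
    ret = binder_update_page_range(alloc, 1 /* allocate */, start, end);
    binder_update_page_range(alloc, 0 /* free */, start, end);

    /* after: two single-purpose functions, each with a simple
     * error path of its own */
    ret = binder_allocate_page_range(alloc, start, end);
    binder_free_page_range(alloc, start, end);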
2023-12-05  binder: keep vma addresses type as unsigned long  (Carlos Llamas)
The vma addresses in binder are currently stored as void __user *. This requires casting back and forth between the mm/ api which uses unsigned long. Since we also do internal arithmetic on these addresses we end up having to cast them _again_ to an integer type. Let's stop all the unnecessary casting which kills code readability and store the virtual addresses as the native unsigned long from mm/. Note that this approach is preferred over uintptr_t as Linus explains in [1]. Opportunistically add a few cosmetic touchups. Link: https://lore.kernel.org/all/CAHk-=wj2OHy-5e+srG1fy+ZU00TmZ1NFp6kFLbVLMXHe7A1d-g@mail.gmail.com/ [1] Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-10-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
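An illustrative before/after of the casting this removes (not verbatim driver code):

    /* before: stored as void __user *, cast for every calculation */
    page_addr = (void __user *)((uintptr_t)alloc->buffer + i * PAGE_SIZE);

    /* after: stored as unsigned long, plain arithmetic throughout */
    unsigned long page_addr = alloc->buffer + i * PAGE_SIZE;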
2023-12-05  binder: fix comment on binder_alloc_new_buf() return value  (Carlos Llamas)
Update the comments of binder_alloc_new_buf() to reflect that the return value of the function is now ERR_PTR(-errno) on failure. No functional changes in this patch. Cc: stable@vger.kernel.org Fixes: 57ada2fb2250 ("binder: add log information for binder transaction failures") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-8-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: fix trivial typo of binder_free_buf_locked()  (Carlos Llamas)
Fix minor misspelling of the function in the comment section. No functional changes in this patch. Cc: stable@vger.kernel.org Fixes: 0f966cba95c7 ("binder: add flag to clear buffer on txn complete") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-7-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: fix unused alloc->free_async_space  (Carlos Llamas)
Each transaction is associated with a 'struct binder_buffer' that stores the metadata about its buffer area. Since commit 74310e06be4d ("android: binder: Move buffer out of area shared with user space") this struct is no longer embedded within the buffer itself but is instead allocated on the heap to prevent userspace access to this driver-exclusive info. Unfortunately, the space of this struct is still being accounted for in the total buffer size calculation, specifically for async transactions. This results in an additional 104 bytes added to every async buffer request, and this area is never used. This wasted space can be substantial. If we consider the maximum mmap buffer space of SZ_4M, the driver will reserve half of it for async transactions, or 0x200000. This area should, in theory, accommodate up to 262,144 buffers of the minimum 8-byte size. However, after adding the extra 'sizeof(struct binder_buffer)', the total number of buffers drops to only 18,724, which is a sad 7.14% of the actual capacity. This patch fixes the buffer size calculation to enable the utilization of the entire async buffer space. This is expected to reduce the number of -ENOSPC errors that are seen in the field. Fixes: 74310e06be4d ("android: binder: Move buffer out of area shared with user space") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-6-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
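The capacity figures quoted above follow directly from the arithmetic:

    async space = SZ_4M / 2             = 2,097,152 bytes (0x200000)
    ideal count = 2,097,152 / 8         = 262,144 buffers
    old count   = 2,097,152 / (8 + 104) = 18,724 buffers  (7.14%)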
2023-12-05  binder: fix async space check for 0-sized buffers  (Carlos Llamas)
Move the padding of 0-sized buffers to an earlier stage to account for this round up during the alloc->free_async_space check. Fixes: 74310e06be4d ("android: binder: Move buffer out of area shared with user space") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-5-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05  binder: fix race between mmput() and do_exit()  (Carlos Llamas)
Task A calls binder_update_page_range() to allocate and insert pages on a remote address space from Task B. For this, Task A pins the remote mm via mmget_not_zero() first. This can race with Task B's do_exit() and the final mmput() refcount decrement will come from Task A.

  Task A            | Task B
  ------------------+------------------
  mmget_not_zero()  |
                    | do_exit()
                    |   exit_mm()
                    |     mmput()
  mmput()           |
    exit_mmap()     |
      remove_vma()  |
        fput()      |

In this case, the work of ____fput() from Task B is queued up in Task A as TWA_RESUME. So in theory, Task A returns to userspace and the cleanup work gets executed. However, Task A instead sleeps, waiting for a reply from Task B that never comes (it's dead). This means the binder_deferred_release() is blocked until an unrelated binder event forces Task A to go back to userspace. All the associated death notifications will also be delayed until then.

In order to fix this, use mmput_async(), which will schedule the work in the corresponding mm->async_put_work WQ instead of Task A. Fixes: 457b9a6f09f0 ("Staging: android: add binder driver") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-4-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
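The fix itself reduces to replacing the final refcount drop (sketched):

    /* mmput() here may inherit Task B's final mm teardown and queue
     * it on a task that is about to block in binder; the async
     * variant runs it from the mm's own workqueue instead */
    mmput_async(mm);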
2023-12-05  binder: fix use-after-free in shrinker's callback  (Carlos Llamas)
The mmap read lock is used during the shrinker's callback, which means that using the alloc->vma pointer isn't safe as it can race with munmap(). As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") the mmap lock is downgraded after the vma has been isolated. I was able to reproduce this issue by manually adding some delays and triggering page reclaiming through the shrinker's debug sysfs. The following KASAN report confirms the UAF:

  ==================================================================
  BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
  Read of size 8 at addr ffff356ed50e50f0 by task bash/478

  CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   zap_page_range_single+0x470/0x4b8
   binder_alloc_free_page+0x608/0xadc
   __list_lru_walk_one+0x130/0x3b0
   list_lru_walk_node+0xc4/0x22c
   binder_shrink_scan+0x108/0x1dc
   shrinker_debugfs_scan_write+0x2b4/0x500
   full_proxy_write+0xd4/0x140
   vfs_write+0x1ac/0x758
   ksys_write+0xf0/0x1dc
   __arm64_sys_write+0x6c/0x9c

  Allocated by task 492:
   kmem_cache_alloc+0x130/0x368
   vm_area_alloc+0x2c/0x190
   mmap_region+0x258/0x18bc
   do_mmap+0x694/0xa60
   vm_mmap_pgoff+0x170/0x29c
   ksys_mmap_pgoff+0x290/0x3a0
   __arm64_sys_mmap+0xcc/0x144

  Freed by task 491:
   kmem_cache_free+0x17c/0x3c8
   vm_area_free_rcu_cb+0x74/0x98
   rcu_core+0xa38/0x26d4
   rcu_core_si+0x10/0x1c
   __do_softirq+0x2fc/0xd24

  Last potentially related work creation:
   __call_rcu_common.constprop.0+0x6c/0xba0
   call_rcu+0x10/0x1c
   vm_area_free+0x18/0x24
   remove_vma+0xe4/0x118
   do_vmi_align_munmap.isra.0+0x718/0xb5c
   do_vmi_munmap+0xdc/0x1fc
   __vm_munmap+0x10c/0x278
   __arm64_sys_munmap+0x58/0x7c

Fix this issue by instead performing a vma_lookup(), which will fail to find the vma that was isolated before the mmap lock downgrade. Note that this option has better performance than upgrading to a mmap write lock, which would increase contention. Plus, mmap_write_trylock() has been recently removed anyway. Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Cc: stable@vger.kernel.org Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
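A sketch of the lookup-based check in binder_alloc_free_page(), under the mmap read lock:

    /* the vma isolated by munmap() is no longer in the tree, so a
     * fresh lookup fails instead of dereferencing freed memory */
    vma = vma_lookup(mm, page_addr);
    if (vma && vma != binder_alloc_get_vma(alloc))
        goto err_invalid_vma;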
2023-10-04  binder: dynamically allocate the android-binder shrinker  (Qi Zheng)
Use new APIs to dynamically allocate the android-binder shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-4-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Carlos Llamas <cmllamas@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
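Assuming the shrinker_alloc()/shrinker_register() interface, the registration could look roughly like:

    binder_shrinker = shrinker_alloc(0, "android-binder");
    if (!binder_shrinker)
        return -ENOMEM;

    /* reuse the existing count/scan callbacks */
    binder_shrinker->count_objects = binder_shrink_count;
    binder_shrinker->scan_objects = binder_shrink_scan;

    shrinker_register(binder_shrinker);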
2023-08-04  binder: fix memory leak in binder_init()  (Qi Zheng)
In binder_init(), the work done by binder_alloc_shrinker_init() is not undone in the error path, which causes a memory leak. This commit introduces binder_alloc_shrinker_exit() and calls it in the error path to fix that. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Carlos Llamas <cmllamas@google.com> Fixes: f2517eb76f1f ("android: binder: Add global lru shrinker to binder") Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20230625154937.64316-1-qi.zheng@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-20  binder: fix UAF of alloc->vma in race with munmap()  (Carlos Llamas)
[ cmllamas: clean forward port from commit 015ac18be7de ("binder: fix UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed in mainline after the revert of commit a43cfc87caaf ("android: binder: stop saving a pointer to the VMA") as pointed out by Liam. The commit log and tags have been tweaked to reflect this. ]

In commit 720c24192404 ("ANDROID: binder: change down_write to down_read") binder assumed the mmap read lock is sufficient to protect alloc->vma inside binder_update_page_range(). This used to be accurate until commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap"), which now downgrades the mmap_lock after detaching the vma from the rbtree in munmap(). Then it proceeds to teardown and free the vma with only the read lock held. This means that accesses to alloc->vma in binder_update_page_range() now will race with vm_area_free() in munmap() and can cause a UAF as shown in the following KASAN trace:

  ==================================================================
  BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
  Read of size 8 at addr ffff16204ad00600 by task server/558

  CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x2a0
   show_stack+0x18/0x2c
   dump_stack+0xf8/0x164
   print_address_description.constprop.0+0x9c/0x538
   kasan_report+0x120/0x200
   __asan_load8+0xa0/0xc4
   vm_insert_page+0x7c/0x1f0
   binder_update_page_range+0x278/0x50c
   binder_alloc_new_buf+0x3f0/0xba0
   binder_transaction+0x64c/0x3040
   binder_thread_write+0x924/0x2020
   binder_ioctl+0x1610/0x2e5c
   __arm64_sys_ioctl+0xd4/0x120
   el0_svc_common.constprop.0+0xac/0x270
   do_el0_svc+0x38/0xa0
   el0_svc+0x1c/0x2c
   el0_sync_handler+0xe8/0x114
   el0_sync+0x180/0x1c0

  Allocated by task 559:
   kasan_save_stack+0x38/0x6c
   __kasan_kmalloc.constprop.0+0xe4/0xf0
   kasan_slab_alloc+0x18/0x2c
   kmem_cache_alloc+0x1b0/0x2d0
   vm_area_alloc+0x28/0x94
   mmap_region+0x378/0x920
   do_mmap+0x3f0/0x600
   vm_mmap_pgoff+0x150/0x17c
   ksys_mmap_pgoff+0x284/0x2dc
   __arm64_sys_mmap+0x84/0xa4
   el0_svc_common.constprop.0+0xac/0x270
   do_el0_svc+0x38/0xa0
   el0_svc+0x1c/0x2c
   el0_sync_handler+0xe8/0x114
   el0_sync+0x180/0x1c0

  Freed by task 560:
   kasan_save_stack+0x38/0x6c
   kasan_set_track+0x28/0x40
   kasan_set_free_info+0x24/0x4c
   __kasan_slab_free+0x100/0x164
   kasan_slab_free+0x14/0x20
   kmem_cache_free+0xc4/0x34c
   vm_area_free+0x1c/0x2c
   remove_vma+0x7c/0x94
   __do_munmap+0x358/0x710
   __vm_munmap+0xbc/0x130
   __arm64_sys_munmap+0x4c/0x64
   el0_svc_common.constprop.0+0xac/0x270
   do_el0_svc+0x38/0xa0
   el0_svc+0x1c/0x2c
   el0_sync_handler+0xe8/0x114
   el0_sync+0x180/0x1c0
  [...]
  ==================================================================

To prevent the race above, revert back to taking the mmap write lock inside binder_update_page_range(). One might expect an increase of mmap lock contention. However, binder already serializes these calls via top level alloc->mutex. Also, there was no performance impact shown when running the binder benchmark tests.
Fixes: c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"") Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by: Jann Horn <jannh@google.com> Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver Cc: <stable@vger.kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-13  binder: add lockless binder_alloc_(set|get)_vma()  (Carlos Llamas)
Bring back the original lockless design in binder_alloc to determine whether the buffer setup has been completed by the ->mmap() handler. However, this time use smp_load_acquire() and smp_store_release() to wrap all the ordering in a single macro call. Also, add comments to make it evident that binder uses alloc->vma to determine when the binder_alloc has been fully initialized. In these scenarios acquiring the mmap_lock is not required. Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA") Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
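A sketch of the wrapper pair described above:

    static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
                                            struct vm_area_struct *vma)
    {
        /* pairs with smp_load_acquire in binder_alloc_get_vma() */
        smp_store_release(&alloc->vma, vma);
    }

    static inline struct vm_area_struct *
    binder_alloc_get_vma(struct binder_alloc *alloc)
    {
        /* look at vma first to check if mmap() setup is complete */
        return smp_load_acquire(&alloc->vma);
    }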
2023-05-13Revert "android: binder: stop saving a pointer to the VMA"Carlos Llamas
This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19. This patch fixed an issue reported by syzkaller in [1]. However, this turned out to be only a band-aid in binder. The root cause, as bisected by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap() when mas_preallocate() fails"). We no longer need the patch for binder. Reverting such patch allows us to have a lockless access to alloc->vma in specific cases where the mmap_lock is not required. This approach avoids the contention that caused a performance regression. [1] https://lore.kernel.org/all/0000000000004a0dbe05e1d749e0@google.com [cmllamas: resolved conflicts with rework of alloc->mm and removal of binder_alloc_set_vma() also fixed comment section] Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA") Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20230502201220.1756319-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-13Revert "binder_alloc: add missing mmap_lock calls when using the VMA"Carlos Llamas
This reverts commit 44e602b4e52f70f04620bbbf4fe46ecb40170bde. This caused a performance regression particularly when pages are getting reclaimed. We don't need to acquire the mmap_lock to determine when the binder buffer has been fully initialized. A subsequent patch will bring back the lockless approach for this. [cmllamas: resolved trivial conflicts with renaming of alloc->mm] Fixes: 44e602b4e52f ("binder_alloc: add missing mmap_lock calls when using the VMA") Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20230502201220.1756319-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18  mm: remove zap_page_range and create zap_vma_pages  (Mike Kravetz)
zap_page_range was originally designed to unmap pages within an address range that could span multiple vmas. While working on [1], it was discovered that all callers of zap_page_range pass a range entirely within a single vma. In addition, the mmu notification call within zap_page_range does not correctly handle ranges that span multiple vmas. When crossing a vma boundary, a new mmu_notifier_range_init/end call pair with the new vma should be made.

Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within the passed vma. Most users of zap_page_range pass the entire vma and can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call zap_page_range_single().
- Remove zap_page_range.

[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/ Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Peter Xu <peterx@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
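The new helper is a thin wrapper, roughly:

    static inline void zap_vma_pages(struct vm_area_struct *vma)
    {
        /* zap every page the vma covers, within that single vma */
        zap_page_range_single(vma, vma->vm_start,
                              vma->vm_end - vma->vm_start, NULL);
    }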
2022-11-09  binder: validate alloc->mm in ->mmap() handler  (Carlos Llamas)
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference") binder caches a pointer to the current->mm during open(). This fixes a null-ptr dereference reported by syzkaller. Unfortunately, it also opens the door for a process to update its mm after the open(), (e.g. via execve) making the cached alloc->mm pointer invalid. Things get worse when the process continues to mmap() a vma. From this point forward, binder will attempt to find this vma using an obsolete alloc->mm reference. Such as in binder_update_page_range(), where the wrong vma is obtained via vma_lookup(), yet binder proceeds to happily insert new pages into it. To avoid this issue fail the ->mmap() callback if we detect a mismatch between the vma->vm_mm and the original alloc->mm pointer. This prevents alloc->vm_addr from getting set, so that any subsequent vma_lookup() calls fail as expected. Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference") Reported-by: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> # 5.15+ Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
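A sketch of the mismatch check at the top of binder_alloc_mmap_handler() (label and string are illustrative):

    /* an execve() after open() leaves alloc->mm pointing at the old
     * address space; refuse to mmap() against a different mm */
    if (unlikely(vma->vm_mm != alloc->mm)) {
        ret = -EINVAL;
        failure_string = "invalid vma->vm_mm";
        goto err_invalid_mm;
    }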
2022-09-06  binder: remove binder_alloc_set_vma()  (Carlos Llamas)
The mmap_lock asserts here are not needed since this is only called back from the mmap stack in ->mmap() and ->close(), which always acquire the lock first. Remove these asserts along with binder_alloc_set_vma() altogether, since it's trivial enough to be consumed by callers. Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220906135948.3048225-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-06  binder: rename alloc->vma_vm_mm to alloc->mm  (Carlos Llamas)
Rename ->vma_vm_mm to ->mm to reflect the fact that we no longer cache this reference from vma->vm_mm but from current->mm instead. No functional changes in this patch. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220906135948.3048225-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-02  Merge tag 'char-misc-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc  (Linus Torvalds)
Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc and other driver fixes for 6.0-rc4. Included in here are:
   - binder fixes for previous fixes, and a few more fixes uncovered by them.
   - iio driver fixes
   - soundwire driver fixes
   - fastrpc driver fixes for memory corruption on some hardware
   - peci driver fix
   - mhi driver fix
  All of these have been in linux-next with no reported problems"
* tag 'char-misc-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  binder: fix alloc->vma_vm_mm null-ptr dereference
  misc: fastrpc: increase maximum session count
  misc: fastrpc: fix memory corruption on open
  misc: fastrpc: fix memory corruption on probe
  soundwire: qcom: fix device status array range
  bus: mhi: host: Fix up null pointer access in mhi_irq_handler
  soundwire: qcom: remove duplicate reset control get
  iio: light: cm32181: make cm32181_pm_ops static
  iio: ad7292: Prevent regulator double disable
  dt-bindings: iio: gyroscope: bosch,bmg160: correct number of pins
  iio: adc: mcp3911: use correct formula for AD conversion
  iio: adc: mcp3911: correct "microchip,device-addr" property
  Revert "binder_alloc: Add missing mmap_lock calls when using the VMA"
  binder_alloc: Add missing mmap_lock calls when using the VMA
  binder: fix UAF of ref->proc caused by race condition
  iio: light: cm3605: Fix an error handling path in cm3605_probe()
  iio: adc: mcp3911: make use of the sign bit
  peci: cpu: Fix use-after-free in adev_release()
  peci: aspeed: fix error check return value of platform_get_irq()
2022-09-01  binder: fix alloc->vma_vm_mm null-ptr dereference  (Carlos Llamas)
Syzbot reported a couple of issues introduced by commit 44e602b4e52f ("binder_alloc: add missing mmap_lock calls when using the VMA"), in which we attempt to acquire the mmap_lock when alloc->vma_vm_mm has not been initialized yet. This can happen if a binder_proc receives a transaction without having previously called mmap() to set up the binder_proc->alloc space in [1]. Also, a similar issue occurs via binder_alloc_print_pages() when we try to dump the debugfs binder stats file in [2]. Sample of syzbot's crash report:

  ==================================================================
  KASAN: null-ptr-deref in range [0x0000000000000128-0x000000000000012f]
  CPU: 0 PID: 3755 Comm: syz-executor229 Not tainted 6.0.0-rc1-next-20220819-syzkaller #0
  syz-executor229[3755] cmdline: ./syz-executor2294415195
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
  RIP: 0010:__lock_acquire+0xd83/0x56d0 kernel/locking/lockdep.c:4923
  [...]
  Call Trace:
   <TASK>
   lock_acquire kernel/locking/lockdep.c:5666 [inline]
   lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
   down_read+0x98/0x450 kernel/locking/rwsem.c:1499
   mmap_read_lock include/linux/mmap_lock.h:117 [inline]
   binder_alloc_new_buf_locked drivers/android/binder_alloc.c:405 [inline]
   binder_alloc_new_buf+0xa5/0x19e0 drivers/android/binder_alloc.c:593
   binder_transaction+0x242e/0x9a80 drivers/android/binder.c:3199
   binder_thread_write+0x664/0x3220 drivers/android/binder.c:3986
   binder_ioctl_write_read drivers/android/binder.c:5036 [inline]
   binder_ioctl+0x3470/0x6d00 drivers/android/binder.c:5323
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:870 [inline]
   __se_sys_ioctl fs/ioctl.c:856 [inline]
   __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [...]
  ==================================================================

Fix these issues by setting up the alloc->vma_vm_mm pointer during open() and caching directly from current->mm. This guarantees we have a valid reference to take the mmap_lock during the scenarios described above.

[1] https://syzkaller.appspot.com/bug?extid=f7dc54e5be28950ac459
[2] https://syzkaller.appspot.com/bug?extid=a75ebe0452711c9e56d9

Fixes: 44e602b4e52f ("binder_alloc: add missing mmap_lock calls when using the VMA") Cc: <stable@vger.kernel.org> # v5.15+ Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: syzbot+f7dc54e5be28950ac459@syzkaller.appspotmail.com Reported-by: syzbot+a75ebe0452711c9e56d9@syzkaller.appspotmail.com Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220829201254.1814484-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-28  binder_alloc: add missing mmap_lock calls when using the VMA  (Liam Howlett)
Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages() and when checking for a VMA in binder_alloc_new_buf_locked(). It is worth noting binder_alloc_new_buf_locked() drops the VMA read lock after it verifies a VMA exists, but may be taken again deeper in the call stack, if necessary. Link: https://lkml.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com Fixes: a43cfc87caaf (android: binder: stop saving a pointer to the VMA) Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: <syzbot+a7b60a176ec13cafb793@syzkaller.appspotmail.com> Acked-by: Carlos Llamas <cmllamas@google.com> Tested-by: Ondrej Mosnacek <omosnace@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Arve Hjønnevåg" <arve@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-19  Revert "binder_alloc: Add missing mmap_lock calls when using the VMA"  (Greg Kroah-Hartman)
This reverts commit d6f35446d0769a98e9d761593d267cdd24f09ecd. It is coming in through Andrew's tree instead, and for some reason we have different versions. I trust the version from Andrew more, as the original offending commit came through his tree. Fixes: d6f35446d076 ("binder_alloc: Add missing mmap_lock calls when using the VMA") Cc: stable <stable@kernel.org> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Carlos Llamas <cmllamas@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20220819184027.7b3fda3e@canb.auug.org.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-18  binder_alloc: Add missing mmap_lock calls when using the VMA  (Liam Howlett)
Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages() and when checking for a VMA in binder_alloc_new_buf_locked(). It is worth noting binder_alloc_new_buf_locked() drops the VMA read lock after it verifies a VMA exists, but may be taken again deeper in the call stack, if necessary. Fixes: a43cfc87caaf (android: binder: stop saving a pointer to the VMA) Cc: stable <stable@kernel.org> Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: syzbot+a7b60a176ec13cafb793@syzkaller.appspotmail.com Tested-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Link: https://lore.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-29  android: binder: fix lockdep check on clearing vma  (Liam Howlett)
When munmapping a vma, the mmap_lock can be downgraded from write to read before calling close() on the file handle. The binder close() function calls binder_alloc_set_vma() to clear the vma address, which now has a lockdep check for writing on the mmap_lock. Change the lockdep check to ensure the read lock is held while clearing and keep the write check while writing. Link: https://lkml.kernel.org/r/20220627151857.2316964-1-Liam.Howlett@oracle.com Fixes: 472a68df605b ("android: binder: stop saving a pointer to the VMA") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: syzbot+da54fa8d793ca89c741f@syzkaller.appspotmail.com Acked-by: Todd Kjos <tkjos@google.com> Cc: "Arve Hjønnevåg" <arve@android.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
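A sketch of the adjusted assert, using the field names of that era:

    static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
                                            struct vm_area_struct *vma)
    {
        /* setting still requires the write lock, but ->close() holds
         * only the read lock when clearing during munmap() */
        if (vma)
            mmap_assert_write_locked(alloc->vma_vm_mm);
        else
            mmap_assert_locked(alloc->vma_vm_mm);

        alloc->vma_addr = vma ? vma->vm_start : 0;
    }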
2022-07-29  android: binder: stop saving a pointer to the VMA  (Liam R. Howlett)
Do not record a pointer to a VMA outside of the mmap_lock for later use. This is unsafe and there are a number of failure paths *after* the recorded VMA pointer may be freed during setup. There is no callback to the driver to clear the saved pointer from generic mm code. Furthermore, the VMA pointer may become stale if any number of VMA operations end up freeing the VMA, so saving it was fragile to begin with. Instead, change the binder_alloc struct to record the start address of the VMA and use vma_lookup() to get the vma when needed. Add lockdep mmap_lock checks on updates to the vma pointer to ensure the lock is held and depend on that lock for synchronization of readers and writers - which was already the case anyways, so the smp_wmb()/smp_rmb() was not necessary. [akpm@linux-foundation.org: fix drivers/android/binder_alloc_selftest.c] Link: https://lkml.kernel.org/r/20220621140212.vpkio64idahetbyf@revolver Fixes: da1b9564e85b ("android: binder: fix the race mmap and alloc_new_buf_locked") Reported-by: syzbot+58b51ac2b04e388ab7b0@syzkaller.appspotmail.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03  mm: shrinkers: provide shrinkers with names  (Roman Gushchin)
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to generate a name.

In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is:

  <subsystem>-<shrinker_type>[:<instance>]-<id>

For some shrinkers an instance can be encoded as a (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like:

  $ cd /sys/kernel/debug/shrinker/
  $ ls
  dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
  mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
  mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
  rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
  sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
  sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
  sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
  sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
  sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
  sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
  sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
  sb-dax-11           sb-proc-45       sb-tmpfs-35
  sb-debugfs-7        sb-proc-46       sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
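For binder, the call-site change would amount to (sketched):

    /* before: anonymous shrinker */
    register_shrinker(&binder_shrinker);

    /* after: named, visible as android-binder-<id> under
     * /sys/kernel/debug/shrinker/ */
    register_shrinker(&binder_shrinker, "android-binder");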
2022-04-26  binder: Use memcpy_{to,from}_page() in binder_alloc_do_buffer_copy()  (Fabio M. De Francesco)
The use of kmap_atomic() is being deprecated in favor of kmap_local_page() where it is feasible. Each call of kmap_atomic() in the kernel creates a non-preemptible section and disable pagefaults. This could be a source of unwanted latency, so kmap_local_page() should be preferred. With kmap_local_page(), the mapping is per thread, CPU local and not globally visible. Furthermore, the mapping can be acquired from any context (including interrupts). binder_alloc_do_buffer_copy() is a function where the use of kmap_local_page() in place of kmap_atomic() is correctly suited. Use kmap_local_page() / kunmap_local() in place of kmap_atomic() / kunmap_atomic() but, instead of open coding the mappings and call memcpy() to and from the virtual addresses of the mapped pages, prefer the use of the memcpy_{to,from}_page() wrappers (as suggested by Christophe Jaillet). Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220425175754.8180-4-fmdefrancesco@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
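A sketch of the resulting copy step in binder_alloc_do_buffer_copy():

    /* memcpy_{to,from}_page() wrap kmap_local_page()/kunmap_local()
     * around the memcpy(), so no mapping is left open */
    if (to_buffer)
        memcpy_to_page(page, pgoff, ptr, size);
    else
        memcpy_from_page(ptr, page, pgoff, size);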
2022-04-26  binder: Use kmap_local_page() in binder_alloc_copy_user_to_buffer()  (Fabio M. De Francesco)
The use of kmap() is being deprecated in favor of kmap_local_page() where it is feasible. With kmap_local_page(), the mapping is per thread, CPU local and not globally visible. binder_alloc_copy_user_to_buffer() is a function where the use of kmap_local_page() in place of kmap() is correctly suited because the mapping is local to the thread. Therefore, use kmap_local_page() / kunmap_local(). Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220425175754.8180-3-fmdefrancesco@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
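The per-page copy then becomes, roughly:

    /* map the page locally, copy one chunk from userspace, unmap */
    kptr = kmap_local_page(page) + pgoff;
    ret = copy_from_user(kptr, from, size);
    kunmap_local(kptr);
    if (ret)
        return bytes + (size - ret);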
2022-04-26  binder: Use memset_page() in binder_alloc_clear_buf()  (Fabio M. De Francesco)
The use of kmap() is being deprecated in favor of kmap_local_page() where it is feasible. With kmap_local_page(), the mapping is per thread, CPU local and not globally visible. binder_alloc_clear_buf() is a function where the use of kmap_local_page() in place of kmap() is correctly suited because the mapping is local to the thread. Therefore, use kmap_local_page() / kunmap_local() but, instead of open coding these two functions and adding a memset() of the virtual address of the mapping, prefer memset_page(). Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220425175754.8180-2-fmdefrancesco@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
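With the wrapper, clearing one chunk of the buffer reduces to (sketched):

    /* maps the page, zeroes 'size' bytes at pgoff, and unmaps */
    memset_page(page, pgoff, 0, size);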
2022-02-04  drivers/android: remove redundant ret variable  (Minghao Chi)
Return the value from list_lru_count() directly instead of storing it in a redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Link: https://lore.kernel.org/r/20220104113500.602158-1-chi.minghao@zte.com.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>