Age        Commit message        Author
2024-10-30Merge branch 'work.fdtable' into vfs.fileChristian Brauner
Bring in the fdtable changes for this cycle. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-30Merge patch series "fs: introduce file_ref_t"Christian Brauner
Christian Brauner <brauner@kernel.org> says:

As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop it has O(N^2) behaviour under contention with N concurrent operations, and it is in a hot path in __fget_files_rcu(). The rcuref infrastructure remedies this problem by using an unconditional increment, relying on safe and dead zones to make this work and requiring rcu protection for the data structure in question. This not only scales better, it also introduces overflow protection.

However, in contrast to generic rcuref, files require a memory barrier and thus cannot rely on *_relaxed() atomic operations, and also need to be built on atomic_long_t, as having massive amounts of references isn't unheard of even if it is just an attack. As suggested by Linus, add a file specific variant instead of making this a generic library.

I've been testing this with will-it-scale using a multi-threaded fstat() on the same file descriptor on a machine that Jens gave me access to (thank you very much!):

    processor  : 511
    vendor_id  : AuthenticAMD
    cpu family : 25
    model      : 160
    model name : AMD EPYC 9754 128-Core Processor

and I consistently get a 3-5% improvement on workloads with 256 or more threads, comparing v6.12-rc1 with and without these patches applied.

* patches from https://lore.kernel.org/r/20241007-brauner-file-rcuref-v2-0-387e24dc9163@kernel.org:
  fs: port files to file_ref
  fs: add file_ref
  fs: protect backing files with rcu

Link: https://lore.kernel.org/r/20241007-brauner-file-rcuref-v2-0-387e24dc9163@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-30fs: port files to file_refChristian Brauner
Port files to rely on file_ref references to improve scaling and gain overflow protection.

- We continue to WARN during get_file() in case a file that is already marked dead is revived, as get_file() is only valid if the caller already holds a reference to the file. This hasn't changed; just the check changes.

- The semantics for epoll and ttm's dmabuf usage have changed. Both epoll and ttm synchronize with __fput() to prevent the underlying file from being freed.

  (1) epoll

  Explaining epoll is straightforward using a simple diagram. Essentially, the mutex of the epoll instance needs to be taken in both __fput() and around epi_fget(), preventing the file from being freed while it is polled, or preventing the file from being resurrected.

      CPU1                                        CPU2
      fput(file)
      -> __fput(file)
         -> eventpoll_release(file)
            -> eventpoll_release_file(file)
                                                  mutex_lock(&ep->mtx)
                                                  epi_item_poll()
                                                  -> epi_fget()
                                                     -> file_ref_get(file)
                                                  mutex_unlock(&ep->mtx)
               mutex_lock(&ep->mtx);
               __ep_remove()
               mutex_unlock(&ep->mtx);
      -> kmem_cache_free(file)

  (2) ttm dmabuf

  This explanation is a bit more involved. A regular dmabuf file stashes the dmabuf in file->private_data and the file in dmabuf->file:

      file->private_data = dmabuf;
      dmabuf->file = file;

  The generic release method of a dmabuf file handles file specific things:

      f_op->release::dma_buf_file_release()

  while the generic dentry release method of a dmabuf handles dmabuf freeing including driver specific things:

      dentry->d_release::dma_buf_release()

  During ttm dmabuf initialization in ttm_object_device_init() the ttm driver copies the provided struct dma_buf_ops into a private location:

      struct ttm_object_device {
              spinlock_t object_lock;
              struct dma_buf_ops ops;
              void (*dmabuf_release)(struct dma_buf *dma_buf);
              struct idr idr;
      };

      ttm_object_device_init(const struct dma_buf_ops *ops)
      {
              // copy original dma_buf_ops in private location
              tdev->ops = *ops;

              // stash the release method of the original struct dma_buf_ops
              tdev->dmabuf_release = tdev->ops.release;

              // override the release method in the copy of the struct dma_buf_ops
              // with ttm's own dmabuf release method
              tdev->ops.release = ttm_prime_dmabuf_release;
      }

  When a new dmabuf is created, the struct dma_buf_ops with the overridden release method set to ttm_prime_dmabuf_release is passed in exp_info.ops:

      DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
      exp_info.ops = &tdev->ops;
      exp_info.size = prime->size;
      exp_info.flags = flags;
      exp_info.priv = prime;

  The call to dma_buf_export() then sets

      mutex_lock_interruptible(&prime->mutex);
      dma_buf = dma_buf_export(&exp_info)
      {
              dmabuf->ops = exp_info->ops;
      }
      mutex_unlock(&prime->mutex);

  which creates a new dmabuf file and then installs a file descriptor for it in the caller's file descriptor table:

      ret = dma_buf_fd(dma_buf, flags);

  When that dmabuf file is closed we now get:

      fput(file)
      -> __fput(file)
         -> f_op->release::dma_buf_file_release()
         -> dput()
            -> d_op->d_release::dma_buf_release()
               -> dmabuf->ops->release::ttm_prime_dmabuf_release()
                  mutex_lock(&prime->mutex);
                  if (prime->dma_buf == dma_buf)
                          prime->dma_buf = NULL;
                  mutex_unlock(&prime->mutex);

  where we can see that prime->dma_buf is set to NULL.
So when we have the following diagram:

      CPU1                                              CPU2
      fput(file)
      -> __fput(file)
         -> f_op->release::dma_buf_file_release()
         -> dput()
            -> d_op->d_release::dma_buf_release()
               -> dmabuf->ops->release::ttm_prime_dmabuf_release()
                                                        ttm_prime_handle_to_fd()
                                                        mutex_lock_interruptible(&prime->mutex)
                                                        dma_buf = prime->dma_buf
                                                        dma_buf && get_dma_buf_unless_doomed(dma_buf)
                                                        -> file_ref_get(dma_buf->file)
                                                        mutex_unlock(&prime->mutex);
                  mutex_lock(&prime->mutex);
                  if (prime->dma_buf == dma_buf)
                          prime->dma_buf = NULL;
                  mutex_unlock(&prime->mutex);
      -> kmem_cache_free(file)

The logic of the mechanism is the same as for epoll: sync with __fput(), preventing the file from being freed. Here the synchronization happens through the ttm instance's prime->mutex. Basically, the lifetimes of the dma_buf and the file are tightly coupled.

Both (1) and (2) used to call atomic_inc_not_zero() to check whether the file has already been marked dead and then refuse to revive it. This is only safe because both (1) and (2) sync with __fput() and thus prevent kmem_cache_free() on the file being called, and thus prevent the file from being immediately recycled due to SLAB_TYPESAFE_BY_RCU.

Both (1) and (2) have been ported from atomic_inc_not_zero() to file_ref_get(). That means a file that is already in the process of being marked as FILE_REF_DEAD:

      file_ref_put()
        cnt = atomic_long_dec_return()
        -> __file_ref_put(cnt)
           if (cnt == FILE_REF_NOREF)
                   atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)

can be revived again:

      CPU1                                        CPU2
      file_ref_put()
        cnt = atomic_long_dec_return()
        -> __file_ref_put(cnt)
           if (cnt == FILE_REF_NOREF)
                                                  file_ref_get()
                                                  // Brings reference back to FILE_REF_ONEREF
                                                  atomic_long_add_negative()
                   atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)

This is fine and inherent to the file_ref_get()/file_ref_put() semantics. For both (1) and (2) this is safe because __fput() is prevented from making progress if file_ref_get() fails, due to the aforementioned synchronization mechanisms.

Two cases need to be considered that affect both (1) epoll and (2) ttm dmabuf:

 (i) fput()'s file_ref_put() marks the file as FILE_REF_NOREF, but before that fput() can mark the file as FILE_REF_DEAD someone manages to sneak in a file_ref_get() and brings the refcount back from FILE_REF_NOREF to FILE_REF_ONEREF. In that case the original fput() doesn't call __fput(). For epoll the poll will finish and for ttm dmabuf the file can be used again. For ttm dmabuf this is actually an advantage because it avoids immediately allocating a new dmabuf object.

      CPU1                                        CPU2
      file_ref_put()
        cnt = atomic_long_dec_return()
        -> __file_ref_put(cnt)
           if (cnt == FILE_REF_NOREF)
                                                  file_ref_get()
                                                  // Brings reference back to FILE_REF_ONEREF
                                                  atomic_long_add_negative()
                   atomic_long_try_cmpxchg_release(cnt, FILE_REF_DEAD)

(ii) fput()'s file_ref_put() marks the file FILE_REF_NOREF and also succeeds in actually marking it FILE_REF_DEAD and then calls into __fput() to free the file. When either (1) or (2) call file_ref_get() they fail, as atomic_long_add_negative() will return true. At the same time, both (1) and (2) call file_ref_get() under mutexes that __fput() must also acquire, preventing kmem_cache_free() from freeing the file.

So while this might be treated as a change in semantics for (1) and (2), it really isn't.
If it should end up causing issues, this can be fixed by adding a helper that does something like:

      long cnt = atomic_long_read(&ref->refcnt);
      do {
              if (cnt < 0)
                      return false;
      } while (!atomic_long_try_cmpxchg(&ref->refcnt, &cnt, cnt + 1));
      return true;

which would block FILE_REF_NOREF to FILE_REF_ONEREF transitions.

- Jann correctly pointed out that kmem_cache_zalloc() cannot be used anymore once files have been ported to file_ref_t. The kmem_cache_zalloc() call will memset() the whole struct file to zero when it is reallocated. This will also set file->f_ref to zero, which means that a concurrent file_ref_get() can return true:

      CPU1                                        CPU2
      __get_file_rcu()
        rcu_dereference_raw()
                                                  close()
                                                  [frees file]
                                                  alloc_empty_file()
                                                    kmem_cache_zalloc()
                                                    [reallocates same file]
                                                    memset(..., 0, ...)
        file_ref_get()
        [increments 0->1, returns true]
                                                  init_file()
                                                    file_ref_init(..., 1)
                                                    [sets to 0]
        rcu_dereference_raw()
        fput()
          file_ref_put()
          [decrements 0->FILE_REF_NOREF, frees file]
                                                  [UAF]

  causing a concurrent __get_file_rcu() call to acquire a reference to the file that is about to be reallocated and immediately freeing it on realizing that it has been recycled. This causes a UAF for the task that reallocated/recycled the file.

  This is prevented by switching from kmem_cache_zalloc() to kmem_cache_alloc() and initializing the fields manually, with file->f_ref initialized last.

  Note that a memset() also isn't guaranteed to atomically update an unsigned long, so it is theoretically possible to see torn and therefore bogus counter values.

Link: https://lore.kernel.org/r/20241007-brauner-file-rcuref-v2-3-387e24dc9163@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
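For illustration, the helper suggested above can be wrapped into a complete function along these lines (a minimal sketch; the name file_ref_get_noreviv() is hypothetical and not part of the series):

      /* Hypothetical strict variant: never revive a file whose count has
       * already dropped into the negative (NOREF/dead) zone. */
      static inline bool file_ref_get_noreviv(file_ref_t *ref)
      {
              long cnt = atomic_long_read(&ref->refcnt);

              do {
                      /* Negative means NOREF or deeper in the dead zone. */
                      if (cnt < 0)
                              return false;
              } while (!atomic_long_try_cmpxchg(&ref->refcnt, &cnt, cnt + 1));

              return true;
      }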
2024-10-19fs: add file_refChristian Brauner
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop it has O(N^2) behaviour under contention with N concurrent operations, and it is in a hot path in __fget_files_rcu().

The rcuref infrastructure remedies this problem by using an unconditional increment, relying on safe and dead zones to make this work and requiring rcu protection for the data structure in question. This not only scales better, it also introduces overflow protection.

However, in contrast to generic rcuref, files require a memory barrier and thus cannot rely on *_relaxed() atomic operations, and also need to be built on atomic_long_t, as having massive amounts of references isn't unheard of even if it is just an attack.

As suggested by Linus, add a file specific variant instead of making this a generic library.

Files are SLAB_TYPESAFE_BY_RCU and thus don't have "regular" rcu protection. In short, freeing of files isn't delayed until a grace period has elapsed. Instead, they are freed immediately and thus can be reused (multiple times) within the same grace period.

So when picking a file from the file descriptor table via its file descriptor number it is thus possible to see an elevated reference count on file->f_count even though the file has already been recycled, possibly multiple times, by another task.

To guard against this the vfs will pick the file from the file descriptor table twice: once before the refcount increment and once after, to compare the pointers (grossly simplified). If they match then the file is still valid. If not, the caller needs to fput() it.

The unconditional increment makes the following race possible, as illustrated by rcuref:

> Deconstruction race
> ===================
>
> The release operation must be protected by prohibiting a grace period in
> order to prevent a possible use after free:
>
>         T1                              T2
>         put()                           get()
>         // ref->refcnt = ONEREF
>         if (!atomic_add_negative(-1, &ref->refcnt))
>                 return false;                           <- Not taken
>
>         // ref->refcnt == NOREF
>         --> preemption
>                                         // Elevates ref->refcnt to ONEREF
>                                         if (!atomic_add_negative(1, &ref->refcnt))
>                                                 return true;    <- taken
>
>                                         if (put(&p->ref)) { <-- Succeeds
>                                                 remove_pointer(p);
>                                                 kfree_rcu(p, rcu);
>                                         }
>
>                 RCU grace period ends, object is freed
>
>         atomic_cmpxchg(&ref->refcnt, NOREF, DEAD);      <- UAF
>
> [...] it prevents the grace period which keeps the object alive until
> all put() operations complete.

Having files be SLAB_TYPESAFE_BY_RCU shouldn't cause any problems for this deconstruction race. Afaict, the only interesting case would be someone freeing the file and someone immediately recycling it within the same grace period and reinitializing file->f_count to ONEREF while a concurrent fput() is doing atomic_cmpxchg(&ref->refcnt, NOREF, DEAD), as in the race above. But this is safe from SLAB_TYPESAFE_BY_RCU's perspective and it should be safe from rcuref's perspective.

        T1                              T2                              T3
        fput()                          fget()
        // f_count->refcnt = ONEREF
        if (!atomic_add_negative(-1, &f_count->refcnt))
                return false;                           <- Not taken

        // f_count->refcnt == NOREF
        --> preemption
                                        // Elevates f_count->refcnt to ONEREF
                                        if (!atomic_add_negative(1, &f_count->refcnt))
                                                return true;    <- taken

                                        if (put(&f_count)) { <-- Succeeds
                                                remove_pointer(p);
                                                /*
                                                 * Cache is SLAB_TYPESAFE_BY_RCU
                                                 * so this is freed without a
                                                 * grace period.
                                                 */
                                                kmem_cache_free(p);
                                        }

                                                                        kmem_cache_alloc()
                                                                        init_file()
                                                                        {
                                                                                // Sets f_count->refcnt to ONEREF
                                                                                rcuref_long_init(&f->f_count, 1);
                                                                        }

        Object has been reused within the same grace period via kmem_cache_alloc()'s SLAB_TYPESAFE_BY_RCU.
        /*
         * With SLAB_TYPESAFE_BY_RCU this would be a safe UAF access and
         * it would work correctly because the atomic_cmpxchg()
         * will fail because the refcount has been reset to ONEREF by T3.
         */
        atomic_cmpxchg(&ref->refcnt, NOREF, DEAD);      <- UAF

However, there are other cases to consider:

(1) Benign race due to multiple atomic_long_read()

        CPU1                                            CPU2
        file_ref_put()
        // last reference
        // => count goes negative/FILE_REF_NOREF
        atomic_long_add_negative_release(-1, &ref->refcnt)
        -> __file_ref_put()
                                                        file_ref_get()
                                                        // goes back from negative/FILE_REF_NOREF to 0
                                                        // and file_ref_get() succeeds
                                                        atomic_long_add_negative(1, &ref->refcnt)
                                                        // This is immediately followed by file_ref_put()
                                                        // managing to set FILE_REF_DEAD
                                                        file_ref_put()
           // __file_ref_put() continues and sees
           // cnt > FILE_REF_RELEASED
           // and splats with
           // "imbalanced put on file reference count"
           cnt = atomic_long_read(&ref->refcnt);

    The race however is benign and the problem is the atomic_long_read(). Instead of performing a separate read, this now uses atomic_long_dec_return() and passes the value to __file_ref_put(). Thanks to Linus for pointing out that braino.

(2) SLAB_TYPESAFE_BY_RCU may cause recycled files to be marked dead

    When a file is recycled the following race exists:

        CPU1                                            CPU2
        // @file is already dead and thus
        // cnt >= FILE_REF_RELEASED.
        file_ref_get(file)
          atomic_long_add_negative(1, &ref->refcnt)
          // We thus call into __file_ref_get()
          -> __file_ref_get()
             // which sees cnt >= FILE_REF_RELEASED
             cnt = atomic_long_read(&ref->refcnt);
                                                        // In the meantime @file gets freed
                                                        kmem_cache_free()
                                                        // and is immediately recycled
                                                        file = kmem_cache_zalloc()
                                                        // and the reference count is reinitialized
                                                        // and the file is alive again in someone
                                                        // else's file descriptor table
                                                        file_ref_init(&ref->refcnt, 1);
             // the __file_ref_get() slowpath now continues
             // and as it saw earlier that cnt >= FILE_REF_RELEASED
             // it wants to ensure that we're staying in the middle
             // of the deadzone and unconditionally sets
             // FILE_REF_DEAD.
             // This marks @file dead for CPU2...
             atomic_long_set(&ref->refcnt, FILE_REF_DEAD);
                                                        // Caller issues a close() system call to close @file
                                                        close(fd)
                                                          file = file_close_fd_locked()
                                                          filp_flush()
                                                          // The caller sees that cnt >= FILE_REF_RELEASED
                                                          // and warns the first time...
                                                          CHECK_DATA_CORRUPTION(file_count(file) == 0)
                                                          // and then splats a second time because
                                                          // __file_ref_put() sees cnt >= FILE_REF_RELEASED
                                                          file_ref_put(&ref->refcnt);
                                                          -> __file_ref_put()

    My initial inclination was to replace the unconditional atomic_long_set() with an atomic_long_try_cmpxchg(), but Linus pointed out that:

    > I think we should just make file_ref_get() do a simple
    >
    >        return !atomic_long_add_negative(1, &ref->refcnt));
    >
    > and nothing else. Yes, multiple CPU's can race, and you can increment
    > more than once, but the gap - even on 32-bit - between DEAD and
    > becoming close to REF_RELEASED is so big that we simply don't care.
    > That's the point of having a gap.

I've been testing this with will-it-scale using fstat() on a machine that Jens gave me access to (thank you very much!):

    processor  : 511
    vendor_id  : AuthenticAMD
    cpu family : 25
    model      : 160
    model name : AMD EPYC 9754 128-Core Processor

and I consistently get a 3-5% improvement on 256+ threads.
Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202410151043.5d224a27-oliver.sang@intel.com Closes: https://lore.kernel.org/all/202410151611.f4cd71f2-oliver.sang@intel.com Link: https://lore.kernel.org/r/20241007-brauner-file-rcuref-v2-2-387e24dc9163@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
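Putting the description above together, the fast paths of the new primitives look roughly like this. This is a condensed sketch, not the exact in-tree code: the dead-zone constants and the __file_ref_put() slowpath are only hinted at, and the real put side needs release ordering, which is omitted here.

      typedef struct {
              /* 0 means "one reference held" (FILE_REF_ONEREF). */
              atomic_long_t refcnt;
      } file_ref_t;

      /* Slowpath: handles the NOREF -> DEAD transition and the dead/
       * saturation zones; only declared here. */
      bool __file_ref_put(file_ref_t *ref, long cnt);

      static inline bool file_ref_get(file_ref_t *ref)
      {
              /*
               * Unconditional increment, as suggested by Linus: a negative
               * result means the count was already in the NOREF/dead zone,
               * so the get fails.  The huge gap below FILE_REF_DEAD makes
               * an occasional racy extra increment harmless.
               */
              return !atomic_long_add_negative(1, &ref->refcnt);
      }

      static inline bool file_ref_put(file_ref_t *ref)
      {
              /*
               * Use the decremented value directly instead of re-reading
               * it, avoiding the benign "imbalanced put" splat described
               * in (1).  Release ordering is required here in real code.
               */
              long cnt = atomic_long_dec_return(&ref->refcnt);

              if (cnt >= 0)
                      return false;           /* other references remain */
              return __file_ref_put(ref, cnt); /* last reference or dead zone */
      }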
2024-10-09expand_files(): simplify calling conventionsAl Viro
All callers treat 0 and 1 returned by expand_files() in the same way now since the call in alloc_fd() had been made conditional. Just make it return 0 on success and be done with it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-09make __set_open_fd() set cloexec state as wellAl Viro
->close_on_exec[] state is maintained only for opened descriptors; as a result, anything that marks a descriptor opened has to set its cloexec state explicitly. Consequently, all calls of __set_open_fd() are followed by __set_close_on_exec(); we might as well fold the latter into __set_open_fd() so that the cloexec state is defined as soon as the descriptor is marked opened. [braino fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
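A minimal sketch of what the folded helper looks like after this change (assuming the existing fdtable bitmaps; not the verbatim patch):

      static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt, bool cloexec)
      {
              __set_bit(fd, fdt->open_fds);
              /* cloexec state is defined the moment the fd is marked opened */
              if (cloexec)
                      __set_bit(fd, fdt->close_on_exec);
              else
                      __clear_bit(fd, fdt->close_on_exec);
              fd /= BITS_PER_LONG;
              if (!~fdt->open_fds[fd])
                      __set_bit(fd, fdt->full_fds_bits);
      }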
2024-10-08fs: protect backing files with rcuChristian Brauner
Currently backing files are not under any form of rcu protection. Switching to file_ref requires rcu protection and so does the speculative vma lookup. Switch backing files to the same rcu slab type as regular files. There should be no additional magic required as the lifetime of a backing file is always tied to a regular file. Link: https://lore.kernel.org/r/20241007-brauner-file-rcuref-v2-1-387e24dc9163@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
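Conceptually the change amounts to giving backing files an rcu-typesafe slab of their own, something along these lines (cache name, flags and init function are illustrative, not the verbatim patch):

      static struct kmem_cache *bfilp_cachep __ro_after_init;  /* illustrative name */

      void __init backing_file_cache_init(void)
      {
              bfilp_cachep = kmem_cache_create("bfile_cache",
                                               sizeof(struct backing_file), 0,
                                               SLAB_HWCACHE_ALIGN | SLAB_PANIC |
                                               SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU,
                                               NULL);
      }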
2024-10-07file.c: merge __{set,clear}_close_on_exec()Al Viro
They always go in pairs; seeing that they are inlined, we might as well make them a single inline function taking a boolean argument ("do we want close_on_exec set for that descriptor?").

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
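A sketch of the merged helper in its bool-taking form (not necessarily the exact in-tree naming):

      static inline void __set_close_on_exec(unsigned int fd, struct fdtable *fdt,
                                             bool set)
      {
              if (set)
                      __set_bit(fd, fdt->close_on_exec);
              else
                      __clear_bit(fd, fdt->close_on_exec);
      }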
2024-10-07alloc_fdtable(): change calling conventions.Al Viro
First of all, tell it how many slots we want, not which slot is wanted. That makes one caller (dup_fd()) more straightforward and doesn't harm another (expand_fdtable()). Furthermore, make it return ERR_PTR() on failure rather than returning NULL, which simplifies the callers.

Simplify the size calculation while we are at it - note that we always have slots_wanted greater than BITS_PER_LONG. What the rules boil down to is (see the sketch below):

 * use the smallest power of two large enough to give us that many slots;
 * on 32bit skip 64 and 128 - the minimal capacity we want there is 256 slots (i.e. a 1Kb fd array);
 * on 64bit don't skip anything, the minimal capacity is 128 - and we'll never be asked for 64 or less. 128 slots means a 1Kb fd array, again;
 * on 128bit, if that ever happens, don't skip anything - we'll never be asked for 128 or less, so the fd array allocation will be at least 2Kb.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
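The sizing rule spelled out above boils down to something like this (a sketch only; the function name is hypothetical and sysctl clamping and error handling are omitted):

      static unsigned int fdtable_slots(unsigned int slots_wanted)
      {
              /* smallest power of two large enough for the request ... */
              unsigned int nr = roundup_pow_of_two(slots_wanted);

              /*
               * ... but never build a fd array smaller than 1Kb: on 32-bit
               * that means skipping 64 and 128 and starting at 256 slots.
               * On 64-bit callers never ask for 64 slots or fewer, so the
               * natural minimum of 128 slots already gives a 1Kb array.
               */
              if (BITS_PER_LONG == 32 && nr < 256)
                      nr = 256;
              return nr;
      }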
2024-10-07fs/file.c: add fast path in find_next_fd()Yu Ma
Skip the 2-level search via find_next_zero_bit() when there is a free slot in the word that contains next_fd, because:

(1) next_fd indicates the lower bound for the first free fd.
(2) There is a fast path inside find_next_zero_bit() when size <= 64, which speeds up the search.
(3) After the fdt is expanded (the bitmap size doubles on each expansion) it is never shrunk, so the search size keeps increasing even though only a few fds may be open.

This fast path was proposed by Mateusz Guzik <mjguzik@gmail.com> and agreed to by Jan Kara <jack@suse.cz>; it is more generic and scalable than previous versions. On top of patches 1 and 2, it improves pts/blogbench-1.1.0 read by 8% and write by 4% on an Intel ICX 160-core configuration with v6.10-rc7.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yu Ma <yu.ma@intel.com>
Link: https://lore.kernel.org/r/20240717145018.3972922-4-yu.ma@intel.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
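For reference, the resulting shape of find_next_fd() is roughly the following (a sketch of the idea, not the verbatim patch):

      static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
      {
              unsigned int maxfd = fdt->max_fds;      /* a multiple of BITS_PER_LONG */
              unsigned int maxbit = maxfd / BITS_PER_LONG;
              unsigned int bitbit = start / BITS_PER_LONG;
              unsigned int bit;

              /*
               * Fast path: only look at the word containing @start (next_fd);
               * find_next_zero_bit() itself has a fast path for sizes <= 64.
               */
              bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
                                       start & (BITS_PER_LONG - 1));
              if (bit < BITS_PER_LONG)
                      return bit + bitbit * BITS_PER_LONG;

              /* Slow path: two-level search via full_fds_bits. */
              bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG;
              if (bitbit >= maxfd)
                      return maxfd;
              if (bitbit > start)
                      start = bitbit;
              return find_next_zero_bit(fdt->open_fds, maxfd, start);
      }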
2024-10-07fs/file.c: conditionally clear full_fdsYu Ma
64 bits in open_fds are mapped to a common bit in full_fds_bits. It is very likely that a bit in full_fds_bits has already been cleared by an earlier __clear_open_fds() operation. Check the bit in full_fds_bits before clearing it, to avoid an unnecessary write and the resulting cache bouncing. See commit fc90888d07b8 ("vfs: conditionally clear close-on-exec flag") for a similar optimization.

Taking the stock kernel with patch 1 as baseline, this improves pts/blogbench-1.1.0 read by 13% and write by 5% on an Intel ICX 160-core configuration with v6.10-rc7.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yu Ma <yu.ma@intel.com>
Link: https://lore.kernel.org/r/20240717145018.3972922-3-yu.ma@intel.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
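Conceptually the change is small (sketch):

      static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
      {
              __clear_bit(fd, fdt->open_fds);
              fd /= BITS_PER_LONG;
              /* Only dirty full_fds_bits if the bit is actually set, avoiding
               * needless cache line bouncing (cf. commit fc90888d07b8). */
              if (test_bit(fd, fdt->full_fds_bits))
                      __clear_bit(fd, fdt->full_fds_bits);
      }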
2024-10-07fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()Yu Ma
alloc_fd() has a sanity check inside to make sure the struct file mapping to the allocated fd is NULL. Remove this sanity check since it is already guaranteed by the existing zero initialization and by the entry being set to NULL when an fd is recycled. Meanwhile, add likely/unlikely annotations and avoid the expand_files() call where possible to reduce the work done under file_lock.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yu Ma <yu.ma@intel.com>
Link: https://lore.kernel.org/r/20240717145018.3972922-2-yu.ma@intel.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-07move close_range(2) into fs/file.c, fold __close_range() into itAl Viro
We never had callers for __close_range() except for close_range(2) itself. Nothing of that sort has appeared in four years and if any users do show up, we can always separate those suckers again. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-07close_files(): don't bother with xchg()Al Viro
At that point nobody else has references to the victim files_struct; as a matter of fact, the caller will free it immediately after close_files() returns, with no RCU delays or anything of that sort. That's why we are not protecting against fdtable reallocation on expansion, not cleaning the bitmaps, etc. There's no point zeroing the pointers in ->fd[] either, let alone making that an atomic operation.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
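For reference, a sketch of the resulting loop, with plain loads of ->fd[] and no xchg() or clearing (condensed; not the verbatim code):

      static struct fdtable *close_files(struct files_struct *files)
      {
              /* No RCU protection needed: we are the last user of @files. */
              struct fdtable *fdt = rcu_dereference_raw(files->fdt);
              unsigned int i, j = 0;

              for (;;) {
                      unsigned long set;

                      i = j * BITS_PER_LONG;
                      if (i >= fdt->max_fds)
                              break;
                      set = fdt->open_fds[j++];
                      while (set) {
                              if (set & 1) {
                                      struct file *file = fdt->fd[i];

                                      if (file) {
                                              filp_close(file, files);
                                              cond_resched();
                                      }
                              }
                              i++;
                              set >>= 1;
                      }
              }
              return fdt;
      }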
2024-10-07remove pointless includes of <linux/fdtable.h>Al Viro
some of those used to be needed, some had been cargo-culted for no reason... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-07get rid of ...lookup...fdget_rcu() familyAl Viro
Once upon a time, predecessors of those used to do file lookup without bumping a refcount, provided that caller held rcu_read_lock() across the lookup and whatever it wanted to read from the struct file found. When struct file allocation switched to SLAB_TYPESAFE_BY_RCU, that stopped being feasible and these primitives started to bump the file refcount for lookup result, requiring the caller to call fput() afterwards. But that turned them pointless - e.g. rcu_read_lock(); file = lookup_fdget_rcu(fd); rcu_read_unlock(); is equivalent to file = fget_raw(fd); and all callers of lookup_fdget_rcu() are of that form. Similarly, task_lookup_fdget_rcu() calls can be replaced with calling fget_task(). task_lookup_next_fdget_rcu() doesn't have direct counterparts, but its callers would be happier if we replaced it with an analogue that deals with RCU internally. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-06Linux 6.12-rc2v6.12-rc2Linus Torvalds
2024-10-06Merge tag 'kbuild-fixes-v6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Move non-boot built-in DTBs to the .rodata section - Fix Kconfig bugs - Fix maint scripts in the linux-image Debian package - Import some list macros to scripts/include/ * tag 'kbuild-fixes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: deb-pkg: Remove blank first line from maint scripts kbuild: fix a typo dt_binding_schema -> dt_binding_schemas scripts: import more list macros kconfig: qconf: fix buffer overflow in debug links kconfig: qconf: move conf_read() before drawing tree pain kconfig: clear expr::val_is_valid when allocated kconfig: fix infinite loop in sym_calc_choice() kbuild: move non-boot built-in DTBs to .rodata section
2024-10-06Merge tag 'platform-drivers-x86-v6.12-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: - Intel PMC fix for suspend/resume issues on some Sky and Kaby Lake laptops - Intel Diamond Rapids hw-id additions - Documentation and MAINTAINERS fixes - Some other small fixes * tag 'platform-drivers-x86-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors platform/x86: wmi: Update WMI driver API documentation platform/x86: dell-ddv: Fix typo in documentation platform/x86: dell-sysman: add support for alienware products platform/x86/intel: power-domains: Add Diamond Rapids support platform/x86: ISST: Add Diamond Rapids to support list platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby Lake platform/x86: dell-laptop: Do not fail when encountering unsupported batteries MAINTAINERS: Update Intel In Field Scan(IFS) entry platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bug
2024-10-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

  - Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail

  - Make sure that the host's vector length is capped by a value common to all CPUs

  - Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken

  - Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarily

  x86:

  - Fix compilation with KVM_INTEL=KVM_AMD=n

  - Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use

  Selftests:

  - Fix compilation on non-x86 architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/reboot: emergency callbacks are now registered by common KVM code
  KVM: x86: leave kvm.ko out of the build if no vendor module is requested
  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
  KVM: arm64: Fix kvm_has_feat*() handling of negative features
  KVM: selftests: Fix build on architectures other than x86_64
  KVM: arm64: Another reviewer reshuffle
  KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
  KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
2024-10-06Merge tag 'powerpc-6.12-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: - Allow r30 to be used in vDSO code generation of getrandom Thanks to Jason A. Donenfeld * tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: allow r30 in vDSO code generation of getrandom
2024-10-07kbuild: deb-pkg: Remove blank first line from maint scriptsAaron Thompson
The blank line causes execve() to fail: # strace ./postinst execve("./postinst", ...) = -1 ENOEXEC (Exec format error) strace: exec: Exec format error +++ exited with 1 +++ However running the scripts via shell does work (at least with bash) because the shell attempts to execute the file as a shell script when execve() fails. Fixes: b611daae5efc ("kbuild: deb-pkg: split image and debug objects staging out into functions") Signed-off-by: Aaron Thompson <dev@aaront.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-07kbuild: fix a typo dt_binding_schema -> dt_binding_schemasXu Yang
If we follow "make help" to "make dt_binding_schema", we will see below error: $ make dt_binding_schema make[1]: *** No rule to make target 'dt_binding_schema'. Stop. make: *** [Makefile:224: __sub-make] Error 2 It should be a typo. So this will fix it. Fixes: 604a57ba9781 ("dt-bindings: kbuild: Add separate target/dependency for processed-schema.json") Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Reviewed-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-07scripts: import more list macrosSami Tolvanen
Import list_is_first, list_is_last, list_replace, and list_replace_init. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-06platform/x86: x86-android-tablets: Fix use after free on ↵Hans de Goede
platform_device_register() errors x86_android_tablet_remove() frees the pdevs[] array, so it should not be used after calling x86_android_tablet_remove(). When platform_device_register() fails, store the pdevs[x] PTR_ERR() value into the local ret variable before calling x86_android_tablet_remove() to avoid using pdevs[] after it has been freed. Fixes: 5eba0141206e ("platform/x86: x86-android-tablets: Add support for instantiating platform-devs") Fixes: e2200d3f26da ("platform/x86: x86-android-tablets: Add gpio_keys support to x86_android_tablet_init()") Cc: stable@vger.kernel.org Reported-by: Aleksandr Burakov <a.burakov@rosalinux.ru> Closes: https://lore.kernel.org/platform-driver-x86/20240917120458.7300-1-a.burakov@rosalinux.ru/ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20241005130545.64136-1-hdegoede@redhat.com
2024-10-06platform/x86: wmi: Update WMI driver API documentationArmin Wolf
The WMI driver core now passes the WMI event data to legacy notify handlers, so WMI devices sharing notification IDs are now being handled properly. Fixes: e04e2b760ddb ("platform/x86: wmi: Pass event data directly to legacy notify handlers") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241005213825.701887-1-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: dell-ddv: Fix typo in documentationAnaswara T Rajan
Fix typo in word 'diagnostics' in documentation. Signed-off-by: Anaswara T Rajan <anaswaratrajan@gmail.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241005070056.16326-1-anaswaratrajan@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: dell-sysman: add support for alienware productsCrag Wang
Alienware supports firmware-attributes and has its own OEM string. Signed-off-by: Crag Wang <crag_wang@dell.com> Link: https://lore.kernel.org/r/20241004152826.93992-1-crag_wang@dell.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86/intel: power-domains: Add Diamond Rapids supportSrinivas Pandruvada
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to tpmi_cpu_ids to support domain id mappings.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: ISST: Add Diamond Rapids to support listSrinivas Pandruvada
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to SST support list by adding to isst_cpu_ids. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20241003215554.3013807-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby LakeHans de Goede
There have been multiple reports that the ACPI PM Timer disabling is causing Sky and Kaby Lake systems to hang on all suspend (s2idle, s3, hibernate) methods. Remove the acpi_pm_tmr_ctl_offset and acpi_pm_tmr_disable_bit settings from spt_reg_map to disable the ACPI PM Timer disabling on Sky and Kaby Lake to fix the hang on suspend. Fixes: e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Closes: https://lore.kernel.org/linux-pm/18784f62-91ff-4d88-9621-6c88eb0af2b5@molgen.mpg.de/ Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219346 Cc: Marek Maslanka <mmaslanka@google.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Todd Brandt <todd.e.brandt@intel.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360/0596KF Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20241003202614.17181-2-hdegoede@redhat.com
2024-10-06platform/x86: dell-laptop: Do not fail when encountering unsupported batteriesArmin Wolf
If the battery hook encounters an unsupported battery, it will return an error. This in turn will cause the battery driver to automatically unregister the battery hook. On machines with multiple batteries however, this will prevent the battery hook from handling the primary battery, since it will always get unregistered upon encountering one of the unsupported batteries. Fix this by simply ignoring unsupported batteries.

Reviewed-by: Pali Rohár <pali@kernel.org>
Fixes: ab58016c68cc ("platform/x86:dell-laptop: Add knobs to change battery charge settings")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241001212835.341788-4-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06MAINTAINERS: Update Intel In Field Scan(IFS) entryJithu Joseph
Ashok is no longer with Intel and his e-mail address will start bouncing soon. Update his email address to the new one he provided to ensure correct contact details in the MAINTAINERS file. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Link: https://lore.kernel.org/r/20241001170808.203970-1-jithu.joseph@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06Merge tag 'kvmarm-fixes-6.12-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail

- Make sure that the host's vector length is capped by a value common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken

- Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarily
2024-10-06x86/reboot: emergency callbacks are now registered by common KVM codePaolo Bonzini
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules. In practice this has no functional change, because CONFIG_KVM_X86_COMMON is set if and only if at least one vendor-specific module is being built. However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that are used in kvm.ko. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-06KVM: x86: leave kvm.ko out of the build if no vendor module is requestedPaolo Bonzini
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko. It provides no functionality on its own and it is unnecessary unless one of the vendor-specific modules is compiled. In particular, /dev/kvm is not created until one of kvm-intel.ko or kvm-amd.ko is loaded.

Use CONFIG_KVM to decide if it is built-in or a module, but use the vendor-specific modules for the actual decision on whether to build it.

This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD are both disabled. The cpu_emergency_register_virt_callback() function is called from kvm.ko, but it is only defined if at least one of CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.

Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-05Merge tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet:
 "A lot of little fixes; the bigger ones include:

  - bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs changes, now fixed along with another lost wakeup

  - fragmentation LRU fixes; fsck now repairs successfully (this is the data structure copygc uses), along with some nice simplification

  - Rework logged op error handling, so that if logged op replay errors (due to another filesystem error) we delete the logged op instead of going into an infinite loop

  - Various small filesystem connectivity repair fixes"

* tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs:
  bcachefs: Rework logged op error handling
  bcachefs: Add warn param to subvol_get_snapshot, peek_inode
  bcachefs: Kill snapshot arg to fsck_write_inode()
  bcachefs: Check for unlinked, non-empty dirs in check_inode()
  bcachefs: Check for unlinked inodes with dirents
  bcachefs: Check for directories with no backpointers
  bcachefs: Kill alloc_v4.fragmentation_lru
  bcachefs: minor lru fsck fixes
  bcachefs: Mark more errors AUTOFIX
  bcachefs: Make sure we print error that causes fsck to bail out
  bcachefs: bkey errors are only AUTOFIX during read
  bcachefs: Create lost+found in correct snapshot
  bcachefs: Fix reattach_inode()
  bcachefs: Add missing wakeup to bch2_inode_hash_remove()
  bcachefs: Fix trans_commit disk accounting revert
  bcachefs: Fix bch2_inode_is_open() check
  bcachefs: Fix return type of dirent_points_to_inode_nowarn()
  bcachefs: Fix bad shift in bch2_read_flag_list()
2024-10-05Merge tag 'for-linus-6.12a-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "Fix Xen config issue introduced in the merge window" * tag 'for-linus-6.12a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Fix config option reference in XEN_PRIVCMD definition
2024-10-05Merge tag 'ext4_for_linus-5.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix some ext4 bugs and regressions relating to online resize and fast commits"

* tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix off by one issue in alloc_flex_gd()
  ext4: mark fc as ineligible using an handle in ext4_xattr_set()
  ext4: use handle to mark fc as ineligible in __track_dentry_update()
2024-10-05Merge tag 'cxl-fixes-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fix from Ira Weiny: - Fix calculation for SBDF in error injection * tag 'cxl-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: EINJ, CXL: Fix CXL device SBDF calculation
2024-10-05Merge tag 'i2c-for-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: - Fix potential deadlock during runtime suspend and resume (stm32f7) * tag 'i2c-for-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: stm32f7: Do not prepare/unprepare clock during runtime suspend/resume
2024-10-05Merge tag 'spi-fix-v6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small set of driver specific fixes that came in since the merge window, about half of which is fixes for correctness in the use of the runtime PM APIs done as part of a broader cleanup" * tag 'spi-fix-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: s3c64xx: fix timeout counters in flush_fifo spi: atmel-quadspi: Fix wrong register value written to MR spi: spi-cadence: Fix missing spi_controller_is_target() check spi: spi-cadence: Fix pm_runtime_set_suspended() with runtime pm enabled spi: spi-imx: Fix pm_runtime_set_suspended() with runtime pm enabled
2024-10-05Merge tag 'hardening-v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - gcc plugins: Avoid Kconfig warnings with randstruct (Nathan Chancellor) - MAINTAINERS: Add security/Kconfig.hardening to hardening section (Nathan Chancellor) - MAINTAINERS: Add unsafe_memcpy() to the FORTIFY review list * tag 'hardening-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: MAINTAINERS: Add security/Kconfig.hardening to hardening section hardening: Adjust dependencies in selection of MODVERSIONS MAINTAINERS: Add unsafe_memcpy() to the FORTIFY review list
2024-10-05Merge tag 'lsm-pr-20241004' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm revert from Paul Moore: "Here is the CONFIG_SECURITY_TOMOYO_LKM revert that we've been discussing this week. With near unanimous agreement that the original TOMOYO patches were not the right way to solve the distro problem Tetsuo is trying the solve, reverting is our best option at this time" * tag 'lsm-pr-20241004' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: tomoyo: revert CONFIG_SECURITY_TOMOYO_LKM support
2024-10-05platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bugZach Wade
Attaching SST PCI device to VM causes "BUG: KASAN: slab-out-of-bounds". kasan report: [ 19.411889] ================================================================== [ 19.413702] BUG: KASAN: slab-out-of-bounds in _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.415634] Read of size 8 at addr ffff888829e65200 by task cpuhp/16/113 [ 19.417368] [ 19.418627] CPU: 16 PID: 113 Comm: cpuhp/16 Tainted: G E 6.9.0 #10 [ 19.420435] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 07/28/2022 [ 19.422687] Call Trace: [ 19.424091] <TASK> [ 19.425448] dump_stack_lvl+0x5d/0x80 [ 19.426963] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.428694] print_report+0x19d/0x52e [ 19.430206] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 19.431837] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.433539] kasan_report+0xf0/0x170 [ 19.435019] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.436709] _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.438379] ? __pfx_sched_clock_cpu+0x10/0x10 [ 19.439910] isst_if_cpu_online+0x406/0x58f [isst_if_common] [ 19.441573] ? __pfx_isst_if_cpu_online+0x10/0x10 [isst_if_common] [ 19.443263] ? ttwu_queue_wakelist+0x2c1/0x360 [ 19.444797] cpuhp_invoke_callback+0x221/0xec0 [ 19.446337] cpuhp_thread_fun+0x21b/0x610 [ 19.447814] ? __pfx_cpuhp_thread_fun+0x10/0x10 [ 19.449354] smpboot_thread_fn+0x2e7/0x6e0 [ 19.450859] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 19.452405] kthread+0x29c/0x350 [ 19.453817] ? __pfx_kthread+0x10/0x10 [ 19.455253] ret_from_fork+0x31/0x70 [ 19.456685] ? __pfx_kthread+0x10/0x10 [ 19.458114] ret_from_fork_asm+0x1a/0x30 [ 19.459573] </TASK> [ 19.460853] [ 19.462055] Allocated by task 1198: [ 19.463410] kasan_save_stack+0x30/0x50 [ 19.464788] kasan_save_track+0x14/0x30 [ 19.466139] __kasan_kmalloc+0xaa/0xb0 [ 19.467465] __kmalloc+0x1cd/0x470 [ 19.468748] isst_if_cdev_register+0x1da/0x350 [isst_if_common] [ 19.470233] isst_if_mbox_init+0x108/0xff0 [isst_if_mbox_msr] [ 19.471670] do_one_initcall+0xa4/0x380 [ 19.472903] do_init_module+0x238/0x760 [ 19.474105] load_module+0x5239/0x6f00 [ 19.475285] init_module_from_file+0xd1/0x130 [ 19.476506] idempotent_init_module+0x23b/0x650 [ 19.477725] __x64_sys_finit_module+0xbe/0x130 [ 19.476506] idempotent_init_module+0x23b/0x650 [ 19.477725] __x64_sys_finit_module+0xbe/0x130 [ 19.478920] do_syscall_64+0x82/0x160 [ 19.480036] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.481292] [ 19.482205] The buggy address belongs to the object at ffff888829e65000 which belongs to the cache kmalloc-512 of size 512 [ 19.484818] The buggy address is located 0 bytes to the right of allocated 512-byte region [ffff888829e65000, ffff888829e65200) [ 19.487447] [ 19.488328] The buggy address belongs to the physical page: [ 19.489569] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888829e60c00 pfn:0x829e60 [ 19.491140] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 19.492466] anon flags: 0x57ffffc0000840(slab|head|node=1|zone=2|lastcpupid=0x1fffff) [ 19.493914] page_type: 0xffffffff() [ 19.494988] raw: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001 [ 19.496451] raw: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000 [ 19.497906] head: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001 [ 19.499379] head: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000 [ 19.500844] head: 0057ffffc0000003 ffffea0020a79801 ffffea0020a79848 00000000ffffffff [ 19.502316] 
head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000 [ 19.503784] page dumped because: kasan: bad access detected [ 19.505058] [ 19.505970] Memory state around the buggy address: [ 19.507172] ffff888829e65100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 19.508599] ffff888829e65180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 19.510013] >ffff888829e65200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.510014] ^ [ 19.510016] ffff888829e65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.510018] ffff888829e65300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.515367] ================================================================== The reason for this error is physical_package_ids assigned by VMware VMM are not continuous and have gaps. This will cause value returned by topology_physical_package_id() to be more than topology_max_packages(). Here the allocation uses topology_max_packages(). The call to topology_max_packages() returns maximum logical package ID not physical ID. Hence use topology_logical_package_id() instead of topology_physical_package_id(). Fixes: 9a1aac8a96dc ("platform/x86: ISST: PUNIT device mapping with Sub-NUMA clustering") Cc: stable@vger.kernel.org Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zach Wade <zachwade.k@gmail.com> Link: https://lore.kernel.org/r/20240923144508.1764-1-zachwade.k@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
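The fix described above boils down to indexing per-package state by the logical package id, which is guaranteed to stay below topology_max_packages(), roughly like this (a sketch; the isst_pkg_info lookup table and helper name are hypothetical):

      struct isst_pkg_info *pkg_info;     /* array of topology_max_packages() entries */

      static struct isst_pkg_info *isst_pkg_info_for_cpu(int cpu)
      {
              int pkg_id = topology_logical_package_id(cpu);

              /* Logical ids are dense, unlike physical ids on some VMMs. */
              if (pkg_id < 0 || pkg_id >= topology_max_packages())
                      return NULL;
              return &pkg_info[pkg_id];
      }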
2024-10-04Merge tag 'linux_kselftest-fixes-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Fixes to build warnings, install scripts, run-time error path, and git status cleanups to tests: - devices/probe: fix for Python3 regex string syntax warnings - clone3: removing unused macro from clone3_cap_checkpoint_restore() - vDSO: fix to align getrandom states to cache line - core and exec: add missing executables to .gitignore files - rtc: change to skip test if /dev/rtc0 can't be accessed - timers/posix: fix warn_unused_result result in __fatal_error() - breakpoints: fix to detect suspend successful condition correctly - hid: fix to install required dependencies to run the test" * tag 'linux_kselftest-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: breakpoints: use remaining time to check if suspend succeed kselftest/devices/probe: Fix SyntaxWarning in regex strings for Python3 selftest: hid: add missing run-hid-tools-tests.sh selftests: vDSO: align getrandom states to cache line selftests: exec: update gitignore for load_address selftests: core: add unshare_test to gitignore clone3: clone3_cap_checkpoint_restore: remove unused MAX_PID_NS_LEVEL macro selftests:timers: posix_timers: Fix warn_unused_result in __fatal_error() selftest: rtc: Check if could access /dev/rtc0 before testing
2024-10-04bcachefs: Rework logged op error handlingKent Overstreet
Initially it was thought that we just wanted to ignore errors from logged op replay, but it turns out we do need to catch -EROFS, or we'll go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04bcachefs: Add warn param to subvol_get_snapshot, peek_inodeKent Overstreet
These shouldn't always be fatal errors - logged op resume, in particular, and we want it as a parameter there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04bcachefs: Kill snapshot arg to fsck_write_inode()Kent Overstreet
It was initially believed that it would be better to be explicit about the snapshot we're updating when writing inodes in fsck; however, it turns out that passing around the snapshot separately is more error prone and we're usually updating the inode in the same snapshot we read it from. This is different from normal filesystem paths, where we do the update in the snapshot of the subvolume we're in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04bcachefs: Check for unlinked, non-empty dirs in check_inode()Kent Overstreet
We want to check for this early so it can be reattached if necessary in check_unreachable_inodes(); better than letting it be deleted and having the children reattached, losing their filenames. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>