linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-01-01	afs: Keep a record of the current fileserver endpoint state	David Howells
	Keep a record of the current fileserver endpoint state, including the probe state, and replace it when a new probe is started rather than just squelching the old state and overwriting it. Clearance of the old state can cause a race if there's another thread also currently trying to communicate with that server. It appears that this race might be the culprit for some occasions where kafs complains about invalid data in the RPC reply because the rotation algorithm fell all the way through without actually issuing an RPC call and the error return got filled in from the probe state (which has a zero error recorded). Whatever happens to be in the caller's reply buffer is then taken as the response. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	afs: Dispatch vlserver probes in priority order	David Howells
	When probing all the addresses for a volume location server, dispatch them in order of descending priority to try and get back highest priority one first. Also add a tracepoint to show the transmission and completion of the probes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	afs: Dispatch fileserver probes in priority order	David Howells
	When probing all the addresses for a fileserver, dispatch them in order of descending priority to try and get back highest priority one first. Also add a tracepoint to show the transmission and completion of the probes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	afs: Mark address lists with configured priorities	David Howells
	Add a field to each address in an address list (afs_addr_list struct) that records the current priority for that address according to the address preference table. We don't want to do this every time we use an address list, so the version number of the address preference table is recorded in the address list too and we only re-mark the list when we see the version change. These numbers are then displayed through /proc/net/afs/servers. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	afs: Provide a way to configure address priorities	David Howells
	AFS servers may have multiple addresses, but the client can't easily judge between them as to which one is best. For instance, an address that has a larger RTT might actually have a better bandwidth because it goes through a switch rather than being directly connected - but we can't work this out dynamically unless we push through sufficient data that we can measure it. To allow the administrator to configure this, add a list of preference weightings for server addresses by IPv4/IPv6 address or subnet and allow this to be viewed through a procfile and altered by writing text commands to that same file. Preference rules can be added/updated by: echo "add <proto> <addr>[/<subnet>] <prior>" >/proc/fs/afs/addr_prefs echo "add udp 1.2.3.4 1000" >/proc/fs/afs/addr_prefs echo "add udp 192.168.0.0/16 3000" >/proc/fs/afs/addr_prefs echo "add udp 1001:2002:0:6::/64 4000" >/proc/fs/afs/addr_prefs and removed by: echo "del <proto> <addr>[/<subnet>]" >/proc/fs/afs/addr_prefs echo "del udp 1.2.3.4" >/proc/fs/afs/addr_prefs where the priority is a number between 0 and 65535. The list is split between IPv4 and IPv6 addresses and each sublist is kept in numerical order, with rules that would otherwise match but have different subnet masking being ordered with the most specific submatch first. A subsequent patch will apply these rules. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	afs: Remove the unimplemented afs_cmp_addr_list()	David Howells
	Remove afs_cmp_addr_list() as it was never implemented. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	afs: Add some more info to /proc/net/afs/servers	David Howells
	In /proc/net/afs/servers, show the cell name and the last error for each address in the server's list. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01	Merge tag 'nf-next-23-12-22' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== netfilter pull request 23-12-22 The following patchset contains Netfilter updates for net-next: 1) Add locking for NFT_MSG_GETSETELEM_RESET requests, to address a race scenario with two concurrent processes running a dump-and-reset which exposes negative counters to userspace, from Phil Sutter. 2) Use GFP_KERNEL in pipapo GC, from Florian Westphal. 3) Reorder nf_flowtable struct members, place the read-mostly parts accessed by the datapath first. From Florian Westphal. 4) Set on dead flag for NFT_MSG_NEWSET in abort path, from Florian Westphal. 5) Support filtering zone in ctnetlink, from Felix Huettner. 6) Bail out if user tries to redefine an existing chain with different type in nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	MAINTAINERS: step down as TJA11XX C45 maintainer	Radu Pirea (NXP OSS)
	I am stepping down as TJA11XX C45 maintainer. Andrei Botila will take the responsibility to maintain and improve the support for TJA11XX C45 PHYs. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	platform/x86: intel-uncore-freq: Add additional client processors	Srinivas Pandruvada
	Add support for client processors starting from Kaby Lake. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20231222203957.1348043-1-srinivas.pandruvada@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-01-01	platform/x86: Remove "X86 PLATFORM DRIVERS - ARCH" from MAINTAINERS	Andy Shevchenko
	It seems traffic there is quite low and changes are often not related to PDx86 anyhow. Besides that I have a lot of other stuff to do, I'm rearly pay attention on these emails. Doesn't seem Daren to be active either. With this in mind, remove (stale) section. Note, it might be make sense to actually move that folder under PDx86 umbrella (in MAINTAINERS) if people find it suitable. That will reduce burden on arch/x86 maintenance. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231222144453.2888706-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-01-01	Merge tag 'for-netdev' of ↵	David S. Miller
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next-for-netdev The following pull-request contains BPF updates for your net-next tree. We've added 22 non-merge commits during the last 3 day(s) which contain a total of 23 files changed, 652 insertions(+), 431 deletions(-). The main changes are: 1) Add verifier support for annotating user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience, from Andrii Nakryiko. These tags are: - Ability to annotate a special PTR_TO_CTX argument - Ability to annotate a generic PTR_TO_MEM as non-NULL 2) Support BPF verifier tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like, from Menglong Dong. 3) Fix a warning in bpf_mem_cache's check_obj_size() as reported by LKP, from Hou Tao. 4) Re-support uid/gid options when mounting bpffs which had to be reverted with the prior token series revert to avoid conflicts, from Daniel Borkmann. 5) Fix a libbpf NULL pointer dereference in bpf_object__collect_prog_relos() found from fuzzing the library with malformed ELF files, from Mingyi Zhang. 6) Skip DWARF sections in libbpf's linker sanity check given compiler options to generate compressed debug sections can trigger a rejection due to misalignment, from Alyssa Ross. 7) Fix an unnecessary use of the comma operator in BPF verifier, from Simon Horman. 8) Fix format specifier for unsigned long values in cpustat sample, from Colin Ian King. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	r8169: Fix PCI error on system resume	Kai-Heng Feng
	Some r8168 NICs stop working upon system resume: [ 688.051096] r8169 0000:02:00.1 enp2s0f1: rtl_ep_ocp_read_cond == 0 (loop: 10, delay: 10000). [ 688.175131] r8169 0000:02:00.1 enp2s0f1: Link is Down ... [ 691.534611] r8169 0000:02:00.1 enp2s0f1: PCI error (cmd = 0x0407, status_errs = 0x0000) Not sure if it's related, but those NICs have a BMC device at function 0: 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Realtek RealManage BMC [10ec:816e] (rev 1a) Trial and error shows that increase the loop wait on rtl_ep_ocp_read_cond to 30 can eliminate the issue, so let rtl8168ep_driver_start() to wait a bit longer. Fixes: e6d6ca6e1204 ("r8169: Add support for another RTL8168FP") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	net/tcp_sigpool: Use kref_get_unless_zero()	Dmitry Safonov
	The freeing and re-allocation of algorithm are protected by cpool_mutex, so it doesn't fix an actual use-after-free, but avoids a deserved refcount_warn_saturate() warning. A trivial fix for the racy behavior. Fixes: 8c73b26315aa ("net/tcp: Prepare tcp_md5sig_pool for TCP-AO") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	net: sched: em_text: fix possible memory leak in em_text_destroy()	Hangyu Hua
	m->data needs to be freed when em_text_destroy is called. Fixes: d675c989ed2d ("[PKT_SCHED]: Packet classification based on textsearch (ematch)") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	net: mdio: get/put device node during (un)registration	Luiz Angelo Daros de Luca
	The __of_mdiobus_register() function was storing the device node in dev.of_node without increasing its reference count. It implicitly relied on the caller to maintain the allocated node until the mdiobus was unregistered. Now, __of_mdiobus_register() will acquire the node before assigning it, and of_mdiobus_unregister_callback() will be called at the end of mdio_unregister(). Drivers can now release the node immediately after MDIO registration. Some of them are already doing that even before this patch. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-31	Linux 6.7-rc8v6.7-rc8	Linus Torvalds

2023-12-31	get_maintainer: remove stray punctuation when cleaning file emails	Alvin Šipraga
	When parsing emails from .yaml files in particular, stray punctuation such as a leading '-' can end up in the name. For example, consider a common YAML section such as: maintainers: - devicetree@vger.kernel.org This would previously be processed by get_maintainer.pl as: - <devicetree@vger.kernel.org> Make the logic in clean_file_emails more robust by deleting any sub-names which consist of common single punctuation marks before proceeding to the best-effort name extraction logic. The output is then correct: devicetree@vger.kernel.org Some additional comments are added to the function to make things clearer to future readers. Link: https://lore.kernel.org/all/0173e76a36b3a9b4e7f324dd3a36fd4a9757f302.camel@perches.com/ Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-31	get_maintainer: correctly parse UTF-8 encoded names in files	Alvin Šipraga
	While the script correctly extracts UTF-8 encoded names from the MAINTAINERS file, the regular expressions damage my name when parsing from .yaml files. Fix this by replacing the Latin-1-compatible regular expressions with the unicode property matcher \p{L}, which matches on any letter according to the Unicode General Category of letters. The proposed solution only works if the script uses proper string encoding from the outset, so instruct Perl to unconditionally open all files with UTF-8 encoding. This should be safe, as the entire source tree is either UTF-8 or ASCII encoded anyway. See [1] for a detailed analysis. Furthermore, to prevent the \w expression from matching non-ASCII when checking for whether a name should be escaped with quotes, add the /a flag to the regular expression. The escaping logic was duplicated in two places, so it has been factored out into its own function. The original issue was also identified on the tools mailing list [2]. This should solve the observed side effects there as well. Link: https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/ [1] Link: https://lore.kernel.org/tools/20230726-gush-slouching-a5cd41@meerkat/ [2] Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30	Merge tag 'trace-v6.7-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix readers that are blocked on the ring buffer when buffer_percent is 100%. They are supposed to wake up when the buffer is full, but because the sub-buffer that the writer is on is never considered "dirty" in the calculation, dirty pages will never equal nr_pages. Add +1 to the dirty count in order to count for the sub-buffer that the writer is on. - When a reader is blocked on the "snapshot_raw" file, it is to be woken up when a snapshot is done and be able to read the snapshot buffer. But because the snapshot swaps the buffers (the main one with the snapshot one), and the snapshot reader is waiting on the old snapshot buffer, it was not woken up (because it is now on the main buffer after the swap). Worse yet, when it reads the buffer after a snapshot, it's not reading the snapshot buffer, it's reading the live active main buffer. Fix this by forcing a wakeup of all readers on the snapshot buffer when a new snapshot happens, and then update the buffer that the reader is reading to be back on the snapshot buffer. - Fix the modification of the direct_function hash. There was a race when new functions were added to the direct_function hash as when it moved function entries from the old hash to the new one, a direct function trace could be hit and not see its entry. This is fixed by allocating the new hash, copy all the old entries onto it as well as the new entries, and then use rcu_assign_pointer() to update the new direct_function hash with it. This also fixes a memory leak in that code. - Fix eventfs ownership * tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Fix modification of direct_function hash while in use tracing: Fix blocked reader of snapshot buffer ring-buffer: Fix wake ups when buffer_percent is set to 100 eventfs: Fix file and directory uid and gid ownership
2023-12-30	locking/osq_lock: Clarify osq_wait_next()	David Laight
	Directly return NULL or 'next' instead of breaking out of the loop. Signed-off-by: David Laight <david.laight@aculab.com> [ Split original patch into two independent parts - Linus ] Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30	locking/osq_lock: Clarify osq_wait_next() calling convention	David Laight
	osq_wait_next() is passed 'prev' from osq_lock() and NULL from osq_unlock() but only needs the 'cpu' value to write to lock->tail. Just pass prev->cpu or OSQ_UNLOCKED_VAL instead. Should have no effect on the generated code since gcc manages to assume that 'prev != NULL' due to an earlier dereference. Signed-off-by: David Laight <david.laight@aculab.com> [ Changed 'old' to 'old_cpu' by request from Waiman Long - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30	locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c	David Laight
	struct optimistic_spin_node is private to the implementation. Move it into the C file to ensure nothing is accessing it. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30	RDMA/erdma: Add hardware statistics support	Cheng Xu
	First, we add a new command to query hardware statistics, and then implement two functions: ib_device_ops.alloc_hw_port_stats and ib_device_ops.get_hw_stats to allow rdma tool can get the statistics of erdma device. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20231227084800.99091-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-30	RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests	Cheng Xu
	Hardware response, such as the result of query statistics, may be too long to be directly accommodated within the CQE structure. To address this, we introduce a DMA pool to hold the hardware's responses of CMDQ requests. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20231227084800.99091-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-30	ftrace: Fix modification of direct_function hash while in use	Steven Rostedt (Google)
	Masami Hiramatsu reported a memory leak in register_ftrace_direct() where if the number of new entries are added is large enough to cause two allocations in the loop: for (i = 0; i < size; i++) { hlist_for_each_entry(entry, &hash->buckets[i], hlist) { new = ftrace_add_rec_direct(entry->ip, addr, &free_hash); if (!new) goto out_remove; entry->direct = addr; } } Where ftrace_add_rec_direct() has: if (ftrace_hash_empty(direct_functions) \|\| direct_functions->count > 2 * (1 << direct_functions->size_bits)) { struct ftrace_hash new_hash; int size = ftrace_hash_empty(direct_functions) ? 0 : direct_functions->count + 1; if (size < 32) size = 32; new_hash = dup_hash(direct_functions, size); if (!new_hash) return NULL; free_hash = direct_functions; direct_functions = new_hash; } The "*free_hash = direct_functions;" can happen twice, losing the previous allocation of direct_functions. But this also exposed a more serious bug. The modification of direct_functions above is not safe. As direct_functions can be referenced at any time to find what direct caller it should call, the time between: new_hash = dup_hash(direct_functions, size); and direct_functions = new_hash; can have a race with another CPU (or even this one if it gets interrupted), and the entries being moved to the new hash are not referenced. That's because the "dup_hash()" is really misnamed and is really a "move_hash()". It moves the entries from the old hash to the new one. Now even if that was changed, this code is not proper as direct_functions should not be updated until the end. That is the best way to handle function reference changes, and is the way other parts of ftrace handles this. The following is done: 1. Change add_hash_entry() to return the entry it created and inserted into the hash, and not just return success or not. 2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove the former. 3. Allocate a "new_hash" at the start that is made for holding both the new hash entries as well as the existing entries in direct_functions. 4. Copy (not move) the direct_function entries over to the new_hash. 5. Copy the entries of the added hash to the new_hash. 6. If everything succeeds, then use rcu_pointer_assign() to update the direct_functions with the new_hash. This simplifies the code and fixes both the memory leak as well as the race condition mentioned above. Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/ Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-30	Merge branch 'topic/cs35l41' into for-next	Takashi Iwai
	Pull CS35L41 codec updates for Lenovo laptops. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-30	ALSA: hda: Add driver properties for cs35l41 for Lenovo Legion Slim 7 Gen 8 ↵	Dorian Cruveiller
	serie Add driver properties on 4 models of this laptop serie since they don't have _DSD in the ACPI table Signed-off-by: Dorian Cruveiller <doriancruveiller@gmail.com> Cc: <stable@vger.kernel.org> # v6.7 Link: https://lore.kernel.org/r/20231230114312.22118-1-doriancruveiller@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-30	ALSA: hda/realtek: enable SND_PCI_QUIRK for Lenovo Legion Slim 7 Gen 8 ↵	Dorian Cruveiller
	(2023) serie Link up the realtek audio chip to the cirrus cs35l41 sound amplifier chip on 4 models of the Lenovo legion slim 7 gen 8 (2023). These models are 16IRH8 (2 differents subsystem id) and 16APH8 (2 differents subsystem ids). Subsystem ids list: - 17AA38B4 - 17AA38B5 - 17AA38B6 - 17AA38B7 Signed-off-by: Dorian Cruveiller <doriancruveiller@gmail.com> Cc: <stable@vger.kernel.org> # v6.7 Link: https://lore.kernel.org/r/20231230114001.19855-1-doriancruveiller@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-30	x86/alternative: Correct feature bit debug output	Borislav Petkov (AMD)
	In https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local I wanted to adjust the alternative patching debug output to the new changes introduced by da0fe6e68e10 ("x86/alternative: Add indirect call patching") but removed the '' which denotes the ->x86_capability word. The correct output should be, for example: [ 0.230071] SMP alternatives: feat: 1132+15, old: (entry_SYSCALL_64_after_hwframe+0x5a/0x77 (ffffffff81c000c2) len: 16), repl: (ffffffff89ae896a, len: 5) flags: 0x0 while the incorrect one says "... 1132+15" currently. Add back the '*'. Fixes: da0fe6e68e10 ("x86/alternative: Add indirect call patching") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
2023-12-30	ALSA: hda/tas2781: configure the amp after firmware load	Gergo Koteles
	Make the amp available immediately after a module load to avoid having to wait for a PCM hook action. (eg. unloading & loading the module while listening music) Signed-off-by: Gergo Koteles <soyer@irl.hu> Link: https://lore.kernel.org/r/7f2f65d9212aa16edd4db8725489ae59dbe74c66.1703895108.git.soyer@irl.hu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-30	ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series	Aabish Malik
	The HP Pavilion 14 ec1xxx series uses the HP mainboard 8A0F with the ALC287 codec. The mute led can be enabled using the already existing ALC287_FIXUP_HP_GPIO_LED quirk. Tested on an HP Pavilion ec1003AU Signed-off-by: Aabish Malik <aabishmalik3337@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231229170352.742261-3-aabishmalik3337@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-30	ALSA: mark all struct bus_type as const	Greg Kroah-Hartman
	Now that the driver core can properly handle constant struct bus_type, move all of the sound subsystem struct bus_type structures as const, placing them into read-only memory which can not be modified at runtime. Note, this fixes a duplicate definition of ac97_bus_type, which somehow was declared extern in a .h file, and then static as a prototype in a .c file, and then properly later on in the same .c file. Amazing that no compiler warning ever showed up for this. Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Dawei Li <set_pte_at@outlook.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: linux-sound@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/2023121945-immersion-budget-d0aa@gregkh Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	zswap: memcontrol: implement zswap writeback disabling	Nhat Pham
	During our experiment with zswap, we sometimes observe swap IOs due to occasional zswap store failures and writebacks-to-swap. These swapping IOs prevent many users who cannot tolerate swapping from adopting zswap to save memory and improve performance where possible. This patch adds the option to disable this behavior entirely: do not writeback to backing swapping device when a zswap store attempt fail, and do not write pages in the zswap pool back to the backing swap device (both when the pool is full, and when the new zswap shrinker is called). This new behavior can be opted-in/out on a per-cgroup basis via a new cgroup file. By default, writebacks to swap device is enabled, which is the previous behavior. Initially, writeback is enabled for the root cgroup, and a newly created cgroup will inherit the current setting of its parent. Note that this is subtly different from setting memory.swap.max to 0, as it still allows for pages to be stored in the zswap pool (which itself consumes swap space in its current form). This patch should be applied on top of the zswap shrinker series: https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@gmail.com/ as it also disables the zswap shrinker, a major source of zswap writebacks. For the most part, this feature is motivated by internal parties who have already established their opinions regarding swapping - the workloads that are highly sensitive to IO, and especially those who are using servers with really slow disk performance (for instance, massive but slow HDDs). For these folks, it's impossible to convince them to even entertain zswap if swapping also comes as a packaged deal. Writeback disabling is quite a useful feature in these situations - on a mixed workloads deployment, they can disable writeback for the more IO-sensitive workloads, and enable writeback for other background workloads. For instance, on a server with HDD, I allocate memories and populate them with random values (so that zswap store will always fail), and specify memory.high low enough to trigger reclaim. The time it takes to allocate the memories and just read through it a couple of times (doing silly things like computing the values' average etc.): zswap.writeback disabled: real 0m30.537s user 0m23.687s sys 0m6.637s 0 pages swapped in 0 pages swapped out zswap.writeback enabled: real 0m45.061s user 0m24.310s sys 0m8.892s 712686 pages swapped in 461093 pages swapped out (the last two lines are from vmstat -s). [nphamcs@gmail.com: add a comment about recurring zswap store failures leading to reclaim inefficiency] Link: https://lkml.kernel.org/r/20231221005725.3446672-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231207192406.3809579-1-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: David Heidelberg <david@ixit.cz> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mlxbf_gige: fix receive packet race condition	David Thompson
	Under heavy traffic, the BlueField Gigabit interface can become unresponsive. This is due to a possible race condition in the mlxbf_gige_rx_packet function, where the function exits with producer and consumer indices equal but there are remaining packet(s) to be processed. In order to prevent this situation, read receive consumer index before the HW replenish so that the mlxbf_gige_rx_packet function returns an accurate return value even if a packet is received into just-replenished buffer prior to exiting this routine. If the just-replenished buffer is received and occupies the last RX ring entry, the interface would not recover and instead would encounter RX packet drops related to internal buffer shortages since the driver RX logic is not being triggered to drain the RX ring. This patch will address and prevent this "ring full" condition. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	Merge tag 'mlx5-updates-2023-12-20' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-12-20 mlx5 Socket direct support and management PF profile. Tariq Says: =========== Support Socket-Direct multi-dev netdev This series adds support for combining multiple devices (PFs) of the same port under one netdev instance. Passing traffic through different devices belonging to different NUMA sockets saves cross-numa traffic and allows apps running on the same netdev from different numas to still feel a sense of proximity to the device and achieve improved performance. We achieve this by grouping PFs together, and creating the netdev only once all group members are probed. Symmetrically, we destroy the netdev once any of the PFs is removed. The channels are distributed between all devices, a proper configuration would utilize the correct close numa when working on a certain app/cpu. We pick one device to be a primary (leader), and it fills a special role. The other devices (secondaries) are disconnected from the network in the chip level (set to silent mode). All RX/TX traffic is steered through the primary to/from the secondaries. Currently, we limit the support to PFs only, and up to two devices (sockets). =========== Armen Says: =========== Management PF support and module integration This patch rolls out comprehensive support for the Management Physical Function (MGMT PF) within the mlx5 driver. It involves updating the mlx5 interface header to introduce necessary definitions for MGMT PF and adding a new management PF netdev profile, which will allow the host side to communicate with the embedded linux on Blue-field devices. =========== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	arm64: dts: rockchip: Fix led pinctrl of lubancat 1	Andy Yan
	According to the schematics, the gpio control sys_led is GPIO0_C5. Fixes: 8d94da58de53 ("arm64: dts: rockchip: Add EmbedFire LubanCat 1") Reported-by: Zhang Ning <zhangn1985@outlook.com> Closes: https://lore.kernel.org/linux-rockchip/OS0P286MB06412D049D8BF7B063D41350CD95A@OS0P286MB0641.JPNP286.PROD.OUTLOOK.COM/T/#u Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20231225005055.3102743-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-29	arm64: dts: rockchip: correct gpio_pwrctrl1 typo on nanopc-t6	John Clark
	Both rk806_dvs1_null and rk806_dvs2_null duplicate gpio_pwrctrl2 and gpio_pwrctrl1 is not set. This patch sets gpio_pwrctrl1. Signed-off-by: John Clark <inindev@gmail.com> Link: https://lore.kernel.org/r/20231225223226.17690-1-inindev@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-29	arm64: dts: rockchip: correct gpio_pwrctrl1 typo on rock-5b	John Clark
	Both rk806_dvs1_null and rk806_dvs2_null duplicate gpio_pwrctrl2 and gpio_pwrctrl1 is not set. This patch sets gpio_pwrctrl1. Signed-off-by: John Clark <inindev@gmail.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20231225222859.17153-2-inindev@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-29	arm64: dts: rockchip: support poweroff on the rock-5b	John Clark
	Allow the rock-5b to poweroff its pmic. When issuing a "shutdown -h now" on the rock-5b it reboots instead. Defining 'system-power-controller' allows the rk806 to power down. Commit c699fbfdfd54 ("arm64: dts: rockchip: Support poweroff on NanoPC-T6") similarly resolves this issue for the nanopc-t6. Signed-off-by: John Clark <inindev@gmail.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20231225222859.17153-1-inindev@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-29	arm64: dts: rockchip: Support poweroff on Orange Pi 5	Jimmy Hon
	The RK806 on the Orange Pi 5 can be used to power on/off the whole board. Mark it as the system power controller. Signed-off-by: Jimmy Hon <honyuenkwun@gmail.com> Link: https://lore.kernel.org/r/20231227203211.1047-1-honyuenkwun@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-29	arm64: dts: rockchip: nanopc-t6 sdmmc beautification	John Clark
	drop max-frequency = <200000000> as it is already defined in rk3588s.dtsi order no-sdio & no-mmc properties while we are here Signed-off-by: John Clark <inindev@gmail.com> Link: https://lore.kernel.org/r/20231228173011.2863-1-inindev@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-29	kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE	Youling Tang
	Remove duplicate definitions, no functional changes. Link: https://lkml.kernel.org/r/MW4PR84MB3145459ADC7EB38BBB36955B8198A@MW4PR84MB3145.NAMPRD84.PROD.OUTLOOK.COM Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Reported-by: Huacai Chen <chenhuacai@loongson.cn> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	scripts/decode_stacktrace.sh: strip unexpected CR from lines	Bjorn Andersson
	When the kernel log is acquired over a serial cable it is not uncommon for the log to contain carriage return characters, in addition to the expected line feeds. When this output is feed into decode_stacktrace.sh, handle_line() fails to strip the trailing ']' off the module name, which results in find_module() not being able to find the referred to kernel module. This is reported to the user as: WARNING! Modules path isn't set, but is needed to parse this symbol The solution is to reconfigure the serial port, or to strip the carriage returns from the log, but this isn't obvious from the error reported by the script. Instead, make decode_stacktrace.sh more user friendly by stripping the trailing carriage return. Link: https://lkml.kernel.org/r/20231225-decode-stacktrace-cr-v1-1-9f306f38cdde@quicinc.com Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	watchdog: if panicking and we dumped everything, don't re-enable dumping	Douglas Anderson
	If, as part of handling a hardlockup or softlockup, we've already dumped all CPUs and we're just about to panic, don't reenable dumping and give some other CPU a chance to hop in there and add some confusing logs right as the panic is happening. Link: https://lkml.kernel.org/r/20231220131534.4.Id3a9c7ec2d7d83e4080da6f8662ba2226b40543f@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting	Douglas Anderson
	If two CPUs end up reporting a hardlockup at the same time then their logs could get interleaved which is hard to read. The interleaving problem was especially bad with the "perf" hardlockup detector where the locked up CPU is always the same as the running CPU and we end up in show_regs(). show_regs() has no inherent serialization so we could mix together two crawls if two hardlockups happened at the same time (and if we didn't have `sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll fully serialize hardlockups when using the "perf" hardlockup detector. The interleaving problem was less bad with the "buddy" hardlockup detector. With "buddy" we always end up calling `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running one. trigger_single_cpu_backtrace() always at least serializes the individual stack crawls because it eventually uses printk_cpu_sync_get_irqsave(). Unfortunately the fact that trigger_single_cpu_backtrace() eventually calls printk_cpu_sync_get_irqsave() (on a different CPU) means that we have to drop the "lock" before calling it and we can't fully serialize all printouts associated with a given hardlockup. However, we still do get the advantage of serializing the output of print_modules() and print_irqtrace_events(). Aside from serializing hardlockups from each other, this change also has the advantage of serializing hardlockups and softlockups from each other if they happen to happen at the same time since they are both using the same "lock". Even though nobody is expected to hang while holding the lock associated with printk_cpu_sync_get_irqsave(), out of an abundance of caution, we don't call printk_cpu_sync_get_irqsave() until after we print out about the hardlockup. This makes extra sure that, even if printk_cpu_sync_get_irqsave() somehow never runs we at least print that we saw the hardlockup. This is different than the choice made for softlockup because hardlockup is really our last resort. Link: https://lkml.kernel.org/r/20231220131534.3.I6ff691b3b40f0379bc860f80c6e729a0485b5247@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting	Douglas Anderson
	Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave() and printk_cpu_sync_put_irqrestore() to serialize softlockup reporting. Alone this doesn't have any real advantage over the spinlock, but this will allow us to use the same function in a future change to also serialize hardlockup crawls. NOTE: for the most part this serialization is important because we often end up in the show_regs() path and that has no built-in serialization if there are multiple callers at once. However, even in the case where we end up in the dump_stack() path this still has some advantages because the stack will be guaranteed to be together in the logs with the lockup message with no interleaving. NOTE: the fact that printk_cpu_sync_get_irqsave() is allowed to be called multiple times on the same CPU is important here. Specifically we hold the "lock" while calling dump_stack() which also gets the same "lock". This is explicitly documented to be OK and means we don't need to introduce a variant of dump_stack() that doesn't grab the lock. Link: https://lkml.kernel.org/r/20231220131534.2.Ia5906525d440d8e8383cde31b7c61c2aadc8f907@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Li Zhe <lizhe.67@bytedance.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	watchdog/hardlockup: adopt softlockup logic avoiding double-dumps	Douglas Anderson
	Patch series "watchdog: Better handling of concurrent lockups". When we get multiple lockups at roughly the same time, the output in the kernel logs can be very confusing since the reports about the lockups end up interleaved in the logs. There is some code in the kernel to try to handle this but it wasn't that complete. Li Zhe recently made this a bit better for softlockups (specifically for the case where `kernel.softlockup_all_cpu_backtrace` is not set) in commit 9d02330abd3e ("softlockup: serialized softlockup's log"), but that only handled softlockup reports. Hardlockup reports still had similar issues. This series also has a small fix to avoid dumping all stacks a second time in the case of a panic. This is a bit unrelated to the interleaving fixes but it does also improve the clarity of lockup reports. This patch (of 4): The hardlockup detector and softlockup detector both have the ability to dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace` and `kernel.softlockup_all_cpu_backtrace`). Both detectors also have some logic to attempt to avoid interleaving printouts if two CPUs were trying to do dumps of all CPUs at the same time. However: - The hardlockup detector's logic still allowed interleaving some information. Specifically another CPU could print modules and dump the stack of the locked CPU at the same time we were dumping all CPUs. - In the case where `kernel.hardlockup_panic` was set in addition to `kernel.hardlockup_all_cpu_backtrace`, when two CPUs both detected hardlockups at the same time the second CPU could call panic() while the first was still dumping stacks. This was especially bad if the locked up CPU wasn't responding to the request for a backtrace since the function nmi_trigger_cpumask_backtrace() can wait up to 10 seconds. Let's resolve this by adopting the softlockup logic in the hardlockup handler. NOTES: - As part of this, one might think that we should make a helper function that both the hard and softlockup detectors call. This turns out not to be super trivial since it would have to be parameterized quite a bit since there are separate global variables controlling each lockup detector and they print log messages that are just different enough that it would be a pain. We probably don't want to change the messages that are printed without good reason to avoid throwing log parsers for a loop. - One might also think that it would be a good idea to have the hardlockup and softlockup detector use the same global variable to prevent interleaving. This would make sure that softlockups and hardlockups can't interleave each other. That _almost_ works but has a dangerous flaw if `kernel.hardlockup_panic` is not the same as `kernel.softlockup_panic` because we might skip a call to panic() if one type of lockup was detected at the same time as another. Link: https://lkml.kernel.org/r/20231220211640.2023645-1-dianders@chromium.org Link: https://lkml.kernel.org/r/20231220131534.1.I4f35a69fbb124b5f0c71f75c631e11fabbe188ff@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	kexec_core: fix the assignment to kimage->control_page	Yuntao Wang
	image->control_page represents the starting address for allocating the next control page, while hole_end represents the address of the last valid byte of the currently allocated control page. This bug actually does not affect the correctness of allocating control pages, because image->control_page is currently only used in kimage_alloc_crash_control_pages(), and this function, when allocating control pages, will first align image->control_page up to the nearest `(1 << order) << PAGE_SHIFT` boundary, then use this value as the starting address of the next control page. This ensures that the newly allocated control page will use the correct starting address and not overlap with previously allocated control pages. Although it does not affect the correctness of the final result, it is better for us to set image->control_page to the correct value, in case it might be used elsewhere in the future, potentially causing errors. Therefore, after successfully allocating a control page, image->control_page should be updated to `hole_end + 1`, rather than hole_end. Link: https://lkml.kernel.org/r/20231221042308.11076-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()	Yuntao Wang
	kernel_ident_mapping_init() takes an exclusive memory range [pstart, pend) where pend is not included in the range, while res represents an inclusive memory range [start, end] where end is considered part of the range. Passing [start, end] rather than [start, end+1) to kernel_ident_mapping_init() may result in the identity mapping for the end address not being set up. For example, when res->start is equal to res->end, kernel_ident_mapping_init() will not establish any identity mapping. Similarly, when the value of res->end is a multiple of 2M and the page table maps 2M pages, kernel_ident_mapping_init() will also not set up identity mapping for res->end. Therefore, passing res->end directly to kernel_ident_mapping_init() is incorrect, the correct end address should be `res->end + 1`. Link: https://lkml.kernel.org/r/20231221101702.20956-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>