summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-21random: group crng functionsJason A. Donenfeld
This pulls all of the crng-focused functions into the second labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: group initialization wait functionsJason A. Donenfeld
This pulls all of the readiness waiting-focused functions into the first labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove whitespace and reorder includesJason A. Donenfeld
This is purely cosmetic. Future work involves figuring out which of these headers we need and which we don't. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove useless header commentJason A. Donenfeld
This really adds nothing at all useful. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: introduce drain_entropy() helper to declutter crng_reseed()Jason A. Donenfeld
In preparation for separating responsibilities, break out the entropy count management part of crng_reseed() into its own function. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: deobfuscate irq u32/u64 contributionsJason A. Donenfeld
In the irq handler, we fill out 16 bytes differently on 32-bit and 64-bit platforms, and for 32-bit vs 64-bit cycle counters, which doesn't always correspond with the bitness of the platform. Whether or not you like this strangeness, it is a matter of fact. But it might not be a fact you well realized until now, because the code that loaded the irq info into 4 32-bit words was quite confusing. Instead, this commit makes everything explicit by having separate (compile-time) branches for 32-bit and 64-bit types. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: add proper SPDX headerJason A. Donenfeld
Convert the current license into the SPDX notation of "(GPL-2.0 OR BSD-3-Clause)". This infers GPL-2.0 from the text "ALTERNATIVELY, this product may be distributed under the terms of the GNU General Public License, in which case the provisions of the GPL are required INSTEAD OF the above restrictions" and it infers BSD-3-Clause from the verbatim BSD 3 clause license in the file. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove unused tracepointsJason A. Donenfeld
These explicit tracepoints aren't really used and show sign of aging. It's work to keep these up to date, and before I attempted to keep them up to date, they weren't up to date, which indicates that they're not really used. These days there are better ways of introspecting anyway. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove ifdef'd out interrupt benchJason A. Donenfeld
With tools like kbench9000 giving more finegrained responses, and this basically never having been used ever since it was initially added, let's just get rid of this. There *is* still work to be done on the interrupt handler, but this really isn't the way it's being developed. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: tie batched entropy generation to base_crng generationJason A. Donenfeld
Now that we have an explicit base_crng generation counter, we don't need a separate one for batched entropy. Rather, we can just move the generation forward every time we change crng_init state or update the base_crng key. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: fix locking for crng_init in crng_reseed()Dominik Brodowski
crng_init is protected by primary_crng->lock. Therefore, we need to hold this lock when increasing crng_init to 2. As we shouldn't hold this lock for too long, only hold it for those parts which require protection. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: zero buffer after reading entropy from userspaceJason A. Donenfeld
This buffer may contain entropic data that shouldn't stick around longer than needed, so zero out the temporary buffer at the end of write_pool(). Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove outdated INT_MAX >> 6 check in urandom_read()Jason A. Donenfeld
In 79a8468747c5 ("random: check for increase of entropy_count because of signed conversion"), a number of checks were added around what values were passed to account(), because account() was doing fancy fixed point fractional arithmetic, and a user had some ability to pass large values directly into it. One of things in that commit was limiting those values to INT_MAX >> 6. The first >> 3 was for bytes to bits, and the next >> 3 was for bits to 1/8 fractional bits. However, for several years now, urandom reads no longer touch entropy accounting, and so this check serves no purpose. The current flow is: urandom_read_nowarn()-->get_random_bytes_user()-->chacha20_block() Of course, we don't want that size_t to be truncated when adding it into the ssize_t. But we arrive at urandom_read_nowarn() in the first place either via ordinary fops, which limits reads to MAX_RW_COUNT, or via getrandom() which limits reads to INT_MAX. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: make more consistent use of integer typesJason A. Donenfeld
We've been using a flurry of int, unsigned int, size_t, and ssize_t. Let's unify all of this into size_t where it makes sense, as it does in most places, and leave ssize_t for return values with possible errors. In addition, keeping with the convention of other functions in this file, functions that are dealing with raw bytes now take void * consistently instead of a mix of that and u8 *, because much of the time we're actually passing some other structure that is then interpreted as bytes by the function. We also take the opportunity to fix the outdated and incorrect comment in get_random_bytes_arch(). Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use hash function for crng_slow_load()Jason A. Donenfeld
Since we have a hash function that's really fast, and the goal of crng_slow_load() is reportedly to "touch all of the crng's state", we can just hash the old state together with the new state and call it a day. This way we dont need to reason about another LFSR or worry about various attacks there. This code is only ever used at early boot and then never again. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use simpler fast key erasure flow on per-cpu keysJason A. Donenfeld
Rather than the clunky NUMA full ChaCha state system we had prior, this commit is closer to the original "fast key erasure RNG" proposal from <https://blog.cr.yp.to/20170723-random.html>, by simply treating ChaCha keys on a per-cpu basis. All entropy is extracted to a base crng key of 32 bytes. This base crng has a birthdate and a generation counter. When we go to take bytes from the crng, we first check if the birthdate is too old; if it is, we reseed per usual. Then we start working on a per-cpu crng. This per-cpu crng makes sure that it has the same generation counter as the base crng. If it doesn't, it does fast key erasure with the base crng key and uses the output as its new per-cpu key, and then updates its local generation counter. Then, using this per-cpu state, we do ordinary fast key erasure. Half of this first block is used to overwrite the per-cpu crng key for the next call -- this is the fast key erasure RNG idea -- and the other half, along with the ChaCha state, is returned to the caller. If the caller desires more than this remaining half, it can generate more ChaCha blocks, unlocked, using the now detached ChaCha state that was just returned. Crypto-wise, this is more or less what we were doing before, but this simply makes it more explicit and ensures that we always have backtrack protection by not playing games with a shared block counter. The flow looks like this: ──extract()──► base_crng.key ◄──memcpy()───┐ │ │ └──chacha()──────┬─► new_base_key └─► crngs[n].key ◄──memcpy()───┐ │ │ └──chacha()───┬─► new_key └─► random_bytes │ └────► There are a few hairy details around early init. Just as was done before, prior to having gathered enough entropy, crng_fast_load() and crng_slow_load() dump bytes directly into the base crng, and when we go to take bytes from the crng, in that case, we're doing fast key erasure with the base crng rather than the fast unlocked per-cpu crngs. This is fine as that's only the state of affairs during very early boot; once the crng initializes we never use these paths again. In the process of all this, the APIs into the crng become a bit simpler: we have get_random_bytes(buf, len) and get_random_bytes_user(buf, len), which both do what you'd expect. All of the details of fast key erasure and per-cpu selection happen only in a very short critical section of crng_make_state(), which selects the right per-cpu key, does the fast key erasure, and returns a local state to the caller's stack. So, we no longer have a need for a separate backtrack function, as this happens all at once here. The API then allows us to extend backtrack protection to batched entropy without really having to do much at all. The result is a bit simpler than before and has fewer foot guns. The init time state machine also gets a lot simpler as we don't need to wait for workqueues to come online and do deferred work. And the multi-core performance should be increased significantly, by virtue of having hardly any locking on the fast path. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: absorb fast pool into input pool after fast loadJason A. Donenfeld
During crng_init == 0, we never credit entropy in add_interrupt_ randomness(), but instead dump it directly into the primary_crng. That's fine, except for the fact that we then wind up throwing away that entropy later when we switch to extracting from the input pool and xoring into (and later in this series overwriting) the primary_crng key. The two other early init sites -- add_hwgenerator_randomness()'s use crng_fast_load() and add_device_ randomness()'s use of crng_slow_load() -- always additionally give their inputs to the input pool. But not add_interrupt_randomness(). This commit fixes that shortcoming by calling mix_pool_bytes() after crng_fast_load() in add_interrupt_randomness(). That's partially verboten on PREEMPT_RT, where it implies taking spinlock_t from an IRQ handler. But this also only happens during early boot and then never again after that. Plus it's a trylock so it has the same considerations as calling crng_fast_load(), which we're already using. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Suggested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21ACPI: clean up double words in two commentsTom Rix
Remove the second 'on' and 'those'. Signed-off-by: Tom Rix <trix@redhat.com> [ rjw: Subject adjustments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-21sc16is7xx: Fix for incorrect data being transmittedPhil Elwell
UART drivers are meant to use the port spinlock within certain methods, to protect against reentrancy. The sc16is7xx driver does very little locking, presumably because when added it triggers "scheduling while atomic" errors. This is due to the use of mutexes within the regmap abstraction layer, and the mutex implementation's habit of sleeping the current thread while waiting for access. Unfortunately this lack of interlocking can lead to corruption of outbound data, which occurs when the buffer used for I2C transmission is used simultaneously by two threads - a work queue thread running sc16is7xx_tx_proc, and an IRQ thread in sc16is7xx_port_irq, both of which can call sc16is7xx_handle_tx. An earlier patch added efr_lock, a mutex that controls access to the EFR register. This mutex is already claimed in the IRQ handler, and all that is required is to claim the same mutex in sc16is7xx_tx_proc. See: https://github.com/raspberrypi/linux/issues/4885 Fixes: 6393ff1c4435 ("sc16is7xx: Use threaded IRQ") Cc: stable <stable@vger.kernel.org> Signed-off-by: Phil Elwell <phil@raspberrypi.com> Link: https://lore.kernel.org/r/20220216160802.1026013-1-phil@raspberrypi.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix deadlock in gsmtty_open()daniel.starke@siemens.com
In the current implementation the user may open a virtual tty which then could fail to establish the underlying DLCI. The function gsmtty_open() gets stuck in tty_port_block_til_ready() while waiting for a carrier rise. This happens if the remote side fails to acknowledge the link establishment request in time or completely. At some point gsm_dlci_close() is called to abort the link establishment attempt. The function tries to inform the associated virtual tty by performing a hangup. But the blocking loop within tty_port_block_til_ready() is not informed about this event. The patch proposed here fixes this by resetting the initialization state of the virtual tty to ensure the loop exits and triggering it to make tty_port_block_til_ready() return. Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-7-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix wrong modem processing in convergence layer type 2daniel.starke@siemens.com
The function gsm_process_modem() exists to handle modem status bits of incoming frames. This includes incoming MSC (modem status command) frames and convergence layer type 2 data frames. The function, however, was only designed to handle MSC frames as it expects the command length. Within gsm_dlci_data() it is wrongly assumed that this is the same as the data frame length. This is only true if the data frame contains only 1 byte of payload. This patch names the length parameter of gsm_process_modem() in a generic manner to reflect its association. It also corrects all calls to the function to handle the variable number of modem status octets correctly in both cases. Fixes: 7263287af93d ("tty: n_gsm: Fixed logic to decode break signal from modem status") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-6-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix wrong tty control line for flow controldaniel.starke@siemens.com
tty flow control is handled via gsmtty_throttle() and gsmtty_unthrottle(). Both functions propagate the outgoing hardware flow control state to the remote side via MSC (modem status command) frames. The local state is taken from the RTS (ready to send) flag of the tty. However, RTS gets mapped to DTR (data terminal ready), which is wrong. This patch corrects this by mapping RTS to RTS. Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-5-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix NULL pointer access due to DLCI releasedaniel.starke@siemens.com
The here fixed commit made the tty hangup asynchronous to avoid a circular locking warning. I could not reproduce this warning. Furthermore, due to the asynchronous hangup the function call now gets queued up while the underlying tty is being freed. Depending on the timing this results in a NULL pointer access in the global work queue scheduler. To be precise in process_one_work(). Therefore, the previous commit made the issue worse which it tried to fix. This patch fixes this by falling back to the old behavior which uses a blocking tty hangup call before freeing up the associated tty. Fixes: 7030082a7415 ("tty: n_gsm: avoid recursive locking with async port hangup") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-4-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix proper link termination after failed opendaniel.starke@siemens.com
Trying to open a DLCI by sending a SABM frame may fail with a timeout. The link is closed on the initiator side without informing the responder about this event. The responder assumes the link is open after sending a UA frame to answer the SABM frame. The link gets stuck in a half open state. This patch fixes this by initiating the proper link termination procedure after link setup timeout instead of silently closing it down. Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-3-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix encoding of command/response bitdaniel.starke@siemens.com
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010. See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516 The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to the newer 27.010 here. Chapter 5.2.1.2 describes the encoding of the C/R (command/response) bit. Table 1 shows that the actual encoding of the C/R bit is inverted if the associated frame is sent by the responder. The referenced commit fixed here further broke the internal meaning of this bit in the outgoing path by always setting the C/R bit regardless of the frame type. This patch fixes both by setting the C/R bit always consistently for command (1) and response (0) frames and inverting it later for the responder where necessary. The meaning of this bit in the debug output is being preserved and shows the bit as if it was encoded by the initiator. This reflects only the frame type rather than the encoded combination of communication side and frame type. Fixes: cc0f42122a7e ("tty: n_gsm: Modify CR,PF bit when config requester") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-2-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21tty: n_gsm: fix encoding of control signal octet bit DVdaniel.starke@siemens.com
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010. See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516 The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to the newer 27.010 here. Chapter 5.4.6.3.7 describes the encoding of the control signal octet used by the MSC (modem status command). The same encoding is also used in convergence layer type 2 as described in chapter 5.5.2. Table 7 and 24 both require the DV (data valid) bit to be set 1 for outgoing control signal octets sent by the DTE (data terminal equipment), i.e. for the initiator side. Currently, the DV bit is only set if CD (carrier detect) is on, regardless of the side. This patch fixes this behavior by setting the DV bit on the initiator side unconditionally. Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220218073123.2121-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21Merge tag 'platform-drivers-x86-v5.17-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Two small fixes and one hardware-id addition" * tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: int3472: Add terminator to gpiod_lookup_table platform/x86: asus-wmi: Fix regression when probing for fan curve control platform/x86: thinkpad_acpi: Add dual-fan quirk for T15g (2nd gen)
2022-02-21mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios propertyChristophe Kerello
Wp-gpios property can be used on NVMEM nodes and the same property can be also used on MTD NAND nodes. In case of the wp-gpios property is defined at NAND level node, the GPIO management is done at NAND driver level. Write protect is disabled when the driver is probed or resumed and is enabled when the driver is released or suspended. When no partitions are defined in the NAND DT node, then the NAND DT node will be passed to NVMEM framework. If wp-gpios property is defined in this node, the GPIO resource is taken twice and the NAND controller driver fails to probe. A new Boolean flag named ignore_wp has been added in nvmem_config. In case ignore_wp is set, it means that the GPIO is handled by the provider. Lets set this flag in MTD layer to avoid the conflict on wp_gpios property. Fixes: 2a127da461a9 ("nvmem: add support for the write-protect pin") Cc: stable@vger.kernel.org Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220220151432.16605-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios propertyChristophe Kerello
Wp-gpios property can be used on NVMEM nodes and the same property can be also used on MTD NAND nodes. In case of the wp-gpios property is defined at NAND level node, the GPIO management is done at NAND driver level. Write protect is disabled when the driver is probed or resumed and is enabled when the driver is released or suspended. When no partitions are defined in the NAND DT node, then the NAND DT node will be passed to NVMEM framework. If wp-gpios property is defined in this node, the GPIO resource is taken twice and the NAND controller driver fails to probe. It would be possible to set config->wp_gpio at MTD level before calling nvmem_register function but NVMEM framework will toggle this GPIO on each write when this GPIO should only be controlled at NAND level driver to ensure that the Write Protect has not been enabled. A way to fix this conflict is to add a new boolean flag in nvmem_config named ignore_wp. In case ignore_wp is set, the GPIO resource will be managed by the provider. Fixes: 2a127da461a9 ("nvmem: add support for the write-protect pin") Cc: stable@vger.kernel.org Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220220151432.16605-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21Merge tag 'iio-fixes-for-5.17a' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus Jonathan writes: 1st set of IIO fixes for the 5.17 cycle. Several drivers: - Fix a failure to disable runtime in probe error paths. All cases were introduced in the same rework patch. adi,ad7124 - Fix incorrect register masking. adi,ad74413r - Avoid referencing negative array offsets. - Use ngpio size when iterating over mask not numebr of channels. - Fix issue with wrong mask uage getting GPIOs. adi,admv1014 - Drop check on unsigned less than 0. adi,ads16480 - Correctly handle devices that don't have burst mode support. fsl,fxls8962af - Add missing padding needed between address and data for SPI transfers. men_z188 - Fix iomap leak in error path. st,lsm6dsx - Wait for setting time in oneshot reads to get a stable result. ti,tsc2046 - Prevent an array overflow. * tag 'iio-fixes-for-5.17a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot iio: Fix error handling for PM iio: addac: ad74413r: correct comparator gpio getters mask usage iio: addac: ad74413r: use ngpio size when iterating over mask iio: addac: ad74413r: Do not reference negative array offsets iio: adc: men_z188_adc: Fix a resource leak in an error handling path iio: frequency: admv1013: remove the always true condition iio: accel: fxls8962af: add padding to regmap for SPI iio:imu:adis16480: fix buffering for devices with no burst mode iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits iio: adc: tsc2046: fix memory corruption by preventing array overflow
2022-02-21irqchip/gic-v3: Use dsb(ishst) to order writes with ICC_SGI1R_EL1 accessesBarry Song
A dsb(ishst) barrier should be enough to order previous writes with the system register generating the SGI, as we only need to guarantee the visibility of data to other CPUs in the inner shareable domain before we send the SGI. A micro-benchmark is written to verify the performance impact on kunpeng920 machine with 2 sockets, each socket has 2 dies, and each die has 24 CPUs, so totally the system has 2 * 2 * 24 = 96 CPUs. ~2% performance improvement can be seen by this benchmark. The code of benchmark module: #include <linux/module.h> #include <linux/timekeeping.h> volatile int data0 ____cacheline_aligned; volatile int data1 ____cacheline_aligned; volatile int data2 ____cacheline_aligned; volatile int data3 ____cacheline_aligned; volatile int data4 ____cacheline_aligned; volatile int data5 ____cacheline_aligned; volatile int data6 ____cacheline_aligned; static void ipi_latency_func(void *val) { } static int __init ipi_latency_init(void) { ktime_t stime, etime, delta; int cpu, i; int start = smp_processor_id(); stime = ktime_get(); for ( i = 0; i < 1000; i++) for (cpu = 0; cpu < 96; cpu++) { data0 = data1 = data2 = data3 = data4 = data5 = data6 = cpu; smp_call_function_single(cpu, ipi_latency_func, NULL, 1); } etime = ktime_get(); delta = ktime_sub(etime, stime); printk("%s ipi from cpu%d to cpu0-95 delta of 1000times:%lld\n", __func__, start, delta); return 0; } module_init(ipi_latency_init); static void ipi_latency_exit(void) { } module_exit(ipi_latency_exit); MODULE_DESCRIPTION("IPI benchmark"); MODULE_LICENSE("GPL"); run the below commands 10 times on both Vanilla and the kernel with this patch: # taskset -c 0 insmod test.ko # rmmod test The result on vanilla: ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126757449 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126784249 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126177703 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127022281 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126184883 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127374585 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:125778089 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126974441 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127357625 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126228184 The result on the kernel with this patch: ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124467401 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123474209 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123558497 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122993951 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122984223 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123323609 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124507583 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123386963 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123340664 ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123285324 Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> [maz: tidied up commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220220061910.6155-1-21cnbao@gmail.com
2022-02-21random: do not xor RDRAND when writing into /dev/randomJason A. Donenfeld
Continuing the reasoning of "random: ensure early RDSEED goes through mixer on init", we don't want RDRAND interacting with anything without going through the mixer function, as a backdoored CPU could presumably cancel out data during an xor, which it'd have a harder time doing when being forced through a cryptographic hash function. There's actually no need at all to be calling RDRAND in write_pool(), because before we extract from the pool, we always do so with 32 bytes of RDSEED hashed in at that stage. Xoring at this stage is needless and introduces a minor liability. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: ensure early RDSEED goes through mixer on initJason A. Donenfeld
Continuing the reasoning of "random: use RDSEED instead of RDRAND in entropy extraction" from this series, at init time we also don't want to be xoring RDSEED directly into the crng. Instead it's safer to put it into our entropy collector and then re-extract it, so that it goes through a hash function with preimage resistance. As a matter of hygiene, we also order these now so that the RDSEED byte are hashed in first, followed by the bytes that are likely more predictable (e.g. utsname()). Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: inline leaves of rand_initialize()Jason A. Donenfeld
This is a preparatory commit for the following one. We simply inline the various functions that rand_initialize() calls that have no other callers. The compiler was doing this anyway before. Doing this will allow us to reorganize this after. We can then move the trust_cpu and parse_trust_cpu definitions a bit closer to where they're actually used, which makes the code easier to read. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: get rid of secondary crngsJason A. Donenfeld
As the comment said, this is indeed a "hack". Since it was introduced, it's been a constant state machine nightmare, with lots of subtle early boot issues and a wildly complex set of machinery to keep everything in sync. Rather than continuing to play whack-a-mole with this approach, this commit simply removes it entirely. This commit is preparation for "random: use simpler fast key erasure flow on per-cpu keys" in this series, which introduces a simpler (and faster) mechanism to accomplish the same thing. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use RDSEED instead of RDRAND in entropy extractionJason A. Donenfeld
When /dev/random was directly connected with entropy extraction, without any expansion stage, extract_buf() was called for every 10 bytes of data read from /dev/random. For that reason, RDRAND was used rather than RDSEED. At the same time, crng_reseed() was still only called every 5 minutes, so there RDSEED made sense. Those olden days were also a time when the entropy collector did not use a cryptographic hash function, which meant most bets were off in terms of real preimage resistance. For that reason too it didn't matter _that_ much whether RDSEED was mixed in before or after entropy extraction; both choices were sort of bad. But now we have a cryptographic hash function at work, and with that we get real preimage resistance. We also now only call extract_entropy() every 5 minutes, rather than every 10 bytes. This allows us to do two important things. First, we can switch to using RDSEED in extract_entropy(), as Dominik suggested. Second, we can ensure that RDSEED input always goes into the cryptographic hash function with other things before being used directly. This eliminates a category of attacks in which the CPU knows the current state of the crng and knows that we're going to xor RDSEED into it, and so it computes a malicious RDSEED. By going through our hash function, it would require the CPU to compute a preimage on the fly, which isn't going to happen. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: fix locking in crng_fast_load()Dominik Brodowski
crng_init is protected by primary_crng->lock, so keep holding that lock when incrementing crng_init from 0 to 1 in crng_fast_load(). The call to pr_notice() can wait until the lock is released; this code path cannot be reached twice, as crng_fast_load() aborts early if crng_init > 0. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove batched entropy lockingJason A. Donenfeld
Rather than use spinlocks to protect batched entropy, we can instead disable interrupts locally, since we're dealing with per-cpu data, and manage resets with a basic generation counter. At the same time, we can't quite do this on PREEMPT_RT, where we still want spinlocks-as- mutexes semantics. So we use a local_lock_t, which provides the right behavior for each. Because this is a per-cpu lock, that generation counter is still doing the necessary CPU-to-CPU communication. This should improve performance a bit. It will also fix the linked splat that Jonathan received with a PROVE_RAW_LOCK_NESTING=y. Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Tested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove use_input_pool parameter from crng_reseed()Eric Biggers
The primary_crng is always reseeded from the input_pool, while the NUMA crngs are always reseeded from the primary_crng. Remove the redundant 'use_input_pool' parameter from crng_reseed() and just directly check whether the crng is the primary_crng. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: make credit_entropy_bits() always safeJason A. Donenfeld
This is called from various hwgenerator drivers, so rather than having one "safe" version for userspace and one "unsafe" version for the kernel, just make everything safe; the checks are cheap and sensible to have anyway. Reported-by: Sultan Alsawaf <sultan@kerneltoast.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: always wake up entropy writers after extractionJason A. Donenfeld
Now that POOL_BITS == POOL_MIN_BITS, we must unconditionally wake up entropy writers after every extraction. Therefore there's no point of write_wakeup_threshold, so we can move it to the dustbin of unused compatibility sysctls. While we're at it, we can fix a small comparison where we were waking up after <= min rather than < min. Cc: Theodore Ts'o <tytso@mit.edu> Suggested-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use linear min-entropy accumulation creditingJason A. Donenfeld
30e37ec516ae ("random: account for entropy loss due to overwrites") assumed that adding new entropy to the LFSR pool probabilistically cancelled out old entropy there, so entropy was credited asymptotically, approximating Shannon entropy of independent sources (rather than a stronger min-entropy notion) using 1/8th fractional bits and replacing a constant 2-2/√𝑒 term (~0.786938) with 3/4 (0.75) to slightly underestimate it. This wasn't superb, but it was perhaps better than nothing, so that's what was done. Which entropy specifically was being cancelled out and how much precisely each time is hard to tell, though as I showed with the attack code in my previous commit, a motivated adversary with sufficient information can actually cancel out everything. Since we're no longer using an LFSR for entropy accumulation, this probabilistic cancellation is no longer relevant. Rather, we're now using a computational hash function as the accumulator and we've switched to working in the random oracle model, from which we can now revisit the question of min-entropy accumulation, which is done in detail in <https://eprint.iacr.org/2019/198>. Consider a long input bit string that is built by concatenating various smaller independent input bit strings. Each one of these inputs has a designated min-entropy, which is what we're passing to credit_entropy_bits(h). When we pass the concatenation of these to a random oracle, it means that an adversary trying to receive back the same reply as us would need to become certain about each part of the concatenated bit string we passed in, which means becoming certain about all of those h values. That means we can estimate the accumulation by simply adding up the h values in calls to credit_entropy_bits(h); there's no probabilistic cancellation at play like there was said to be for the LFSR. Incidentally, this is also what other entropy accumulators based on computational hash functions do as well. So this commit replaces credit_entropy_bits(h) with essentially `total = min(POOL_BITS, total + h)`, done with a cmpxchg loop as before. What if we're wrong and the above is nonsense? It's not, but let's assume we don't want the actual _behavior_ of the code to change much. Currently that behavior is not extracting from the input pool until it has 128 bits of entropy in it. With the old algorithm, we'd hit that magic 128 number after roughly 256 calls to credit_entropy_bits(1). So, we can retain more or less the old behavior by waiting to extract from the input pool until it hits 256 bits of entropy using the new code. For people concerned about this change, it means that there's not that much practical behavioral change. And for folks actually trying to model the behavior rigorously, it means that we have an even higher margin against attacks. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: simplify entropy debitingJason A. Donenfeld
Our pool is 256 bits, and we only ever use all of it or don't use it at all, which is decided by whether or not it has at least 128 bits in it. So we can drastically simplify the accounting and cmpxchg loop to do exactly this. While we're at it, we move the minimum bit size into a constant so it can be shared between the two places where it matters. The reason we want any of this is for the case in which an attacker has compromised the current state, and then bruteforces small amounts of entropy added to it. By demanding a particular minimum amount of entropy be present before reseeding, we make that bruteforcing difficult. Note that this rationale no longer includes anything about /dev/random blocking at the right moment, since /dev/random no longer blocks (except for at ~boot), but rather uses the crng. In a former life, /dev/random was different and therefore required a more nuanced account(), but this is no longer. Behaviorally, nothing changes here. This is just a simplification of the code. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use computational hash for entropy extractionJason A. Donenfeld
The current 4096-bit LFSR used for entropy collection had a few desirable attributes for the context in which it was created. For example, the state was huge, which meant that /dev/random would be able to output quite a bit of accumulated entropy before blocking. It was also, in its time, quite fast at accumulating entropy byte-by-byte, which matters given the varying contexts in which mix_pool_bytes() is called. And its diffusion was relatively high, which meant that changes would ripple across several words of state rather quickly. However, it also suffers from a few security vulnerabilities. In particular, inputs learned by an attacker can be undone, but moreover, if the state of the pool leaks, its contents can be controlled and entirely zeroed out. I've demonstrated this attack with this SMT2 script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in a matter of seconds on a single core of my laptop, resulting in little proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>. For basically all recent formal models of RNGs, these attacks represent a significant cryptographic flaw. But how does this manifest practically? If an attacker has access to the system to such a degree that he can learn the internal state of the RNG, arguably there are other lower hanging vulnerabilities -- side-channel, infoleak, or otherwise -- that might have higher priority. On the other hand, seed files are frequently used on systems that have a hard time generating much entropy on their own, and these seed files, being files, often leak or are duplicated and distributed accidentally, or are even seeded over the Internet intentionally, where their contents might be recorded or tampered with. Seen this way, an otherwise quasi-implausible vulnerability is a bit more practical than initially thought. Another aspect of the current mix_pool_bytes() function is that, while its performance was arguably competitive for the time in which it was created, it's no longer considered so. This patch improves performance significantly: on a high-end CPU, an i7-11850H, it improves performance of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it improves performance by 103%. This commit replaces the LFSR of mix_pool_bytes() with a straight- forward cryptographic hash function, BLAKE2s, which is already in use for pool extraction. Universal hashing with a secret seed was considered too, something along the lines of <https://eprint.iacr.org/2013/338>, but the requirement for a secret seed makes for a chicken & egg problem. Instead we go with a formally proven scheme using a computational hash function, described in sections 5.1, 6.4, and B.1.8 of <https://eprint.iacr.org/2019/198>. BLAKE2s outputs 256 bits, which should give us an appropriate amount of min-entropy accumulation, and a wide enough margin of collision resistance against active attacks. mix_pool_bytes() becomes a simple call to blake2s_update(), for accumulation, while the extraction step becomes a blake2s_final() to generate a seed, with which we can then do a HKDF-like or BLAKE2X-like expansion, the first part of which we fold back as an init key for subsequent blake2s_update()s, and the rest we produce to the caller. This then is provided to our CRNG like usual. In that expansion step, we make opportunistic use of 32 bytes of RDRAND output, just as before. We also always reseed the crng with 32 bytes, unconditionally, or not at all, rather than sometimes with 16 as before, as we don't win anything by limiting beyond the 16 byte threshold. Going for a hash function as an entropy collector is a conservative, proven approach. The result of all this is a much simpler and much less bespoke construction than what's there now, which not only plugs a vulnerability but also improves performance considerably. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21spi: Stacked/parallel memories bindingsMark Brown
Merge series from Miquel Raynal <miquel.raynal@bootlin.com>: DT bindings for stacked and parallel flash/memory devices. base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
2022-02-21lib/iov_iter: initialize "flags" in new pipe_bufferMax Kellermann
The functions copy_page_to_iter_pipe() and push_pipe() can both allocate a new pipe_buffer, but the "flags" member initializer is missing. Fixes: 241699cd72a8 ("new iov_iter flavour: pipe-backed") To: Alexander Viro <viro@zeniv.linux.org.uk> To: linux-fsdevel@vger.kernel.org To: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-02-21ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSEJulian Braha
Resending this to properly add it to the patch tracker - thanks for letting me know, Arnd :) When ARM is enabled, and BITREVERSE is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for HAVE_ARCH_BITREVERSE Depends on [n]: BITREVERSE [=n] Selected by [y]: - ARM [=y] && (CPU_32v7M [=n] || CPU_32v7 [=y]) && !CPU_32v6 [=n] This is because ARM selects HAVE_ARCH_BITREVERSE without selecting BITREVERSE, despite HAVE_ARCH_BITREVERSE depending on BITREVERSE. This unmet dependency bug was found by Kismet, a static analysis tool for Kconfig. Please advise if this is not the appropriate solution. Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-21ARM: Fix kgdb breakpoint for Thumb2Russell King (Oracle)
The kgdb code needs to register an undef hook for the Thumb UDF instruction that will fault in order to be functional on Thumb2 platforms. Reported-by: Johannes Stezenbach <js@sig21.net> Tested-by: Johannes Stezenbach <js@sig21.net> Fixes: 5cbad0ebf45c ("kgdb: support for ARCH=arm") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-21netfilter: nft_limit: fix stateful object memory leakFlorian Westphal
We need to provide a destroy callback to release the extra fields. Fixes: 3b9e2ea6c11b ("netfilter: nft_limit: move stateful fields out of expression data") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-21netfilter: nf_tables: unregister flowtable hooks on netns exitPablo Neira Ayuso
Unregister flowtable hooks before they are releases via nf_tables_flowtable_destroy() otherwise hook core reports UAF. BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142 Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666 CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106 dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106 print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] __kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450 kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450 nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142 __nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429 nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571 nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232 nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652 nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652 __nft_release_hook() calls nft_unregister_flowtable_net_hooks() which only unregisters the hooks, then after RCU grace period, it is guaranteed that no packets add new entries to the flowtable (no flow offload rules and flowtable hooks are reachable from packet path), so it is safe to call nf_flow_table_free() which cleans up the remaining entries from the flowtable (both software and hardware) and it unbinds the flow_block. Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()") Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>