path: root/include/linux
2022-05-19  random: move randomize_page() into mm where it belongs  (Jason A. Donenfeld)
randomize_page is an mm function. It is documented like one. It contains the history of one. It has the naming convention of one. It looks just like another very similar function in mm, randomize_stack_top(). And it has always been maintained and updated by mm people. There is no need for it to be in random.c. In the "which shape does not look like the other ones" test, pointing to randomize_page() is correct. So move randomize_page() into mm/util.c, right next to the similar randomize_stack_top() function. This commit contains no actual code changes. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-19  random: remove mostly unused async readiness notifier  (Jason A. Donenfeld)
The register_random_ready_notifier() notifier is somewhat complicated, and was already recently rewritten to use notifier blocks. It is only used now by one consumer in the kernel, vsprintf.c, for which the async mechanism is really overly complex for what it actually needs. This commit removes register_random_ready_notifier() and unregister_random_ready_notifier(), because it just adds complication with little utility, and changes vsprintf.c to just check on `!rng_is_initialized() && !rng_has_arch_random()`, which will eventually be true. Performance-wise, that code was already using a static branch, so there's basically no overhead at all to this change. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-19  random: remove get_random_bytes_arch() and add rng_has_arch_random()  (Jason A. Donenfeld)
The RNG incorporates RDRAND into its state at boot and every time it reseeds, so there's no reason for callers to use it directly. The hashing that the RNG does on it is preferable to using the bytes raw. The only current use case of get_random_bytes_arch() is vsprintf's siphash key for pointer hashing, which uses it to initialize the pointer secret earlier than usual if RDRAND is available. In order to replace this narrow use case, just expose whether RDRAND is mixed into the RNG, with a new function called rng_has_arch_random(). With that taken care of, there are no users of get_random_bytes_arch() left, so it can be removed. Later, if trust_cpu gets turned on by default (as most distros are doing), this one use of rng_has_arch_random() can probably go away as well. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-19  random: make consistent use of buf and len  (Jason A. Donenfeld)
The current code was a mix of "nbytes", "count", "size", "buffer", "in", and so forth. Instead, let's clean this up by naming input parameters "buf" (or "ubuf") and "len", so that you always understand that you're reading this variety of function argument. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-19  random: use proper return types on get_random_{int,long}_wait()  (Jason A. Donenfeld)
Before these were returning signed values, but the API is intended to be used with unsigned values. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-19  random: remove extern from functions in header  (Jason A. Donenfeld)
According to the kernel style guide, having `extern` on functions in headers is old school and deprecated, and doesn't add anything. So remove them from random.h, and tidy up the file a little bit too. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
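A minimal before/after sketch of the style change described above; the declaration shown is representative of random.h rather than an exact excerpt:

    /* Old style: extern is redundant on a function declaration in a header. */
    extern void get_random_bytes(void *buf, size_t len);

    /* New style: same declaration, same linkage, less noise. */
    void get_random_bytes(void *buf, size_t len);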
2022-05-19  Merge tag 'iio-for-5.19a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next  (Greg Kroah-Hartman)
Jonathan writes:

First set of IIO new device support, features and cleanup for 5.19

Usual mixed bag. The stand-out this time is Andy Shevchenko's continuing effort to move drivers over to the generic firmware interfaces.

Device support
* sprd,sc2720
  - ump9620 binding addition.
  - Refactor and support for sc2720, sc2721 and sc2730.
* ti,ads1015
  - Refactor driver and add support for TLA2024.

Device support (IDs only)
* invensense,mpu6050
  - Add ID for ICM-20608-D.
* st,accel:
  - Add ID for lis302dl.
* st,lsm6dsx
  - Add support for ASM330LHHX (can fall back to LSM6DSR.)

Features
* convert drivers to device properties
  - IIO core
  - adi,ad7266
  - adi,adis16480
  - adi,adxl355
  - bosch,bmi160
  - domintech,dmard06
  - fsl,fxas21002c
  - invensense,mpu3050
  - linear,ltc2983
  - linear,ltc2632
  - maxbotix,mb1232
  - maxim,max31856
  - maxim,max31865
  - multiplexer
  - ping
  - rescale
  - taos,tsl2772
* core
  - Add runtime check on whether realbits fit in storagebits for each channel.
* adi,ad_sigma_delta
  - Add sequencer support and relevant update_scan_mode callbacks for adi,ad7192 and adi,ad7124.

Cleanup and minor fixes
* MAINTAINERS
  - Update Lorenzo Bianconi's email address for IIO drivers.
  - Add entry for ad3552r and update maintainer in dt-binding doc.
* tree-wide
  - Replace strtobool() with kstrtobool().
  - Drop false OF dependencies.
* core
  - Tidy up and document IIO modes.
  - Take iio_buffer_enabled() out of header allowing current_mode to be moved to the opaque structure.
  - As all kfifo buffers use the same mode value, drop that parameter and set it unconditionally.
  - White space fixes and similar.
  - Drop use of list iterator variable for list_for_each_entry_continue_reverse and use list_prepare_entry to restart.
* sysfs-trigger
  - Replace use of 'found' variable with dedicated list iterator variable.
* adi,ad7124
  - Drop misleading shift.
* adi,ad2s1210
  - Remove redundant local variable assignment.
* adi,adis16480
  - Use local device pointer to reduce repetition.
  - Improve handling of clocks.
* domintech,dmard09
  - White space.
* dummy driver
  - Improve error handling.
* fsl,mma8452
  - Add missing documentation of name element.
* invensense,mpu3050
  - Stop remove() returning non 0.
* kionix,kxsd9
  - White space.
* linear,ltc2688
  - Use local variable for struct device.
  - Combine of_node_put() error handling paths.
* linear,ltc2983
  - Avoid use of constants in messages where a define is available.
* microchip,mcp4131
  - Fix compatible in dt example.
* pni,rm3100
  - Stop directly accessing iio_dev->current_mode just to find out if the buffer is enabled.
* renesas,rzg2l
  - Relax kconfig constraint to include newer devices.
* sprd,sc27xx
  - Fix wrong scaling mask.
  - Improve the calibration values.
* samsung,ssp
  - Replace a 'found' variable in favor of an explicit value that was found.
* sensortek,stk3xx
  - Add proximity-near-level binding and driver support.
* st,st_sensors:
  - Drop unused accel_type enum.
  - Return early in *_write_raw()
  - Drop unnecessary locking in _avail functions.
  - Add local lock to protect odr against concurrent updates allowing mlock to no longer be used outside of the core.
  - Use iio_device_claim_direct_mode() rather than racy checking of the current mode.
* st,stmpe-adc
  - Fix checks on wait_for_completion_timeout().
  - Allow use of of_device_id for matching.
* st,stm32-dfsdm
  - Stop accessing iio_dev->current_mode to find out if the buffer is enabled (so we can hide that variable in the opaque structure)
* st,vl53l0x
  - Fix checks on wait_for_completion_timeout.
* ti,ads1015
  - Add missing ID for ti,ads1115 in binding doc.
  - Convert from repeated chip ID look up to selecting static const data.
  - Switch to read_avail() callback.
* ti,ads8688
  - Use of_device_id for driver matching.
* ti,palmas-adc
  - Drop a warning on minor calibration mismatch leading to slightly negative values after applying the calibration.

* tag 'iio-for-5.19a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (95 commits)
  iio: ti-ads8688: use of_device_id for OF matching
  iio: stmpe-adc: use of_device_id for OF matching
  dt-bindings: iio: Fix incorrect compatible strings in examples
  iio: gyro: mpu3050: Make mpu3050_common_remove() return void
  iio: dac: ltc2632: Make use of device properties
  iio: temperature: max31865: Make use of device properties
  iio: proximity: mb1232: Switch to use fwnode_irq_get()
  iio: imu: adis16480: Improve getting the optional clocks
  iio: imu: adis16480: Use temporary variable for struct device
  iio: imu: adis16480: Make use of device properties
  staging: iio: ad2s1210: remove redundant assignment to variable negative
  iio: adc: sc27xx: add support for PMIC sc2730
  iio: adc: sc27xx: add support for PMIC sc2720 and sc2721
  iio: adc: sc27xx: refactor some functions for support more PMiCs
  iio: adc: sc27xx: structure adjustment and optimization
  iio: adc: sc27xx: Fine tune the scale calibration values
  iio: adc: sc27xx: fix read big scale voltage not right
  dt-bindings:iio:adc: add sprd,ump9620-adc dt-binding
  iio: proximity: stk3310: Export near level property for proximity sensor
  dt-bindings: iio: light: stk33xx: Add proximity-near-level
  ...
2022-05-19  thermal/drivers/thermal_of: Add change_mode ops support for thermal_of sensor  (Manaf Meethalavalappu Pallikunhi)
A sensor driver which registers through the thermal_of interface doesn't have an option to get thermal zone mode change notifications from the thermal core. Add support for a change_mode ops in the thermal_of interface so that sensor drivers can use this ops for mode change notification. Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com> Link: https://lore.kernel.org/r/1646767586-31908-1-git-send-email-quic_manafm@quicinc.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-05-19  iio: adc: qcom-vadc-common: add reverse scaling for PMIC5 Gen2 ADC_TM  (Jishnu Prakash)
Add reverse scaling function for PMIC5 Gen2 ADC_TM, to convert temperature to raw ADC code, for setting thresholds for thermistor channels. Signed-off-by: Jishnu Prakash <quic_jprakash@quicinc.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/1648991869-20899-3-git-send-email-quic_jprakash@quicinc.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-05-19  riscv/efi_stub: Add support for RISCV_EFI_BOOT_PROTOCOL  (Sunil V L)
Add support for getting the boot hart ID from the Linux EFI stub using RISCV_EFI_BOOT_PROTOCOL. This method is preferred over the existing DT based approach since it works irrespective of DT or ACPI. The specification of the protocol is hosted at: https://github.com/riscv-non-isa/riscv-uefi Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Link: https://lore.kernel.org/r/20220519051512.136724-2-sunilvl@ventanamicro.com [ardb: minor tweaks for coding style and whitespace] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-19  Merge tag 'amd-drm-next-5.19-2022-05-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next  (Dave Airlie)
amd-drm-next-5.19-2022-05-18:

amdgpu:
- Misc code cleanups
- Additional SMU 13.x enablement
- Smartshift fixes
- GFX11 fixes
- Support for SMU 13.0.4
- SMU mutex fix
- Suspend/resume fix

amdkfd:
- static checker fix
- Doorbell/MMIO resource handling fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220518205621.5741-1-alexander.deucher@amd.com
2022-05-18  libceph: fix potential use-after-free on linger ping and resends  (Ilya Dryomov)
request_reinit() is not only ugly as the comment rightfully suggests, but also unsafe. Even though it is called with osdc->lock held for write in all cases, resetting the OSD request refcount can still race with handle_reply() and result in use-after-free. Taking linger ping as an example:

  handle_reply thread:
      down_read(&osdc->lock)
      req = lookup_request(...)
      ...
      finish_request(req)              # unregisters
      up_read(&osdc->lock)
      __complete_request(req)
        linger_ping_cb(req)
                                       # req->r_kref == 2 because handle_reply
                                       # still holds its ref

  handle_timeout thread:
      down_write(&osdc->lock)
      send_linger_ping(lreq)
        req = lreq->ping_req           # same req
                                       # cancel_linger_request is NOT
                                       # called - handle_reply already
                                       # unregistered
        request_reinit(req)
          WARN_ON(req->r_kref != 1)    # fires
          request_init(req)
            kref_init(req->r_kref)     # req->r_kref == 1 after kref_init

  handle_reply thread:
      ceph_osdc_put_request(req)
        kref_put(req->r_kref)          # req->r_kref == 0 after kref_put,
                                       # req is freed

  handle_timeout thread:
      <further req initialization/use> !!!

This happens because send_linger_ping() always (re)uses the same OSD request for watch ping requests, relying on cancel_linger_request() to unregister it from the OSD client and rip its messages out from the messenger. send_linger() does the same for watch/notify registration and watch reconnect requests. Unfortunately cancel_request() doesn't guarantee that after it returns the OSD client would be completely done with the OSD request -- a ref could still be held and the callback (if specified) could still be invoked too.

The original motivation for request_reinit() was inability to deal with allocation failures in send_linger() and send_linger_ping(). Switching to using osdc->req_mempool (currently only used by CephFS) respects that and allows us to get rid of request_reinit().

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2022-05-18  Merge tag 'devfreq-next-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux  (Rafael J. Wysocki)
Pull devfreq changes for 5.19-rc1 from Chanwoo Choi:

"1. Update devfreq core

 - Add cpu based scaling support to the passive governor. Some devices, such as caches, might require dynamic frequency scaling, but their frequency is tied very tightly to the cpu frequency, so the passive governor is used to scale their frequency according to the current cpu frequency. To decide the frequency of the device, the governor does one of the following:

   : Derives the optimal devfreq device opp from the required-opps property of the parent cpu opp_table.

   : Scales the device frequency in proportion to the CPU frequency. So, if the CPUs are running at their max frequency, the device runs at its max frequency. If the CPUs are running at their min frequency, the device runs at its min frequency. It is interpolated for frequencies in between.

2. Update devfreq drivers

 - Update rk3399_dmc.c as follows:

   : Convert the dt-binding document to YAML and deprecate unused properties.
   : Use Hz units for the device-tree properties of rk3399_dmc.
   : rk3399_dmc is able to set the idle time before changing the dmc clock; specify the idle time parameters in nanosecond units in the dt binding.
   : Add new disable-freq properties to optimize the power-saving feature of rk3399_dmc.
   : Disable the devfreq-event device on remove() to fix an unbalanced enable-disable count.
   : Use devm_pm_opp_of_add_table().
   : Block PMU (Power-Management Unit) transitions when scaling frequency via ARM Trusted Firmware in order to fix the conflict between the PMU and the DMC (Dynamic Memory Controller)."

* tag 'devfreq-next-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
  PM / devfreq: passive: Keep cpufreq_policy for possible cpus
  PM / devfreq: passive: Reduce duplicate code when passive_devfreq case
  PM / devfreq: Add cpu based scaling support to passive governor
  PM / devfreq: Export devfreq_get_freq_range symbol within devfreq
  PM / devfreq: rk3399_dmc: Block PMU during transitions
  soc: rockchip: power-domain: Manage resource conflicts with firmware
  PM / devfreq: rk3399_dmc: Avoid static (reused) profile
  PM / devfreq: rk3399_dmc: Use devm_pm_opp_of_add_table()
  PM / devfreq: rk3399_dmc: Disable edev on remove()
  PM / devfreq: rk3399_dmc: Support new *-ns properties
  PM / devfreq: rk3399_dmc: Support new disable-freq properties
  PM / devfreq: rk3399_dmc: Use bitfield macro definitions for ODT_PD
  PM / devfreq: rk3399_dmc: Drop excess timing properties
  PM / devfreq: rk3399_dmc: Drop undocumented ondemand DT props
  dt-bindings: devfreq: rk3399_dmc: Add more disable-freq properties
  dt-bindings: devfreq: rk3399_dmc: Specify idle params in nanoseconds
  dt-bindings: devfreq: rk3399_dmc: Fix Hz units
  dt-bindings: devfreq: rk3399_dmc: Deprecate unused/redundant properties
  dt-bindings: devfreq: rk3399_dmc: Convert to YAML
2022-05-18  nvme: add support for TP4084 - Time-to-Ready Enhancements  (Christoph Hellwig)
Add support for using longer timeouts during controller initialization and letting the controller come up with namespaces that are not ready for I/O yet. We skip these not ready namespaces during scanning and only bring them online once another scan is kicked off by the AEN that is set when the NRDY bit gets set in the I/O Command Set Independent Identify Namespace Data Structure. This asynchronous probing avoids blocking the kernel boot when controllers take a very long time to recover after unclean shutdowns (up to minutes). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-05-18  random: handle latent entropy and command line from random_init()  (Jason A. Donenfeld)
Currently, start_kernel() adds latent entropy and the command line to the entropy pool *after* the RNG has been initialized, deferring when it's actually used by things like stack canaries until the next time the pool is seeded. This surely is not intended. Rather than splitting up which entropy gets added where and when between start_kernel() and random_init(), just do everything in random_init(), which should eliminate these kinds of bugs in the future. While we're at it, rename the awkwardly titled "rand_initialize()" to the more standard "random_init()" nomenclature. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18  random32: use real rng for non-deterministic randomness  (Jason A. Donenfeld)
random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that does the same exact thing as random.c, except does it poorly. The first one has some use cases. The second one no longer does and can be replaced with calls to random.c's proper random number generator. The relatively recent siphash-based bad random32.c code was added in response to concerns that the prior random32.c was too deterministic. Out of fears that random.c was (at the time) too slow, this code was anonymously contributed. Then out of that emerged a kind of shadow entropy gathering system, with its own tentacles throughout various net code, added willy nilly. Stop👏making👏bespoke👏random👏number👏generators👏. Fortunately, recent advances in random.c mean that we can stop playing with this sketchiness, and just use get_random_u32(), which is now fast enough. In micro benchmarks using RDPMC, I'm seeing the same median cycle count between the two functions, with the mean being _slightly_ higher due to batches refilling (which we can optimize further need be). However, when doing *real* benchmarks of the net functions that actually use these random numbers, the mean cycles actually *decreased* slightly (with the median still staying the same), likely because the additional prandom code means icache misses and complexity, whereas random.c is generally already being used by something else nearby. The biggest benefit of this is that there are many users of prandom who probably should be using cryptographically secure random numbers. This makes all of those accidental cases become secure by just flipping a switch. Later on, we can do a tree-wide cleanup to remove the static inline wrapper functions that this commit adds. There are also some low-ish hanging fruits for making this even faster in the future: a get_random_u16() function for use in the networking stack will give a 2x performance boost there, using SIMD for ChaCha20 will let us compute 4 or 8 or 16 blocks of output in parallel, instead of just one, giving us large buffers for cheap, and introducing a get_random_*_bh() function that assumes irqs are already disabled will shave off a few cycles for ordinary calls. These are things we can chip away at down the road. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
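A hedged sketch of the kind of static inline wrapper the commit describes adding so that existing prandom callers transparently hit the real RNG; the exact wrapper set in prandom.h may differ:

    #include <linux/random.h>

    /* Former bespoke PRNG entry points become thin forwards to random.c. */
    static inline u32 prandom_u32(void)
    {
    	return get_random_u32();
    }

    static inline void prandom_bytes(void *buf, size_t nbytes)
    {
    	get_random_bytes(buf, nbytes);
    }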
2022-05-18  siphash: use one source of truth for siphash permutations  (Jason A. Donenfeld)
The SipHash family of permutations is currently used in three places: - siphash.c itself, used in the ordinary way it was intended. - random32.c, in a construction from an anonymous contributor. - random.c, as part of its fast_mix function. Each one of these places reinvents the wheel with the same C code, same rotation constants, and same symmetry-breaking constants. This commit tidies things up a bit by placing macros for the permutations and constants into siphash.h, where each of the three .c users can access them. It also leaves a note dissuading more users of them from emerging. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
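A hedged sketch of the shared permutation macro the commit describes, using the standard SipHash rotation constants; the exact macro name and layout in siphash.h may differ:

    #include <linux/bitops.h>	/* rol64() */

    /* One SipHash round over the 4x64-bit state (a, b, c, d). */
    #define SIPHASH_PERMUTATION(a, b, c, d) ( \
    	(a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \
    	(c) += (d), (d) = rol64((d), 16), (d) ^= (c), \
    	(a) += (d), (d) = rol64((d), 21), (d) ^= (a), \
    	(c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32))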
2022-05-18  fsnotify: introduce mark type iterator  (Amir Goldstein)
fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code of iterating all marks of a specific group interested in an event by consulting the iterator report_mask. Use an open coded version of that iterator in fsnotify_iter_next() that collects all marks of the current iteration group without consulting the iterator report_mask. At the moment, the two iterator variants are the same, but this decoupling will allow us to exclude some of the group's marks from reporting the event, for example for an event on a child when the inode mark on the parent did not request to watch events on children. Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara <jack@suse.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220511190213.831646-2-amir73il@gmail.com
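A hedged usage sketch of the iterator named above; the argument list is assumed for illustration and may not match fsnotify.h exactly:

    /* Walk each mark type (inode, parent, sb, ...) that the iterator
     * reports for the current event and handle its mark. */
    struct fsnotify_mark *mark;
    int type;

    fsnotify_foreach_iter_mark_type(iter, mark, type) {
    	/* ...per-mark handling for this event... */
    }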
2022-05-18  clocksource/drivers/ixp4xx: Drop boardfile probe path  (Linus Walleij)
The boardfiles for IXP4xx have been deleted. Delete all the quirks and code dealing with that boot path and rely solely on device tree boot. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220406205505.2332821-1-linus.walleij@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-05-17  net/mlx5: Support multiport eswitch mode  (Eli Cohen)
Multiport eswitch mode is a LAG mode that allows adding rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutually exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of a mirred egress redirect action. An example of such a rule would be:

  $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
      00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0

If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. The logic is also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17  net/mlx5: Inline db alloc API function  (Tariq Toukan)
Take the wrapper version which picks default node into a header file. This reduces the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17  net/mlx5: Add last command failure syndrome to debugfs  (Moshe Shemesh)
Add syndrome of last command failure per command type to debugfs to ease debugging of such failure. last_failed_syndrome - last command failed syndrome returned by FW. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17  ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS support  (Min Li)
Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY for gettime and settime exclusively. Before this change, TOD_READ_PRIMARY was used for both extts and gettime/settime, which would result in changing TOD read/write triggers between operations. Using TOD_READ_SECONDARY would make extts independent of gettime/settime operation Signed-off-by: Min Li <min.li.xe@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/1652712427-14703-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-18  locking/atomic: Add generic try_cmpxchg64 support  (Uros Bizjak)
Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed} and their fallbacks involving cmpxchg64. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com
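A small hedged usage sketch of the new helper; the retry loop below is illustrative rather than code from the patch:

    #include <linux/atomic.h>

    /* try_cmpxchg64() refreshes 'old' with the current value on failure,
     * so the retry loop needs no separate re-read. */
    static void counter_add(u64 *counter, u64 delta)
    {
    	u64 old = READ_ONCE(*counter);

    	do {
    	} while (!try_cmpxchg64(counter, &old, old + delta));
    }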
2022-05-17  audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts  (Julian Orth)
Not calling the function for dummy contexts will cause the context to not be reset. During the next syscall, this will cause an error in __audit_syscall_entry:

  WARN_ON(context->context != AUDIT_CTX_UNUSED);
  WARN_ON(context->name_count);
  if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
  	audit_panic("unrecoverable error in audit_syscall_entry()");
  	return;
  }

These problematic dummy contexts are created via the following call chain:

  exit_to_user_mode_prepare
    -> arch_do_signal_or_restart
      -> get_signal
        -> task_work_run
          -> tctx_task_work
            -> io_req_task_submit
              -> io_issue_sqe
                -> audit_uring_entry

Cc: stable@vger.kernel.org
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-05-17  NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes  (Trond Myklebust)
Add the ability to set or retrieve the acl using the NFSv4.1 'dacl' and 'sacl' attributes to the NFSv4 xdr encoders/decoders. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-17  NFSv4: Specify the type of ACL to cache  (Trond Myklebust)
When caching a NFSv4 ACL, we want to specify whether we are caching an NFSv4.0 type acl, the NFSv4.1 dacl or the NFSv4.1 sacl. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-18  cachefiles: implement on-demand read  (Jeffle Xu)
Implement the data plane of on-demand read mode. The early implementation [1] placed the entry to cachefiles_ondemand_read() in fscache_read(). However, fscache_read() can only detect whether the requested file range is fully a cache miss, whilst we need to notify the user daemon as long as there's a hole inside the requested file range. Thus the entry is now placed in cachefiles_prepare_read(). When working in on-demand read mode, once a hole is detected, the read routine will send a READ request to the user daemon. The user daemon needs to fetch the data and write it to the cache file. After sending the READ request, the read routine will hang there until the READ request is handled by the user daemon, and will then retry to read from the same file range. If no progress is made, the read routine fails. A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that an on-demand read should be done when a cache miss is encountered. [1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8 Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20220425122143.56815-6-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-18  cachefiles: notify the user daemon when looking up cookie  (Jeffle Xu)
Fscache/CacheFiles used to serve as a local cache for a remote networking fs. A new on-demand read mode will be introduced for CacheFiles, which can boost the scenario where on-demand read semantics are needed, e.g. container image distribution. The essential difference between these two modes is seen when a cache miss occurs: In the original mode, the netfs will fetch the data from the remote server and then write it to the cache file; in on-demand read mode, fetching the data and writing it into the cache is delegated to a user daemon. As the first step, notify the user daemon when looking up cookie. In this case, an anonymous fd is sent to the user daemon, through which the user daemon can write the fetched data to the cache file. Since the user daemon may move the anonymous fd around, e.g. through dup(), an object ID uniquely identifying the cache file is also attached. Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that the cache file size shall be retrieved at runtime. This helps the scenario where one cache file contains multiple netfs files, e.g. for the purpose of deduplication. In this case, netfs itself has no idea the size of the cache file, whilst the user daemon should give the hint on it. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-17  hwmon: introduce hwmon_sanitize_name()  (Michael Walle)
More and more drivers will check for bad characters in the hwmon name and all are using the same code snippet. Consolidate that code by adding a new hwmon_sanitize_name() function. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20220405092452.4033674-2-michael@walle.cc Signed-off-by: Guenter Roeck <linux@roeck-us.net>
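A hedged usage sketch of the consolidated helper; the ERR_PTR-on-allocation-failure convention shown is an assumption rather than something stated in the commit text:

    /* Build an hwmon-safe name from a device name that may contain
     * characters hwmon rejects (e.g. '-'). */
    static const char *foo_hwmon_name(struct device *dev)
    {
    	const char *name = hwmon_sanitize_name(dev_name(dev));

    	return IS_ERR(name) ? NULL : name;
    }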
2022-05-17  PM / devfreq: passive: Keep cpufreq_policy for possible cpus  (Chanwoo Choi)
The passive governor requires per-CPU data in order to determine the next target frequency of a devfreq device that depends on the CPU frequency. In order to reduce unnecessary memory usage, keep cpufreq_policy data for the possible CPUs only, instead of for NR_CPUS. Tested-by: Chen-Yu Tsai <wenst@chromium.org> Tested-by: Johnson Wang <johnson.wang@mediatek.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-05-17  PM / devfreq: Add cpu based scaling support to passive governor  (Saravana Kannan)
Many CPU architectures have caches that can scale independent of the CPUs. Frequency scaling of the caches is necessary to make sure that the cache is not a performance bottleneck that leads to poor performance and power. The same idea applies for RAM/DDR. To achieve this, this patch adds support for cpu based scaling to the passive governor. This is accomplished by taking the current frequency of each CPU frequency domain and then adjust the frequency of the cache (or any devfreq device) based on the frequency of the CPUs. It listens to CPU frequency transition notifiers to keep itself up to date on the current CPU frequency. To decide the frequency of the device, the governor does one of the following: * Derives the optimal devfreq device opp from required-opps property of the parent cpu opp_table. * Scales the device frequency in proportion to the CPU frequency. So, if the CPUs are running at their max frequency, the device runs at its max frequency. If the CPUs are running at their min frequency, the device runs at its min frequency. It is interpolated for frequencies in between. Tested-by: Chen-Yu Tsai <wenst@chromium.org> Tested-by: Johnson Wang <johnson.wang@mediatek.com> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> [Sibi: Integrated cpu-freqmap governor into passive_governor] Signed-off-by: Sibi Sankar <sibis@codeaurora.org> [Chanwoo: Fix conflict with latest code and cleanup code] Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
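The proportional option above is a linear interpolation between the two frequency ranges; a hedged sketch of that calculation follows, with the helper name invented for illustration:

    /* cpu at min -> device at min, cpu at max -> device at max,
     * interpolated in between. */
    static unsigned long passive_scale_freq(unsigned long cpu_cur,
    					    unsigned long cpu_min,
    					    unsigned long cpu_max,
    					    unsigned long dev_min,
    					    unsigned long dev_max)
    {
    	if (cpu_max <= cpu_min)
    		return dev_max;

    	return dev_min + mult_frac(dev_max - dev_min, cpu_cur - cpu_min,
    				   cpu_max - cpu_min);
    }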
2022-05-17  nvme: split the enum used for various register constants  (Christoph Hellwig)
Instead of having one big enum add one for each register or field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-05-16  net: skb: Remove skb_data_area_size()  (Ricardo Martinez)
skb_data_area_size() is not needed. As Jakub pointed out [1]: For Rx, drivers can use the size passed during skb allocation or use skb_tailroom(). For Tx, drivers should use skb_headlen(). [1] https://lore.kernel.org/netdev/CAHNKnsTmH-rGgWi3jtyC=ktM1DW2W1VJkYoTMJV2Z_Bt498bsg@mail.gmail.com/ Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-16  dax: add .recovery_write dax_operation  (Jane Chu)
Introduce dax_recovery_write() operation. The function is used to recover a dax range that contains poison. Typical use case is when a user process receives a SIGBUS with si_code BUS_MCEERR_AR indicating poison(s) in a dax range, in response, the user process issues a pwrite() to the page-aligned dax range, thus clears the poison and puts valid data in the range. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16  dax: introduce DAX_RECOVERY_WRITE dax access mode  (Jane Chu)
Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce enum dax_access_mode { DAX_ACCESS, DAX_RECOVERY_WRITE, } where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
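A hedged call sketch showing where the new mode argument fits; the exact parameter position in dax_direct_access() is assumed here:

    /* Ask for a kernel mapping of a possibly-poisoned range so it can be
     * repaired with a subsequent recovery write; DAX_ACCESS would be used
     * for a normal access instead. */
    static long map_for_recovery(struct dax_device *dax_dev, pgoff_t pgoff,
    			     long nr_pages, void **kaddr, pfn_t *pfn)
    {
    	return dax_direct_access(dax_dev, pgoff, nr_pages,
    				 DAX_RECOVERY_WRITE, kaddr, pfn);
    }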
2022-05-16  mce: fix set_mce_nospec to always unmap the whole page  (Jane Chu)
The set_memory_uc() approach doesn't work well in all cases. As Dan pointed out when "The VMM unmapped the bad page from guest physical space and passed the machine check to the guest." "The guest gets virtual #MC on an access to that page. When the guest tries to do set_memory_uc() and instructs cpa_flush() to do clean caches that results in taking another fault / exception perhaps because the VMM unmapped the page from the guest." Since the driver has special knowledge to handle NP or UC, mark the poisoned page with NP and let driver handle it when it comes down to repair. Please refer to discussions here for more details. https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/ Now since poisoned page is marked as not-present, in order to avoid writing to a not-present page and trigger kernel Oops, also fix pmem_do_write(). Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/165272615484.103830.2563950688772226611.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16  x86/mce: relocate set{clear}_mce_nospec() functions  (Jane Chu)
Relocate the twin mce functions to arch/x86/mm/pat/set_memory.c file where they belong. While at it, fixup a function name in a comment. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> [sfr: gate {set,clear}_mce_nospec() by CONFIG_X86_64] Link: https://lore.kernel.org/r/165272527328.90175.8336008202048685278.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16  Merge branch kvm-arm64/vgic-invlpir into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/vgic-invlpir:
  : .
  : Implement MMIO-based LPI invalidation for vGICv3.
  : .
  KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision
  KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation
  KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIs
  irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES}

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16  iomap: add per-iomap_iter private data  (Christoph Hellwig)
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16  iomap: allow the file system to provide a bio_set for direct I/O  (Christoph Hellwig)
Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use. To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16  fs: add a lockdep check function for sb_start_write()  (Naohiro Aota)
Add a function sb_write_started() to allow callers to verify if sb_start_write() is properly called. It will be used for assertion in btrfs. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
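A hedged sketch of the kind of assertion the new helper enables; WARN_ON_ONCE() stands in here for btrfs's own ASSERT():

    #include <linux/fs.h>

    static void do_protected_write_work(struct super_block *sb)
    {
    	/* Catch callers that forgot the surrounding sb_start_write()/
    	 * sb_end_write() pair. */
    	WARN_ON_ONCE(!sb_write_started(sb));

    	/* ...actual write-path work... */
    }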
2022-05-16  Merge 5.18-rc7 into usb-next  (Greg Kroah-Hartman)
We need the tty fixes in here as well, as we need to revert one of them :( Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-16  firmware: mediatek: Add adsp ipc protocol interface  (TingHan Shen)
Some MediaTek processors contain the Tensilica HiFix DSP for audio processing. Communication between the host CPU and the DSP firmware takes place through a shared memory area used for message passing. The ADSP IPC protocol offers send/recv interfaces built on the mediatek-mailbox APIs. We use two mbox channels to implement a request-reply protocol. Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Signed-off-by: TingHan Shen <tinghan.shen@mediatek.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Curtis Malainey <cujomalainey@chromium.org> Reviewed-by: Tzung-Bi Shih <tzungbi@google.com> Reviewed-by: YC Hung <yc.hung@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20220512082215.3018-2-tinghan.shen@mediatek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-16  net: fix dev_fill_forward_path with pppoe + bridge  (Felix Fietkau)
When calling dev_fill_forward_path on a pppoe device, the provided destination address is invalid. In order for the bridge fdb lookup to succeed, the pppoe code needs to update ctx->daddr to the correct value. Fix this by storing the address inside struct net_device_path_ctx Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-16  net: fix possible race in skb_attempt_defer_free()  (Eric Dumazet)
A cpu can observe sd->defer_count reaching 128 and call smp_call_function_single_async(). The problem is that the remote CPU can clear sd->defer_count before the IPI is run/acknowledged. Other cpus can queue more packets and also decide to call smp_call_function_single_async() while the pending IPI was not yet delivered. This is a common issue with smp_call_function_single_async(): callers must ensure correct synchronization and serialization. I triggered this issue while experimenting with a smaller threshold. Performing the call to smp_call_function_single_async() under sd->defer_lock protection did not solve the problem. Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async() to insert locked csd") replaced an informative WARN_ON_ONCE() with a return of -EBUSY, which is often ignored. The test of CSD_FLAG_LOCK presence is racy anyway. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-16  net: skb: change the definition SKB_DR_SET()  (Menglong Dong)
The SKB_DR_OR() is used to set the drop reason to a value when it is not set or specified yet. SKB_NOT_DROPPED_YET should also be considered as not set. Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
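A hedged sketch of the adjusted helper; SKB_DR_SET() and the enum values exist in skbuff.h, but the exact macro body below is an approximation:

    /* Only overwrite the drop reason if the caller has not set one yet;
     * both "not specified" and "not dropped yet" count as unset. */
    #define SKB_DR_OR(name, reason)					\
    	do {								\
    		if (name == SKB_DROP_REASON_NOT_SPECIFIED ||		\
    		    name == SKB_NOT_DROPPED_YET)			\
    			SKB_DR_SET(name, reason);			\
    	} while (0)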
2022-05-16  ipv6: Add hop-by-hop header to jumbograms in ip6_output  (Coco Li)
Instead of simply forcing a 0 payload_len in the IPv6 header, implement RFC 2675 and insert a custom extension header. Note that only the TCP stack is currently potentially generating jumbograms, and that this extension header is purely local: it won't be sent on a physical link. This is needed so that packet capture (tcpdump and friends) can properly dissect these large packets. Signed-off-by: Coco Li <lixiaoyan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
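A hedged sketch of the RFC 2675 hop-by-hop jumbo option layout implied above; the structure and field names are illustrative and may not match the kernel's exact definitions:

    /* Hop-by-hop extension header carrying only the jumbo payload option
     * (type 0xC2, option data length 4); the IPv6 header's payload_len
     * stays 0 and the real length lives here. */
    struct hop_jumbo_hdr {
    	u8	nexthdr;
    	u8	hdrlen;
    	u8	tlv_type;		/* IPV6_TLV_JUMBO (0xC2) */
    	u8	tlv_len;		/* 4 */
    	__be32	jumbo_payload_len;
    };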
2022-05-16  net: allow gro_max_size to exceed 65536  (Alexander Duyck)
Allow the gro_max_size to exceed a value larger than 65536. There weren't really any external limitations that prevented this other than the fact that IPv4 only supports a 16 bit length field. Since we have the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to exceed this value and for IPv4 and non-TCP flows we can cap things at 65536 via a constant rather than relying on gro_max_size. [edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-16  net: limit GSO_MAX_SIZE to 524280 bytes  (Eric Dumazet)
Make sure we will not overflow shinfo->gso_segs: the minimal TCP MSS size is 8 bytes, and shinfo->gso_segs is a 16bit field. TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h; it seems cleaner to not bring tcp details into include/linux/netdevice.h. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
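The 524280-byte cap follows directly from the arithmetic in the text; a hedged sketch of the constants involved (macro names illustrative):

    /* shinfo->gso_segs is a 16-bit field and the minimal TCP MSS is
     * 8 bytes, so the largest size that cannot overflow gso_segs is
     * 65535 segments * 8 bytes = 524280 bytes. */
    #define TCP_MIN_GSO_SIZE	8
    #define GSO_MAX_SEGS	65535u
    #define GSO_MAX_SIZE	(TCP_MIN_GSO_SIZE * GSO_MAX_SEGS)	/* 524280 */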