summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-14vdpa/mlx5: Distribute RX virtqueues in RQT objectEli Cohen
Distribute the available rx virtqueues amongst the available RQT entries. RQTs require to have a power of two entries. When creating or modifying the RQT, use the lowest number of power of two entries that is not less than the number of rx virtqueues. Distribute them in the available entries such that some virtqueus may be referenced twice. This allows to configure any number of virtqueue pairs when multiqueue is used. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-3-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Provide interface to read driver featuresEli Cohen
Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: clean up get_config_size ret value handlingLaura Abbott
The return type of get_config_size is size_t so it makes sense to change the type of the variable holding its result. That said, this already got taken care of (differently, and arguably not as well) by commit 3ed21c1451a1 ("vdpa: check that offsets are within bounds"). The added 'c->off > size' test in that commit will be done as an unsigned comparison on 32-bit (safe due to not being signed). On a 64-bit platform, it will be done as a signed comparison, but in that case the comparison will be done in 64-bit, and 'c->off' being an u32 it will be valid thanks to the extended range (ie both values will be positive in 64 bits). So this was a real bug, but it was already addressed and marked for stable. Signed-off-by: Laura Abbott <labbott@kernel.org> Reported-by: Luo Likang <luolikang@nsfocus.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14virtio_ring: mark ring unused on errorMichael S. Tsirkin
A recently added error path does not mark ring unused when exiting on OOM, which will lead to BUG on the next entry in debug builds. TODO: refactor code so we have START_USE and END_USE in the same function. Fixes: fc6d70f40b3d ("virtio_ring: check desc == NULL when using indirect with packed") Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com> Cc: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vhost/test: fix memory leak of vhost virtqueuesXianting Tian
We need free the vqs in .release(), which are allocated in .open(). Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20211228030924.3468439-1-xianting.tian@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14vdpa/mlx5: Fix wrong configuration of virtio_version_1_0Eli Cohen
Remove overriding of virtio_version_1_0 which forced the virtqueue object to version 1. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211230142024.142979-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-01-14virtio/virtio_pci_legacy_dev: ensure the correct return valuePeng Hao
When pci_iomap return NULL, the return value is zero. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222112014.87394-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14virtio/virtio_mem: handle a possible NULL as a memcpy parameterPeng Hao
There is a check for vm->sbm.sb_states before, and it should check it here as well. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222011225.40573-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug") Cc: stable@vger.kernel.org # v5.8+
2022-01-14virtio: fix a typo in function "vp_modern_remove" comments.Dapeng Mi
Function name "vp_modern_remove" in comments is written to "vp_modern_probe" incorrectly. Change it. Signed-off-by: Dapeng Mi <dapeng1.mi@intel.com> Link: https://lore.kernel.org/r/20211210073546.700783-1-dapeng1.mi@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14virtio-pci: fix the confusing error message王贇
The error message on the failure of pfn check should tell virtio-pci rather than virtio-mmio, just fix it. Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/ae5e154e-ac59-f0fa-a7c7-091a2201f581@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14firmware: qemu_fw_cfg: remove sysfs entries explicitlyJohan Hovold
Explicitly remove the file entries from sysfs before dropping the final reference for symmetry reasons and for consistency with the rest of the driver. Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-5-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14firmware: qemu_fw_cfg: fix sysfs information leakJohan Hovold
Make sure to always NUL-terminate file names retrieved from the firmware to avoid accessing data beyond the entry slab buffer and exposing it through sysfs in case the firmware data is corrupt. Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device") Cc: stable@vger.kernel.org # 4.6 Cc: Gabriel Somlo <somlo@cmu.edu> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-4-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14firmware: qemu_fw_cfg: fix kobject leak in probe error pathJohan Hovold
An initialised kobject must be freed using kobject_put() to avoid leaking associated resources (e.g. the object name). Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed" the leak in the first error path of the file registration helper but left the second one unchanged. This "fix" would however result in a NULL pointer dereference due to the release function also removing the never added entry from the fw_cfg_entry_cache list. This has now been addressed. Fix the remaining kobject leak by restoring the common error path and adding the missing kobject_put(). Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device") Cc: stable@vger.kernel.org # 4.6 Cc: Gabriel Somlo <somlo@cmu.edu> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-3-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14firmware: qemu_fw_cfg: fix NULL-pointer deref on duplicate entriesJohan Hovold
Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed" a kobject leak in the file registration helper by properly calling kobject_put() for the entry in case registration of the object fails (e.g. due to a name collision). This would however result in a NULL pointer dereference when the release function tries to remove the never added entry from the fw_cfg_entry_cache list. Fix this by moving the list-removal out of the release function. Note that the offending commit was one of the benign looking umn.edu fixes which was reviewed but not reverted. [1][2] [1] https://lore.kernel.org/r/202105051005.49BFABCE@keescook [2] https://lore.kernel.org/all/YIg7ZOZvS3a8LjSv@kroah.com Fixes: fe3c60684377 ("firmware: Fix a reference count leak.") Cc: stable@vger.kernel.org # 5.8 Cc: Qiushi Wu <wu000273@umn.edu> Cc: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-2-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Mark vdpa_config_ops.get_vq_notification as optionalEugenio Pérez
Since vhost_vdpa_mmap checks for its existence before calling it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20211104195248.2088904-1-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14vdpa: Avoid duplicate call to vp_vdpa get_statusEugenio Pérez
It has no sense to call get_status twice, since we already have a variable for that. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20211104195833.2089796-1-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14eni_vdpa: Simplify 'eni_vdpa_probe()'Christophe JAILLET
When 'pcim_enable_device()' is used, some resources become automagically managed. There is no need to call 'pci_free_irq_vectors()' when the driver is removed. The same will already be done by 'pcim_release()'. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/02045bdcbbb25f79bae4827f66029cfcddc90381.1636301587.git.christophe.jaillet@wanadoo.fr Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14net/mlx5_vdpa: Offer VIRTIO_NET_F_MTU when setting MTUEli Cohen
Make sure to offer VIRTIO_NET_F_MTU since we configure the MTU based on what was queried from the device. This allows the virtio driver to allocate large enough buffers based on the reported MTU. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211124170949.51725-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-01-14virtio-mem: prepare fake page onlining code for granularity smaller than ↵David Hildenbrand
MAX_ORDER - 1 Let's prepare our fake page onlining code for subblock size smaller than MAX_ORDER - 1: we might get called for ranges not covering properly aligned MAX_ORDER - 1 pages. We have to detect the order to use dynamically. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>
2022-01-14virtio-mem: prepare page onlining code for granularity smaller than ↵David Hildenbrand
MAX_ORDER - 1 Let's prepare our page onlining code for subblock size smaller than MAX_ORDER - 1: we'll get called for a MAX_ORDER - 1 page but might have some subblocks in the range plugged and some unplugged. In that case, fallback to subblock granularity to properly only expose the plugged parts to the buddy. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>
2022-01-14vdpa: add driver_override supportStefano Garzarella
`driver_override` allows to control which of the vDPA bus drivers binds to a vDPA device. If `driver_override` is not set, the previous behaviour is followed: devices use the first vDPA bus driver loaded (unless auto binding is disabled). Tested on Fedora 34 with driverctl(8): $ modprobe virtio-vdpa $ modprobe vhost-vdpa $ modprobe vdpa-sim-net $ vdpa dev add mgmtdev vdpasim_net name dev1 # dev1 is attached to the first vDPA bus driver loaded $ driverctl -b vdpa list-devices dev1 virtio_vdpa $ driverctl -b vdpa set-override dev1 vhost_vdpa $ driverctl -b vdpa list-devices dev1 vhost_vdpa [*] Note: driverctl(8) integrates with udev so the binding is preserved. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211126164753.181829-3-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14docs: document sysfs ABI for vDPA busStefano Garzarella
Add missing documentation of sysfs ABI for vDPA bus in the new Documentation/ABI/testing/sysfs-bus-vdpa file. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211126164753.181829-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14ifcvf/vDPA: fix misuse virtio-net device config size for blk devZhu Lingshan
This commit fixes a misuse of virtio-net device config size issue for virtio-block devices. A new member config_size in struct ifcvf_hw is introduced and would be initialized through vdpa_dev_add() to record correct device config size. To be more generic, rename ifcvf_hw.net_config to ifcvf_hw.dev_config, the helpers ifcvf_read/write_net_config() to ifcvf_read/write_dev_config() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Reported-and-suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: 6ad31d162a4e ("vDPA/ifcvf: enable Intel C5000X-PL virtio-block for vDPA") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211201081255.60187-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vduse: moving kvfree into callerGuanjun
This free action should be moved into caller 'vduse_ioctl' in concert with the allocation. No functional change. Signed-off-by: Guanjun <guanjun@linux.alibaba.com> Link: https://lore.kernel.org/r/1638780498-55571-1-git-send-email-guanjun@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14hwrng: virtio - unregister device before resetMichael S. Tsirkin
unregister after reset is clearly wrong - device can be used while it's reset. There's an attempt to protect against that using hwrng_removed but it seems racy since access can be in progress when the flag is set. Just unregister, then reset seems simpler and cleaner. NB: we might be able to drop hwrng_removed in a follow-up patch. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14virtio: wrap config->reset callsMichael S. Tsirkin
This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-02Linux 5.16-rc8Linus Torvalds
2022-01-02Merge tag 'perf-tools-fixes-for-v5.16-2022-01-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix TUI exit screen refresh race condition in 'perf top'. - Fix parsing of Intel PT VM time correlation arguments. - Honour CPU filtering command line request of a script's switch events in 'perf script'. - Fix printing of switch events in Intel PT python script. - Fix duplicate alias events list printing in 'perf list', noticed on heterogeneous arm64 systems. - Fix return value of ids__new(), users expect NULL for failure, not ERR_PTR(-ENOMEM). * tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf top: Fix TUI exit screen refresh race condition perf pmu: Fix alias events list perf scripts python: intel-pt-events.py: Fix printing of switch events perf script: Fix CPU filtering of a script's switch events perf intel-pt: Fix parsing of VM time correlation arguments perf expr: Fix return value of ids__new()
2022-01-02Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Better input validation for compat ioctls and a documentation bugfix for 5.16" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Docs: Fixes link to I2C specification i2c: validate user data in compat ioctl
2022-01-02Merge tag 'x86_urgent_for_v5.16_rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Use the proper CONFIG symbol in a preprocessor check. * tag 'x86_urgent_for_v5.16_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Use the proper name CONFIG_FW_LOADER
2022-01-02perf top: Fix TUI exit screen refresh race conditionyaowenbin
When the following command is executed several times, a coredump file is generated. $ timeout -k 9 5 perf top -e task-clock ******* ******* ******* 0.01% [kernel] [k] __do_softirq 0.01% libpthread-2.28.so [.] __pthread_mutex_lock 0.01% [kernel] [k] __ll_sc_atomic64_sub_return double free or corruption (!prev) perf top --sort comm,dso timeout: the monitored command dumped core When we terminate "perf top" using sending signal method, SLsmg_reset_smg() called. SLsmg_reset_smg() resets the SLsmg screen management routines by freeing all memory allocated while it was active. However SLsmg_reinit_smg() maybe be called by another thread. SLsmg_reinit_smg() will free the same memory accessed by SLsmg_reset_smg(), thus it results in a double free. SLsmg_reinit_smg() is called already protected by ui__lock, so we fix the problem by adding pthread_mutex_trylock of ui__lock when calling SLsmg_reset_smg(). Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: wuxu.wu@huawei.com Link: http://lore.kernel.org/lkml/a91e3943-7ddc-f5c0-a7f5-360f073c20e6@huawei.com Signed-off-by: Hewenliang <hewenliang4@huawei.com> Signed-off-by: yaowenbin <yaowenbin1@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-02perf pmu: Fix alias events listJohn Garry
Commit 0e0ae8742207c3b4 ("perf list: Display hybrid PMU events with cpu type") changes the event list for uncore PMUs or arm64 heterogeneous CPU systems, such that duplicate aliases are incorrectly listed per PMU (which they should not be), like: # perf list ... unc_cbo_cache_lookup.any_es [Unit: uncore_cbox L3 Lookup any request that access cache and found line in E or S-state] unc_cbo_cache_lookup.any_es [Unit: uncore_cbox L3 Lookup any request that access cache and found line in E or S-state] unc_cbo_cache_lookup.any_i [Unit: uncore_cbox L3 Lookup any request that access cache and found line in I-state] unc_cbo_cache_lookup.any_i [Unit: uncore_cbox L3 Lookup any request that access cache and found line in I-state] ... Notice how the events are listed twice. The named commit changed how we remove duplicate events, in that events for different PMUs are not treated as duplicates. I suppose this is to handle how "Each hybrid pmu event has been assigned with a pmu name". Fix PMU alias listing by restoring behaviour to remove duplicates for non-hybrid PMUs. Fixes: 0e0ae8742207c3b4 ("perf list: Display hybrid PMU events with cpu type") Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1640103090-140490-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Two small fixups for spaceball joystick driver and appletouch touchpad driver" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: spaceball - fix parsing of movement data packets Input: appletouch - initialize work before device registration
2021-12-31mm: vmscan: reduce throttling due to a failure to make progress -fixMel Gorman
Hugh Dickins reported the following My tmpfs swapping load (tweaked to use huge pages more heavily than in real life) is far from being a realistic load: but it was notably slowed down by your throttling mods in 5.16-rc, and this patch makes it well again - thanks. But: it very quickly hit NULL pointer until I changed that last line to if (first_pgdat) consider_reclaim_throttle(first_pgdat, sc); The likely issue is that huge pages are a major component of the test workload. When this is the case, first_pgdat may never get set if compaction is ready to continue due to this check if (IS_ENABLED(CONFIG_COMPACTION) && sc->order > PAGE_ALLOC_COSTLY_ORDER && compaction_ready(zone, sc)) { sc->compaction_ready = true; continue; } If this was true for every zone in the zonelist, first_pgdat would never get set resulting in a NULL pointer exception. Link: https://lkml.kernel.org/r/20211209095453.GM3366@techsingularity.net Fixes: 1b4e3f26f9f75 ("mm: vmscan: Reduce throttling due to a failure to make progress") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31mm: vmscan: Reduce throttling due to a failure to make progressMel Gorman
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar problems due to reclaim throttling for excessive lengths of time. In Alexey's case, a memory hog that should go OOM quickly stalls for several minutes before stalling. In Mike and Darrick's cases, a small memcg environment stalled excessively even though the system had enough memory overall. Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made") introduced the problem although commit a19594ca4a8b ("mm/vmscan: increase the timeout if page reclaim is not making progress") made it worse. Systems at or near an OOM state that cannot be recovered must reach OOM quickly and memcg should kill tasks if a memcg is near OOM. To address this, only stall for the first zone in the zonelist, reduce the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if the scan control nr_reclaimed is 0, kswapd is still active and there were excessive pages pending for writeback. If kswapd has stopped reclaiming due to excessive failures, do not stall at all so that OOM triggers relatively quickly. Similarly, if an LRU is simply congested, only lightly throttle similar to NOPROGRESS. Alexey's original case was the most straight forward for i in {1..3}; do tail /dev/zero; done On vanilla 5.16-rc1, this test stalled heavily, after the patch the test completes in a few seconds similar to 5.15. Alexey's second test case added watching a youtube video while tail runs 10 times. On 5.15, playback only jitters slightly, 5.16-rc1 stalls a lot with lots of frames missing and numerous audio glitches. With this patch applies, the video plays similarly to 5.15. [lkp@intel.com: Fix W=1 build warning] Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/ Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv> Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Hugh Dickins <hughd@google.com> Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info> Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc mm fixes from Andrew Morton: "2 patches. Subsystems affected by this patch series: mm (userfaultfd and damon)" * akpm: mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()' userfaultfd/selftests: fix hugetlb area allocations
2021-12-31Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three fixes, all in drivers. The lpfc one doesn't look exploitable, but nasty things could happen in string operations if mybuf ends up with an on stack unterminated string" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: vmw_pvscsi: Set residual data length conditionally scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown() scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
2021-12-31mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'SeongJae Park
DAMON debugfs interface increases the reference counts of 'struct pid's for targets from the 'target_ids' file write callback ('dbgfs_target_ids_write()'), but decreases the counts only in DAMON monitoring termination callback ('dbgfs_before_terminate()'). Therefore, when 'target_ids' file is repeatedly written without DAMON monitoring start/termination, the reference count is not decreased and therefore memory for the 'struct pid' cannot be freed. This commit fixes this issue by decreasing the reference counts when 'target_ids' is written. Link: https://lkml.kernel.org/r/20211229124029.23348-1-sj@kernel.org Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [5.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31userfaultfd/selftests: fix hugetlb area allocationsMike Kravetz
Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh or any environment where there are 'just enough' hugetlb pages will always fail with: testing events (fork, remap, remove): ERROR: UFFDIO_COPY error: -12 (errno=12, line=616) The ENOMEM error code implies there are not enough hugetlb pages. However, there are free hugetlb pages but they are all reserved. There is a basic problem with the way the test allocates hugetlb pages which has existed since the test was originally written. Due to the way 'cleanup' was done between different phases of the test, this issue was masked until recently. The issue was uncovered by commit 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test"). For the hugetlb test, src and dst areas are allocated as PRIVATE mappings of a hugetlb file. This means that at mmap time, pages are reserved for the src and dst areas. At the start of event testing (and other tests) the src area is populated which results in allocation of huge pages to fill the area and consumption of reserves associated with the area. Then, a child is forked to fault in the dst area. Note that the dst area was allocated in the parent and hence the parent owns the reserves associated with the mapping. The child has normal access to the dst area, but can not use the reserves created/owned by the parent. Thus, if there are no other huge pages available allocation of a page for the dst by the child will fail. Fix by not creating reserves for the dst area. In this way the child can use free (non-reserved) pages. Also, MAP_PRIVATE of a file only makes sense if you are interested in the contents of the file before making a COW copy. The test does not do this. So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous hugetlb mapping. There is no need to create a hugetlb file in the non-shared case. Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31Docs: Fixes link to I2C specificationDeep Majumder
The link to the I2C specification is broken. Although "https://www.nxp.com" hosts Rev 7 (2021) of this specification, it is behind a login-wall. Thus, an additional link has been added (which doesn't require a login) and the NXP official docs link has been updated. Signed-off-by: Deep Majumder <deep@fastmail.in> [wsa: minor updates to text and commit message] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-12-31i2c: validate user data in compat ioctlPavel Skripkin
Wrong user data may cause warning in i2c_transfer(), ex: zero msgs. Userspace should not be able to trigger warnings, so this patch adds validation checks for user data in compact ioctl to prevent reported warnings Reported-and-tested-by: syzbot+e417648b303855b91d8a@syzkaller.appspotmail.com Fixes: 7d5cb45655f2 ("i2c compat ioctls: move to ->compat_ioctl()") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-12-30Input: spaceball - fix parsing of movement data packetsLeo L. Schwab
The spaceball.c module was not properly parsing the movement reports coming from the device. The code read axis data as signed 16-bit little-endian values starting at offset 2. In fact, axis data in Spaceball movement reports are signed 16-bit big-endian values starting at offset 3. This was determined first by visually inspecting the data packets, and later verified by consulting: http://spacemice.org/pdf/SpaceBall_2003-3003_Protocol.pdf If this ever worked properly, it was in the time before Git... Signed-off-by: Leo L. Schwab <ewhac@ewhac.org> Link: https://lore.kernel.org/r/20211221101630.1146385-1-ewhac@ewhac.org Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-12-30Input: appletouch - initialize work before device registrationPavel Skripkin
Syzbot has reported warning in __flush_work(). This warning is caused by work->func == NULL, which means missing work initialization. This may happen, since input_dev->close() calls cancel_work_sync(&dev->work), but dev->work initalization happens _after_ input_register_device() call. So this patch moves dev->work initialization before registering input device Fixes: 5a6eb676d3bc ("Input: appletouch - improve powersaving for Geyser3 devices") Reported-and-tested-by: syzbot+b88c5eae27386b252bbd@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20211230141151.17300-1-paskripkin@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-12-30Merge tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "This is a bit bigger than I'd like, however it has two weeks of amdgpu fixes in it, since they missed last week, which was very small. The nouveau regression is probably the biggest fix in here, and it needs to go into 5.15 as well, two i915 fixes, and then a scattering of amdgpu fixes. The biggest fix in there is for a fencing NULL pointer dereference, the rest are pretty minor. For the misc team, I've pulled the two misc fixes manually since I'm not sure what is happening at this time of year! The amdgpu maintainers have the outstanding runpm regression to fix still, they are just working through the last bits of it now. Summary: nouveau: - fencing regression fix i915: - Fix possible uninitialized variable - Fix composite fence seqno icrement on each fence creation amdgpu: - Fencing fix - XGMI fix - VCN regression fix - IP discovery regression fixes - Fix runpm documentation - Suspend/resume fixes - Yellow Carp display fixes - MCLK power management fix - dma-buf fix" * tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Changed pipe split policy to allow for multi-display pipe split drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config drm/amd/display: Set optimize_pwr_state for DCN31 drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization drm/amd/display: Added power down for DCN10 drm/amd/display: fix B0 TMDS deepcolor no dislay issue drm/amdgpu: no DC support for headless chips drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform drm/amdgpu: always reset the asic in suspend (v2) drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume drm/i915: Increment composite fence seqno drm/i915: Fix possible uninitialized variable in parallel extension drm/amdgpu: fix runpm documentation drm/nouveau: wait for the exclusive fence after the shared ones v2 drm/amdgpu: add support for IP discovery gc_info table v2 drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled drm/amd/pm: Fix xgmi link control on aldebaran drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify
2021-12-31Merge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc ↵Dave Airlie
into drm-fixes This merges two fixes that haven't been sent to me yet, but I wanted to get in. One amdgpu fix, but one nouveau regression fixer. Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-12-30fs/mount_setattr: always cleanup mount_kattrChristian Brauner
Make sure that finish_mount_kattr() is called after mount_kattr was succesfully built in both the success and failure case to prevent leaking any references we took when we built it. We returned early if path lookup failed thereby risking to leak an additional reference we took when building mount_kattr when an idmapped mount was requested. Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-30Merge tag 'net-5.16-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from.. Santa? No regressions on our radar at this point. The igc problem fixed here was the last one I was tracking but it was broken in previous releases, anyway. Mostly driver fixes and a couple of largish SMC fixes. Current release - regressions: - xsk: initialise xskb free_list_node, fixup for a -rc7 fix Current release - new code bugs: - mlx5: handful of minor fixes: - use first online CPU instead of hard coded CPU - fix some error handling paths in 'mlx5e_tc_add_fdb_flow()' - fix skb memory leak when TC classifier action offloads are disabled - fix memory leak with rules with internal OvS port Previous releases - regressions: - igc: do not enable crosstimestamping for i225-V models Previous releases - always broken: - udp: use datalen to cap ipv6 udp max gso segments - fix use-after-free in tw_timer_handler due to early free of stats - smc: fix kernel panic caused by race of smc_sock - smc: don't send CDC/LLC message if link not ready, avoid timeouts - sctp: use call_rcu to free endpoint, avoid UAF in sock diag - bridge: mcast: add and enforce query interval minimum - usb: pegasus: do not drop long Ethernet frames - mlx5e: fix ICOSQ recovery flow for XSK - nfc: uapi: use kernel size_t to fix user-space builds" * tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) fsl/fman: Fix missing put_device() call in fman_port_probe selftests: net: using ping6 for IPv6 in udpgro_fwd.sh Documentation: fix outdated interpretation of ip_no_pmtu_disc net/ncsi: check for error return from call to nla_put_u32 net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper net: fix use-after-free in tw_timer_handler selftests: net: Fix a typo in udpgro_fwd.sh selftests/net: udpgso_bench_tx: fix dst ip argument net: bridge: mcast: add and enforce startup query interval minimum net: bridge: mcast: add and enforce query interval minimum ipv6: raw: check passed optlen before reading xsk: Initialise xskb free_list_node net/mlx5e: Fix wrong features assignment in case of error net/mlx5e: TC, Fix memory leak with rules with internal port ionic: Initialize the 'lif->dbid_inuse' bitmap igc: Fix TX timestamp support for non-MSI-X platforms igc: Do not enable crosstimestamping for i225-V models net/smc: fix kernel panic caused by race of smc_sock net/smc: don't send CDC/LLC message if link not ready NFC: st21nfca: Fix memory leak in device probe and remove ...
2021-12-30Merge tag 'char-misc-5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are two misc driver fixes for 5.16-final: - binder accounting fix to resolve reported problem - nitro_enclaves fix for mmap assert warning output Both of these have been for over a week with no reported issues" * tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert binder: fix async_free_space accounting for empty parcels
2021-12-30Merge tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds
Pull USB fixes from Greg KH: "Here are some small USB driver fixes for 5.16 to resolve some reported problems: - mtu3 driver fixes - typec ucsi driver fix - xhci driver quirk added - usb gadget f_fs fix for reported crash All of these have been in linux-next for a while with no reported problems" * tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: ucsi: Only check the contract if there is a connection xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set. usb: mtu3: set interval of FS intr and isoc endpoint usb: mtu3: fix list_head check warning usb: mtu3: add memory barrier before set GPD's HWO usb: mtu3: fix interval value for intr and isoc usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
2021-12-30fsl/fman: Fix missing put_device() call in fman_port_probeMiaoqian Lin
The reference taken by 'of_find_device_by_node()' must be released when not needed anymore. Add the corresponding 'put_device()' in the and error handling paths. Fixes: 18a6c85fcc78 ("fsl/fman: Add FMan Port Support") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>