We need the tty/serial fixes in here for testing.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need it here to apply other char/misc driver changes to.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull in pre-requisite patches from Guenter Roeck to constify
pointers to hwmon_channel_info.
* 'hwmon-const' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: constify pointers to hwmon_channel_info
Link: https://lore.kernel.org/all/3a0391e7-21f6-432a-9872-329e298e1582@roeck-us.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (cxl) fixes from Dan Williams:
"Several fixes for driver startup regressions that landed during the
merge window as well as some older bugs.
The regressions were due to a lack of testing with what the CXL
specification calls Restricted CXL Host (RCH) topologies compared to
the testing with Virtual Host (VH) CXL topologies. A VH topology is
typical PCIe while RCH topologies map CXL endpoints as Root Complex
Integrated endpoints. The impact is some driver crashes on startup.
This merge window also added compatibility for range registers (the
mechanism that CXL 1.1 defined for mapping memory) to treat them like
HDM decoders (the mechanism that CXL 2.0 defined for mapping
Host-managed Device Memory). That work collided with the new region
enumeration code that was tested with CXL 2.0 setups, and fails with
crashes at startup.
Lastly, the DOE (Data Object Exchange) implementation for retrieving
an ACPI-like data table from CXL devices is being reworked for v6.4.
Several fixes fell out of that work that are suitable for v6.3.
All of this has been in linux-next for a while, and all reported
issues [1] have been addressed.
Summary:
- Fix several issues with region enumeration in RCH topologies that
can trigger crashes on driver startup or shutdown.
- Fix CXL DVSEC range register compatibility versus region
enumeration that leads to startup crashes
- Fix CDAT endianness handling
- Fix multiple buffer handling boundary conditions
- Fix Data Object Exchange (DOE) workqueue usage vs
CONFIG_DEBUG_OBJECTS warn splats"
Link: http://lore.kernel.org/r/20230405075704.33de8121@canb.auug.org.au [1]
* tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/hdm: Extend DVSEC range register emulation for region enumeration
cxl/hdm: Limit emulation to the number of range registers
cxl/region: Move coherence tracking into cxl_region_attach()
cxl/region: Fix region setup/teardown for RCDs
cxl/port: Fix find_cxl_root() for RCDs and simplify it
cxl/hdm: Skip emulation when driver manages mem_enable
cxl/hdm: Fix double allocation of @cxlhdm
PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
cxl/pci: Handle excessive CDAT length
cxl/pci: Handle truncated CDAT entries
cxl/pci: Handle truncated CDAT header
cxl/pci: Fix CDAT retrieval on big endian
|
|
There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a
netdev notifier for DSA to reject PTP on DSA master"), due to a desire
to convert DSA's attempt to deny TX timestamping on a DSA master to
something that doesn't block the kernel-wide API conversion from
ndo_eth_ioctl() to ndo_hwtstamp_set().
What was required was a mechanism that did not depend on ndo_eth_ioctl(),
and what was provided was such a mechanism, but it also introduced
something that was not strictly necessary: a new netdev notifier.
There have been objections from Jakub Kicinski that using notifiers when
they are not strictly necessary complicates the control flow and makes
the code harder for maintainers to follow.
So there is a desire to not use notifiers.
In addition to that, the notifier chain gets called even if there is no
DSA in the system and no one is interested in applying any restriction.
Take the model of udp_tunnel_nic_ops and introduce a stub mechanism,
through which net/core/dev_ioctl.c can call into DSA even when
CONFIG_NET_DSA=m.
Compared to the code that existed prior to the notifier conversion, aka
what was added in commits:
- 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops")
- 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers")
this is different because we no longer overload any struct
net_device_ops of the DSA master; instead, we expose a rather specific
piece of functionality which is orthogonal to the API used to enable
it: ndo_eth_ioctl() or ndo_hwtstamp_set().
What is similar is that both approaches use function pointers to get
from built-in code to DSA.
There is no point in replicating the function pointers towards
__dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr).
Instead, it is sufficient to introduce a singleton struct dsa_stubs,
built into the kernel, which contains a single function pointer to
__dsa_master_hwtstamp_validate().
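The stub pattern described above can be sketched in plain C as a userspace illustration, not the kernel code. Only struct dsa_stubs and __dsa_master_hwtstamp_validate() come from the commit message; the member name, the wrapper, the simplified argument list, the -EBUSY policy and the init/exit helpers are hypothetical. The real kernel also has to deal with locking and module lifetime, which this sketch ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; only the pattern matters. */
struct net_device {
	int is_dsa_master;
};

/* Singleton table of entry points into the (possibly modular) DSA code. */
struct dsa_stubs {
	int (*master_hwtstamp_validate)(struct net_device *dev);
};

/* Built into the kernel: stays NULL until the DSA module registers itself. */
static const struct dsa_stubs *dsa_stubs;

/* Wrapper called from built-in code such as net/core/dev_ioctl.c. */
static int dsa_master_hwtstamp_validate(struct net_device *dev)
{
	if (!dsa_stubs)
		return 0;	/* no DSA loaded: nothing to restrict */
	return dsa_stubs->master_hwtstamp_validate(dev);
}

/* "Module" side: the actual check lives here (hypothetical policy). */
static int __dsa_master_hwtstamp_validate(struct net_device *dev)
{
	return dev->is_dsa_master ? -EBUSY : 0;
}

static const struct dsa_stubs __dsa_stubs = {
	.master_hwtstamp_validate = __dsa_master_hwtstamp_validate,
};

/* Hypothetical module init/exit hooks that publish and retract the stubs. */
static void dsa_module_init(void) { dsa_stubs = &__dsa_stubs; }
static void dsa_module_exit(void) { dsa_stubs = NULL; }
```

Note that built-in code only ever sees one pointer, not one per CPU port, which is the point made above about replicating function pointers through dev->dsa_ptr.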
I find this approach preferable to what we had originally, because
dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through
struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any
attempts to add any data encapsulation and hide DSA data structures from
the outside world.
Link: https://lore.kernel.org/netdev/20230403083019.120b72fd@kernel.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
arch_kexec_kernel_image_load() only calls kexec_image_load_default(), and
there are no arch-specific implementations.
Remove the unnecessary arch_kexec_kernel_image_load() and make
kexec_image_load_default() static.
No functional change intended.
Link: https://lkml.kernel.org/r/20230307224416.907040-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "kexec: Remove unnecessary arch hook", v2.
There are no arch-specific things in arch_kexec_kernel_image_load(), so
remove it and just use the generic version.
This patch (of 2):
The x86 implementation of arch_kexec_kernel_image_load() is functionally
identical to the generic arch_kexec_kernel_image_load():
  arch_kexec_kernel_image_load                # x86
    if (!image->fops || !image->fops->load)
      return ERR_PTR(-ENOEXEC);
    return image->fops->load(image, image->kernel_buf, ...)
  arch_kexec_kernel_image_load                # generic
    kexec_image_load_default
      if (!image->fops || !image->fops->load)
        return ERR_PTR(-ENOEXEC);
      return image->fops->load(image, image->kernel_buf, ...)
Remove the x86-specific version and use the generic
arch_kexec_kernel_image_load(). No functional change intended.
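The generic default that remains can be sketched in userspace C, assuming simplified stand-ins for struct kimage, struct kexec_file_ops and the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. The real ->load() callback takes more arguments (elided as "..." above), and dummy_load() is purely hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct kimage;

/* Trimmed-down loader ops: the real ->load() has a longer argument list. */
struct kexec_file_ops {
	void *(*load)(struct kimage *image, char *kernel_buf);
};

struct kimage {
	const struct kexec_file_ops *fops;
	char *kernel_buf;
};

/* The one behavior every architecture needed: dispatch or fail. */
static void *kexec_image_load_default(struct kimage *image)
{
	if (!image->fops || !image->fops->load)
		return ERR_PTR(-ENOEXEC);
	return image->fops->load(image, image->kernel_buf);
}

/* Hypothetical loader used only to exercise the dispatch path. */
static void *dummy_load(struct kimage *image, char *kernel_buf)
{
	(void)image;
	return kernel_buf;
}

static const struct kexec_file_ops dummy_ops = { .load = dummy_load };
```

Because nothing in this path is architecture-specific, keeping it as a single static function in generic code is sufficient.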
Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org
Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
To clean up kernel.h, split the hexadecimal-related helpers into their
own header, 'hex.h'.
Link: https://lkml.kernel.org/r/20230323155029.40000-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"28 hotfixes.
23 are cc:stable and the other five address issues which were
introduced during this merge cycle.
20 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...
|
|
The T241 platform suffers from the T241-FABRIC-4 erratum which causes
unexpected behavior in the GIC when multiple transactions are received
simultaneously from different sources. This hardware issue impacts
NVIDIA server platforms that interconnect more than two T241 chips.
Each chip has support for 320 {E}SPIs.
This issue occurs when multiple packets from different GICs are
incorrectly interleaved at the target chip. The erratum text below
specifies exactly what can cause multiple transfer packets susceptible
to interleaving and GIC state corruption. GIC state corruption can
lead to a range of problems, including kernel panics, and unexpected
behavior.
From the erratum text:
"In some cases, inter-socket AXI4 Stream packets with multiple
transfers, may be interleaved by the fabric when presented to ARM
Generic Interrupt Controller. GIC expects all transfers of a packet
to be delivered without any interleaving.
The following GICv3 commands may result in multiple transfer packets
over inter-socket AXI4 Stream interface:
- Register reads from GICD_I* and GICD_N*
- Register writes to 64-bit GICD registers other than GICD_IROUTERn*
- ITS command MOVALL
Multiple commands in GICv4+ utilize multiple transfer packets,
including VMOVP, VMOVI, VMAPP, and 64-bit register accesses."
This issue impacts system configurations with more than 2 sockets,
that require multi-transfer packets to be sent over inter-socket
AXI4 Stream interface between GIC instances on different sockets.
GICv4 cannot be supported. GICv3 SW model can only be supported
with the workaround. Single and Dual socket configurations are not
impacted by this issue and support GICv3 and GICv4."
Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
Writing to the chip alias region of the GICD_In{E} registers, except
GICD_ICENABLERn, has the same effect as writing to the global
distributor. The SPI interrupt deactivate path is not impacted by
the erratum.
To fix this problem, implement a workaround that ensures read accesses
to the GICD_In{E} registers are directed to the chip that owns the
SPI, and disable GICv4.x features. To simplify code changes, the
gic_configure_irq() function uses the same alias region for both read
and write operations to GICD_ICFGR.
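The read-redirection idea can be sketched as a base-address selector. This is a hypothetical illustration: the zero-based SPI index, the four-chip system size, the alias-base table and the fallback to the global distributor for out-of-range indices are all assumptions; only the figure of 320 {E}SPIs per chip comes from the text above. ESPI ranges are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define SPIS_PER_CHIP 320	/* from the commit message */
#define MAX_CHIPS     4		/* hypothetical system size */

/* Hypothetical per-chip distributor alias bases and the global base. */
static uintptr_t chip_alias_base[MAX_CHIPS];
static uintptr_t global_dist_base;

/*
 * Reads of GICD_In{E} registers must target the chip that owns the SPI.
 * spi_index is assumed to be zero-based (hwirq - 32).
 */
static uintptr_t t241_read_base(unsigned int spi_index)
{
	unsigned int chip = spi_index / SPIS_PER_CHIP;

	/* Fallback for indices beyond the modeled chips is an assumption. */
	return chip < MAX_CHIPS ? chip_alias_base[chip] : global_dist_base;
}
```

Using the same alias for writes, as described for GICD_ICFGR, keeps read-modify-write sequences on one region.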
Co-developed-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com> (for SMCCC/SOC ID bits)
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230319024314.3540573-2-sdonthineni@nvidia.com
|
|
With the last non-OF, non-ACPI user of the GIC being removed in
e73307b9ebc4 ("ARM: cns3xxx: remove entire platform"), we can finally
drop the entry point and do some minor cleanup.
We also make the driver depend on CONFIG_OF, which is required
even when CONFIG_ACPI is selected.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230315130218.3212033-1-maz@kernel.org
|
|
The functions thermal_of_zone_register() and
thermal_of_zone_unregister() are no longer needed by drivers, as the
devm_ variants are always used.
Make them static in the C file and remove their declarations from thermal.h.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230404075138.2914680-2-daniel.lezcano@linaro.org
|
|
Similar to the existing reg_downshift mechanism, which is used to
translate register addresses on buses that have a smaller address
stride, some devices need their register addresses to be upshifted.
Such a case was encountered when network PHYs and PCS that usually sit
on an MDIO bus (16-bit registers with a stride of 1) are integrated
directly as memory-mapped devices. Here, the same register layout
defined in 802.3 is used, but the registers now have a larger stride.
Introduce a mechanism to also allow upshifting register addresses.
Re-purpose reg_downshift into a more generic, signed reg_shift, whose
sign indicates the direction of the shift. To avoid confusion, also
introduce macros to explicitly indicate if we want to downshift or
upshift.
For bisectability, change any use of reg_downshift to use reg_shift.
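The signed-shift convention can be sketched as follows. The REGMAP_UPSHIFT()/REGMAP_DOWNSHIFT() macro definitions match our understanding of include/linux/regmap.h after this change, but treat them as illustrative; struct shift_cfg and apply_reg_shift() are hypothetical stand-ins for the regmap internals.

```c
#include <assert.h>

/* Sign encodes direction: positive = downshift, negative = upshift. */
#define REGMAP_UPSHIFT(s)   (-(s))
#define REGMAP_DOWNSHIFT(s) (s)

/* Hypothetical stand-in for the relevant part of a regmap config. */
struct shift_cfg {
	int reg_shift;
};

/* Translate a register number into a bus address. */
static unsigned int apply_reg_shift(const struct shift_cfg *cfg,
				    unsigned int reg)
{
	if (cfg->reg_shift > 0)
		reg >>= cfg->reg_shift;		/* smaller bus stride */
	else if (cfg->reg_shift < 0)
		reg <<= -cfg->reg_shift;	/* larger bus stride */
	return reg;
}
```

For example, an 802.3 register 0x2 on a memory-mapped PCS with a 4-byte stride would use REGMAP_UPSHIFT(2) and land at offset 0x8.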
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Link: https://lore.kernel.org/r/20230407152604.105467-1-maxime.chevallier@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The hwmon core receives an array of pointers to hwmon_channel_info and
does not modify it, so it can be an array of const pointers for safety.
This also allows drivers to make their arrays const.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Since the cpufreq core uses freq_table directly for cpufreq drivers
that set the target_index() callback, make it mandatory for such drivers
to set freq_table.
Since this is set per policy, normally from policy->init(), perform the
check in cpufreq_table_validate_and_sort(), which gets called right after
->init().
Reported-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Remove struct thermal_bind_params because no one is using it for thermal
binding now.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230330104526.3196-1-rui.zhang@intel.com
|
|
WED version 2 (on MT7986 and later) can offload flows originating from
wireless devices.
In order to make that work, ndo_setup_tc needs to be implemented on the
netdevs. This adds the required code to offload flows coming in from WED,
while keeping track of the incoming wed index used for selecting the
correct PPE device.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dma_default_coherent was declared unconditionally in kernel/dma/mapping.c
but was only declared in dma-map-ops.h when any of the non-coherent
options is enabled.
Guard the declaration in mapping.c with the non-coherent options and
provide a fallback definition.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Every task_work will try to wake the task to be executed, which causes
excessive scheduling and additional overhead. For some tw it's
justified, but others won't do much but post a single CQE.
When a task waits for multiple CQEs, every such task_work will wake it
up. Instead, the task may give a hint about how many CQEs it waits for;
io_req_local_work_add() then compares against it and skips wakeups
if #cqes + #tw is not enough to satisfy the waiting condition. Task_work
that uses the optimisation should be simple enough and never post more
than one CQE. The hint is ignored for non-DEFER_TASKRUN rings.
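The wake-up decision reduces to a single comparison. This sketch uses a hypothetical helper name and parameters rather than the io_uring internals, and it relies on the stated rule that each lazily-woken task_work posts at most one CQE.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should adding a task_work wake the waiting task?
 *   nr_wait - CQEs the waiter asked for (the hint)
 *   nr_cqes - CQEs already posted
 *   nr_tw   - pending task_work items, including the new one; each is
 *             assumed to post at most one CQE
 */
static bool tw_should_wake(unsigned int nr_wait, unsigned int nr_cqes,
			   unsigned int nr_tw)
{
	return nr_cqes + nr_tw >= nr_wait;
}
```

A task waiting for 8 CQEs with none posted therefore sleeps through the first 7 such task_work items instead of being woken for each one.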
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
pci_msix_can_alloc_dyn() is not declared when CONFIG_PCI_MSI is disabled.
There is no existing user of pci_msix_can_alloc_dyn() but work is in
progress to change this. This work encounters the following error when
CONFIG_PCI_MSI is disabled:
drivers/vfio/pci/vfio_pci_intrs.c:427:21: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration]
Provide a definition for pci_msix_can_alloc_dyn() in preparation for users
that need to compile when CONFIG_PCI_MSI is disabled.
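The usual shape of such a fix is a prototype under the config option and a static inline stub otherwise. Returning false in the stub is our reading of the intent (without MSI support, dynamic MSI-X allocation is never possible), so treat the body as a sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pci_dev;	/* opaque here: only pointers are passed around */

#ifdef CONFIG_PCI_MSI
bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
#else
/* Stub so callers compile without MSI: dynamic MSI-X is never available. */
static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
{
	(void)dev;
	return false;
}
#endif
```

Callers such as the vfio code can then test the capability at runtime instead of wrapping every call site in #ifdef.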
[bhelgaas: Also reported by Arnd Bergmann <arnd@kernel.org> in
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c; added his Fixes: line]
Fixes: fb0a6a268dcd ("net/mlx5: Provide external API for allocating vectors")
Fixes: 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
Link: https://lore.kernel.org/oe-kbuild-all/202303291000.PWFqGCxH-lkp@intel.com/
Link: https://lore.kernel.org/r/310ecc4815dae4174031062f525245f0755c70e2.1680119924.git.reinette.chatre@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v6.2+
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/
https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
|
|
linux/acpi.h includes irqdomain.h which includes of.h. Break the include
chain by replacing the irqdomain include with forward declarations for
struct irq_domain and irq_domain_ops, which are sufficient for acpi.h.
of.h also includes mod_devicetable.h which many drivers implicitly
depend on. As acpi.h already includes it, just move it out of the
'#ifdef CONFIG_ACPI'.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Remove several calls to compound_head() and the last caller of
set_page_writeback_keepwrite(), so remove the wrapper too.
Also export bio_add_folio() as this is the first caller from a module.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-4-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
fscrypt_is_bounce_folio() is the equivalent of fscrypt_is_bounce_page()
and fscrypt_pagecache_folio() is the equivalent of fscrypt_pagecache_page().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-3-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This particular combination of flags is used by most filesystems
in their ->write_begin method, although it does find use in a
few other places. Before folios, it warranted its own function
(grab_cache_page_write_begin()), but I think that just having specialised
flags is enough. It certainly helps the few places that have been
converted from grab_cache_page_write_begin() to __filemap_get_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-2-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fix the wrongly named 'dev' parameter in the doc block; it should have been 'iface':
drivers/usb/core/message.c:1939: warning: Function parameter or member 'iface' not described in 'usb_set_wireless_status'
drivers/usb/core/message.c:1939: warning: Excess function parameter 'dev' description in 'usb_set_wireless_status'
Also fix the missing struct member documentation in the kernel API, and
reorder it to match the struct:
include/linux/usb.h:270: warning: Function parameter or member 'wireless_status_work' not described in 'usb_interface'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/linux-next/20230405114807.5a57bf46@canb.auug.org.au/T/#t
Fixes: 0a4db185f078 ("USB: core: Add API to change the wireless_status")
Signed-off-by: Bastien Nocera <hadess@hadess.net>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230405092754.36579-1-hadess@hadess.net
[bentiss: fix checkpatch warning]
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
|
|
The PWM capture assumes that the input selector is set to default
input and that the slave mode is disabled. Force reset state for
TISEL and SMCR registers to match this requirement.
Note that slave mode disabling is not a pre-requisite by itself
for capture mode, as hardware supports it for PWM capture.
However, the current implementation of the driver does not
allow slave mode for PWM capture. Setting slave mode for PWM
capture results in wrong capture values.
Signed-off-by: Olivier Moysan <olivier.moysan@foss.st.com>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
|
|
https://gitlab.freedesktop.org/drm/tegra into drm-next
drm/tegra: Changes for v6.4-rc1
The majority of this is minor cleanups and fixes. Other than those, this
contains Uwe's conversion to the new driver remove callback and Thomas'
fbdev DRM client conversion. The driver can now also be built on other
architectures to ease compile coverage.
Finally, this adds Mikko as a second maintainer for the driver. As a
next step we also want Tegra DRM to move into drm-misc to streamline the
maintenance process.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230406121404.967704-1-thierry.reding@gmail.com
|
|
Fix the __iomem annotation of the iomem_base pointers in the apple-gmux
code. The __iomem should go before the *.
This fixes a bunch of sparse warnings like this one:
drivers/platform/x86/apple-gmux.c:224:48: sparse:
expected void const [noderef] __iomem *
got unsigned char [usertype] *
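The placement rule can be illustrated with a small declaration. Under sparse, __iomem expands to an address-space attribute; here it is defined empty, as for a normal compiler build. struct gmux_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Empty for a normal compiler, as in kernel builds without sparse. */
#define __iomem

struct gmux_sketch {
	/* Correct: iomem_base points to bytes that live in I/O memory. */
	uint8_t __iomem *iomem_base;
	/*
	 * Wrong placement: "uint8_t * __iomem iomem_base;" would annotate
	 * the pointer variable itself, so sparse sees the pointee as plain
	 * memory and warns when it is passed to accessors that expect a
	 * "const void __iomem *" argument.
	 */
};
```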
Fixes: 0c18184de990 ("platform/x86: apple-gmux: support MMIO gmux on T2 Macs")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304040401.IMxt7Ubi-lkp@intel.com/
Suggested-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Orlando Chamberlain <orlandoch.dev@gmail.com>
Link: https://lore.kernel.org/r/20230404111955.43266-1-hdegoede@redhat.com
|
|
A recent attempt to ensure the PREROUTING hook is executed again when a
decrypted IPsec packet received on a bridge passes through the network
stack a second time broke the physdev match in the INPUT hook.
We can't discard the nf_bridge info struct from the sabotage_in hook, as
this is needed by the physdev match.
Keep the struct around and handle this with another conditional instead.
Fixes: 2b272bb558f1 ("netfilter: br_netfilter: disable sabotage_in hook after first suppression")
Reported-and-tested-by: Farid BENAMROUCHE <fariouche@yahoo.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Functions for searching module kallsyms should have non-empty
definitions only if CONFIG_MODULES=y and CONFIG_KALLSYMS=y. Until now,
only the CONFIG_MODULES check was used for many of these, which may have
caused compilation errors on some configs.
This patch moves all relevant functions under the correct configs.
Fixes: bd5314f8dd2d ("kallsyms, bpf: Move find_kallsyms_symbol_value out of internal header")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303181535.RFDCnz3E-lkp@intel.com/
Link: https://lore.kernel.org/r/20230330102001.2183693-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This function returned zero unconditionally. Convert it to return no
value instead. This makes it more obvious what happens in the callers.
One caller is converted to return zero explicitly. The only other caller
(smd_subdev_stop() in drivers/remoteproc/qcom_common.c) already ignored
the return value before.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230321154039.355098-2-u.kleine-koenig@pengutronix.de
|
|
Before: the last 6 bits of the PID are used as an index to store
information about tasks accessing VMAs.
After: hash_32 is used to take care of cases where tasks are created over
a period of time, and thus reduce the probability of collisions.
Result:
The patch series overall improves autonuma cost.
Kernbench showed more than 5% improvement, and system time in the mmtest
autonuma benchmark showed more than 80% improvement.
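The difference between the two indexing schemes can be sketched in userspace C. hash_32() below re-implements the generic multiplicative hash from include/linux/hash.h (multiply by GOLDEN_RATIO_32, keep the top bits); architectures may override the inner hash, so treat this as the generic variant only. pid_slot_low_bits() is a hypothetical name for the old scheme.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Generic hash_32(): valid for 1 <= bits <= 32. */
static inline uint32_t hash_32(uint32_t val, unsigned int bits)
{
	/* Multiply mod 2^32, then keep the top 'bits' bits, which mix best. */
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Old scheme: the low 6 bits of the PID, so PIDs 64 apart always collide. */
static inline unsigned int pid_slot_low_bits(uint32_t pid)
{
	return pid & 63;
}
```

Both schemes map a PID to one of 64 slots; the hash spreads PIDs across slots instead of tying the slot to the PID's low bits.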
Link: https://lkml.kernel.org/r/d5a9f75513300caed74e5c8570bba9317b963c2b.1677672277.git.raghavendra.kt@amd.com
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Disha Talreja <dishaa.talreja@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This helps to ensure that only recently accessed PIDs scan the VMAs.
Current implementation: (idea supported by PeterZ)
1. Accessing PID information is maintained in two windows, with
access_pids[1] being the newest.
2. Reset the old access PID info, i.e. access_pids[0], every (4 *
sysctl_numa_balancing_scan_delay) interval after the initial scan delay
period expires.
The above interval appeared optimal in experiments, since it avoids
frequent resets of the access info while still clearing old access
info regularly. The reset logic is implemented in the scan path.
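The two-window scheme can be sketched as follows. This is one plausible reading of the description, not the patch itself: the struct, field and function names are hypothetical, the caller passes 4 * scan_delay as the period, and the plain >= comparison ignores the jiffies wraparound that kernel code would handle with time_after().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VMA state. */
struct vma_numab_sketch {
	uint64_t access_pids[2];	/* [1] is the newest window */
	unsigned long next_reset;	/* time of the next window rotation */
};

/* Scan path: rotate windows once the reset period has elapsed. */
static void maybe_rotate(struct vma_numab_sketch *s, unsigned long now,
			 unsigned long period)
{
	if (now >= s->next_reset) {
		s->access_pids[0] = s->access_pids[1];	/* age the newest window */
		s->access_pids[1] = 0;			/* start a fresh one */
		s->next_reset = now + period;
	}
}

/* Fault path: record the accessing task's PID slot in the newest window. */
static void record_access(struct vma_numab_sketch *s, unsigned int slot)
{
	s->access_pids[1] |= (uint64_t)1 << (slot & 63);
}

/* A slot counts as recent if it is set in either window. */
static bool recently_accessed(const struct vma_numab_sketch *s,
			      unsigned int slot)
{
	return ((s->access_pids[0] | s->access_pids[1]) >> (slot & 63)) & 1;
}
```

An access therefore survives one rotation and is forgotten after the second, so stale information lasts at most two periods.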
Link: https://lkml.kernel.org/r/f7a675f66d1442d048b4216b2baf94515012c405.1677672277.git.raghavendra.kt@amd.com
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Bharata B Rao <bharata@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Disha Talreja <dishaa.talreja@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
During NUMA scanning, make sure only the relevant VMAs of the tasks are
scanned.
Before:
All the tasks of a process participate in scanning the VMA even if they
do not access the VMA in its lifespan.
Now:
Except for the first few unconditional scans, tasks that do not touch a
VMA (excluding false positives caused by PID collisions) no longer scan
it.
Logic used:
1) 6 bits of the PID are used to set an active bit in the VMA's numab
state during a fault, to remember which PIDs access the VMA. (Thanks Mel)
2) Subsequently, in the scan path, VMA scanning is skipped if the current
PID has not accessed the VMA.
3) The first two scans are allowed unconditionally to preserve the
earlier scanning behaviour.
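The marking and skipping steps above can be sketched with a single 64-bit mask. The struct, the per-VMA scan counter and the function names are hypothetical; where the kernel actually counts the initial unconditional scans is not stated here, so the counter placement is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VMA state for the single-bitmap version of the idea. */
struct vma_scan_sketch {
	uint64_t access_pids;		/* one bit per 6-bit PID slot */
	unsigned int scans_done;	/* unconditional scans performed so far */
};

/* Fault path: remember that a task with this PID slot touched the VMA. */
static void mark_pid_access(struct vma_scan_sketch *v, unsigned int pid)
{
	v->access_pids |= (uint64_t)1 << (pid & 63);
}

/* Scan path: first two scans are unconditional, then only marked PIDs. */
static bool should_scan_vma(struct vma_scan_sketch *v, unsigned int pid)
{
	if (v->scans_done < 2) {
		v->scans_done++;
		return true;
	}
	return (v->access_pids >> (pid & 63)) & 1;
}
```

The false positives mentioned above show up directly: any PID sharing the low 6 bits with a marked PID is also allowed to scan.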
Acknowledgement to Bharata B Rao <bharata@amd.com> for initial patch to
store pid information and Peter Zijlstra <peterz@infradead.org> (Usage of
test and set bit)
Link: https://lkml.kernel.org/r/092f03105c7c1d3450f4636b1ea350407f07640e.1677672277.git.raghavendra.kt@amd.com
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Disha Talreja <dishaa.talreja@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "sched/numa: Enhance vma scanning", v3.
The patchset proposes one of the enhancements to NUMA vma scanning
suggested by Mel. This is a continuation of [3].
Reposting the patchset rebased onto akpm's mm-unstable tree (March 1).
The existing mechanism derives the scan period from per-thread stats.
Process Adaptive autoNUMA [1] proposed gathering NUMA fault stats at the
per-process level to better capture application behaviour.
During the course of that discussion, Mel proposed several ideas to enhance
current NUMA balancing. One of the suggestions was the following:
Track what threads access a VMA. The suggestion was to use an unsigned
long pid_mask and use the lower bits to tag approximately what threads
access a VMA. Skip VMAs that did not trap a fault. This would be
approximate because of PID collisions but would reduce scanning of areas
the thread is not interested in. The above suggestion intends not to
penalize threads that have no interest in the vma, thus reducing scanning
overhead.
V3 changes are mostly based on PeterZ's comments (details below in changes)
Summary of patchset:
Current patchset implements:
1. Delay the vma scanning logic for newly created VMAs so that the
additional overhead of scanning is not incurred for short-lived tasks
(implementation by Mel)
2. Store the information of tasks accessing a VMA in 2 windows. It is
regularly cleared at a (4*sysctl_numa_balancing_scan_delay) interval.
This interval was derived from experiments (suggested by PeterZ) to
balance frequent clearing against obsolete access data
3. hash_32 is used to encode the index of tasks accessing the VMA
4. The VMA's access information is used to skip scanning for tasks
which have not accessed the VMA
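Items 2 to 4 above can be illustrated with a small user-space sketch. It is an assumption-laden model, not the patch itself: `struct vma_access_sketch`, `rotate_windows()`, `record_access()` and `accessed_recently()` are hypothetical names, time is an abstract counter, and the rotation point is taken to be exactly 4 * scan_delay. `hash_32()` mirrors the kernel's generic multiplicative hash (multiply by `GOLDEN_RATIO_32`, keep the top `bits` bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Multiplicative hash in the style of the kernel's generic hash_32(). */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Hypothetical two-window access history for one VMA. */
struct vma_access_sketch {
	uint64_t pids_active[2];  /* [0] = previous window, [1] = current window */
	unsigned long next_reset; /* time at which the windows rotate */
};

/* Every 4 * scan_delay, the current window becomes the previous one. */
static void rotate_windows(struct vma_access_sketch *a, unsigned long now,
			   unsigned long scan_delay)
{
	if (now >= a->next_reset) {
		a->pids_active[0] = a->pids_active[1];
		a->pids_active[1] = 0;
		a->next_reset = now + 4 * scan_delay;
	}
}

/* Fault path: hash the PID to one of 64 bits in the current window. */
static void record_access(struct vma_access_sketch *a, uint32_t pid)
{
	a->pids_active[1] |= UINT64_C(1) << hash_32(pid, 6);
}

/* Scan path: a task may scan if its bit is set in either window. */
static bool accessed_recently(const struct vma_access_sketch *a, uint32_t pid)
{
	uint64_t bit = UINT64_C(1) << hash_32(pid, 6);

	return ((a->pids_active[0] | a->pids_active[1]) & bit) != 0;
}
```

Keeping two windows means an access survives at least one full interval after it was recorded, so clearing does not immediately forget a task that faulted just before the reset; extending this to N windows, as mentioned under Patch3 below, would follow the same shifting pattern.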
Changes since V2:
patch1:
- Renaming of structure, macro to function,
- Add explanation to heuristics
- Adding more details from result (PeterZ)
Patch2:
- Usage of test and set bit (PeterZ)
- Move storing access PID info to numa_migrate_prep()
- Add a note on fairness among tasks allowed to scan
(PeterZ)
Patch3:
- Maintain two windows of access PID information
(PeterZ supported the implementation and gave the idea to extend
to N if needed)
Patch4:
- Apply hash_32 function to track VMA accessing PIDs (PeterZ)
Changes since RFC V1:
- Include Mel's vma scan delay patch
- Change the accessing pid store logic (Thanks Mel)
- Fencing structure / code to NUMA_BALANCING (David, Mel)
- Adding clearing access PID logic (Mel)
- Descriptive change log (Mike Rapoport)
Things to ponder over:
==========================================
- Improvement to the logic for clearing accessing PIDs (discussed in detail
in patch3 itself; done in this patchset by implementing a 2-window history)
- The current scan period is not changed in the patchset, so we do see
frequent scan attempts. Relaxing the scan period dynamically could improve
results further.
[1] sched/numa: Process Adaptive autoNUMA
Link: https://lore.kernel.org/lkml/20220128052851.17162-1-bharata@amd.com/T/
[2] RFC V1 Link:
https://lore.kernel.org/all/cover.1673610485.git.raghavendra.kt@amd.com/
[3] V2 Link:
https://lore.kernel.org/lkml/cover.1675159422.git.raghavendra.kt@amd.com/
Results:
Summary: A huge AutoNUMA cost reduction is seen in mmtests. The kernbench
improvement is more than 5%, and mmtests autonumabench shows a huge (80%+)
system time improvement.
(dbench results are not posted because their standard deviation was too large)
kernbench
===========
                        6.2.0-mmunstable-base   6.2.0-mmunstable-patched
Amean     user-256      22002.51 (   0.00%)      22649.95 *  -2.94%*
Amean     syst-256      10162.78 (   0.00%)       8214.13 *  19.17%*
Amean     elsp-256        160.74 (   0.00%)        156.92 *   2.38%*
Duration User           66017.43                 67959.84
Duration System         30503.15                 24657.03
Duration Elapsed          504.61                   493.12
                                  6.2.0-mmunstable-base   6.2.0-mmunstable-patched
Ops NUMA alloc hit                    1738835089.00           1738780310.00
Ops NUMA alloc local                  1738834448.00           1738779711.00
Ops NUMA base-page range updates          477310.00               392566.00
Ops NUMA PTE updates                      477310.00               392566.00
Ops NUMA hint faults                       96817.00                87555.00
Ops NUMA hint local faults %               10150.00                 2192.00
Ops NUMA hint local percent                   10.48                    2.50
Ops NUMA pages migrated                    86660.00                85363.00
Ops AutoNUMA cost                            489.07                  442.14
autonumabench
===============
                                   6.2.0-mmunstable-base   6.2.0-mmunstable-patched
Amean     syst-NUMA01                   399.50 (   0.00%)       52.05 *  86.97%*
Amean     syst-NUMA01_THREADLOCAL         0.21 (   0.00%)        0.22 *  -5.41%*
Amean     syst-NUMA02                     0.80 (   0.00%)        0.78 *   2.68%*
Amean     syst-NUMA02_SMT                 0.65 (   0.00%)        0.68 *  -3.95%*
Amean     elsp-NUMA01                   313.26 (   0.00%)      313.11 *   0.05%*
Amean     elsp-NUMA01_THREADLOCAL         1.06 (   0.00%)        1.08 *  -1.76%*
Amean     elsp-NUMA02                     3.19 (   0.00%)        3.24 *  -1.52%*
Amean     elsp-NUMA02_SMT                 3.72 (   0.00%)        3.61 *   2.92%*
Duration User                        396433.47               324835.96
Duration System                        2808.70                  376.66
Duration Elapsed                       2258.61                 2258.12
                                   6.2.0-mmunstable-base   6.2.0-mmunstable-patched
Ops NUMA alloc hit                       59921806.00             49623489.00
Ops NUMA alloc miss                             0.00                    0.00
Ops NUMA interleave hit                         0.00                    0.00
Ops NUMA alloc local                     59920880.00             49622594.00
Ops NUMA base-page range updates        152259275.00                50075.00
Ops NUMA PTE updates                    152259275.00                50075.00
Ops NUMA PMD updates                            0.00                    0.00
Ops NUMA hint faults                    154660352.00                39014.00
Ops NUMA hint local faults %            138550501.00                23139.00
Ops NUMA hint local percent                    89.58                   59.31
Ops NUMA pages migrated                   8179067.00                14147.00
Ops AutoNUMA cost                          774522.98                  195.69
This patch (of 4):
Currently, whenever a new task is created, we wait for
sysctl_numa_balancing_scan_delay to avoid unnecessary scanning overhead.
Extend the same logic to new or very short-lived VMAs.
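A minimal sketch of the per-VMA delay, assuming a hypothetical `next_scan` timestamp stored when the VMA is created; `struct vma_scan_sketch`, `vma_init_scan_delay()` and `vma_scan_allowed()` are illustrative names, not the patch's. The sketch uses a plain comparison, whereas kernel code comparing jiffies would use `time_after()`/`time_before()` to cope with wraparound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VMA scan state: earliest time this VMA may be scanned. */
struct vma_scan_sketch {
	unsigned long next_scan;
};

/* On VMA creation (and duplication), push the first scan into the future. */
static void vma_init_scan_delay(struct vma_scan_sketch *v, unsigned long now,
				unsigned long scan_delay)
{
	v->next_scan = now + scan_delay;
}

/* Scanner: skip VMAs whose delay has not yet expired. */
static bool vma_scan_allowed(const struct vma_scan_sketch *v, unsigned long now)
{
	return now >= v->next_scan;
}
```

A VMA that is unmapped before `next_scan` is never scanned at all, which is where the saving for short-lived VMAs comes from.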
[raghavendra.kt@amd.com: add initialization in vm_area_dup()]
Link: https://lkml.kernel.org/r/cover.1677672277.git.raghavendra.kt@amd.com
Link: https://lkml.kernel.org/r/7a6fbba87c8b51e67efd3e74285bb4cb311a16ca.1677672277.git.raghavendra.kt@amd.com
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Bharata B Rao <bharata@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Disha Talreja <dishaa.talreja@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Having vma->lock as part of vm_area_struct causes a performance regression
during page faults: under contention its count and owner fields are
constantly updated, and the other parts of vm_area_struct used during page
fault handling sit next to them, which causes constant cache line
bouncing. Fix that by moving the lock outside of vm_area_struct.
All attempts to keep vma->lock inside vm_area_struct in a separate cache
line still produced a performance regression, especially on NUMA machines.
The smallest regression was achieved when the lock was placed in the fourth
cache line, but that bloats vm_area_struct to 256 bytes.
Considering performance and memory impact, a separate lock looks like the
best option. It increases the memory footprint of each VMA, but that can be
optimized later if the new size causes issues. Note that after this
change vma_init() no longer allocates or initializes vma->lock. A
number of drivers allocate a pseudo VMA on the stack, but they never use
the VMA's lock, so it does not need to be allocated. Future
drivers which might need the VMA lock should use
vm_area_alloc()/vm_area_free() to allocate the VMA.
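The layout change described above can be sketched in user space, assuming a `pthread_rwlock_t` in place of the kernel's rw_semaphore. All `*_sketch` names are hypothetical: the point is only that the hot lock lives in its own allocation reached through a pointer, that the alloc/free pair manages both objects, and that plain initialization (as for an on-stack pseudo VMA) leaves the pointer NULL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* The contended lock lives in its own allocation, away from VMA fields. */
struct vma_lock_sketch {
	pthread_rwlock_t lock;
};

struct vm_area_sketch {
	unsigned long vm_start, vm_end;
	struct vma_lock_sketch *vm_lock; /* NULL for on-stack pseudo VMAs */
};

/* Allocation path: allocate and initialize the VMA and its lock together. */
static struct vm_area_sketch *vm_area_alloc_sketch(void)
{
	struct vm_area_sketch *vma = calloc(1, sizeof(*vma));

	if (!vma)
		return NULL;
	vma->vm_lock = malloc(sizeof(*vma->vm_lock));
	if (!vma->vm_lock) {
		free(vma);
		return NULL;
	}
	pthread_rwlock_init(&vma->vm_lock->lock, NULL);
	return vma;
}

/* Free path: release the lock object along with the VMA. */
static void vm_area_free_sketch(struct vm_area_sketch *vma)
{
	pthread_rwlock_destroy(&vma->vm_lock->lock);
	free(vma->vm_lock);
	free(vma);
}

/* Like vma_init() after this change: no lock is allocated or initialized. */
static void vma_init_sketch(struct vm_area_sketch *vma)
{
	vma->vm_start = vma->vm_end = 0;
	vma->vm_lock = NULL;
}
```

The cost is one extra pointer and one extra allocation per VMA, which is the memory-footprint trade-off the message accepts.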
Link: https://lkml.kernel.org/r/20230227173632.3292573-34-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
call_rcu() can take a long time when callback offloading is enabled. Its
use in vm_area_free() can cause regressions in the exit path when
multiple VMAs are being freed.
Because exit_mmap() is called only after the last mm user drops its
refcount, the page fault handlers can't be racing with it. Any other
possible users, like the oom-reaper or process_mrelease, are already
synchronized using mmap_lock. Therefore exit_mmap() can free VMAs directly,
without the use of call_rcu().
Expose __vm_area_free() and use it from exit_mmap() to avoid possible
call_rcu() floods and performance regressions caused by it.
Link: https://lkml.kernel.org/r/20230227173632.3292573-33-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra statistics
about handling page faults under the VMA lock.
Link: https://lkml.kernel.org/r/20230227173632.3292573-29-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a new flag to distinguish page faults handled under the protection of
the per-vma lock.
[surenb@google.com: document FAULT_FLAG_VMA_LOCK flag]
Link: https://lkml.kernel.org/r/20230301022720.1380780-2-surenb@google.com
Link: https://lore.kernel.org/all/20230301113648.7c279865@canb.auug.org.au/
Link: https://lkml.kernel.org/r/20230227173632.3292573-26-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce the lock_vma_under_rcu() function to look up and lock a VMA during
page fault handling. When the VMA is not found, can't be locked, or changes
after being locked, the function returns NULL. The lookup is performed under
RCU protection to prevent the found VMA from being destroyed before the
VMA lock is acquired. VMA lock statistics are updated according to the
results. For now only anonymous VMAs can be searched this way; in other
cases the function returns NULL.
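The control flow of such a lookup can be sketched as below. This is a single-threaded model under loud assumptions: there is no real RCU here (comments mark where `rcu_read_lock()` would sit), the "VMA tree" is a plain array searched by `find_vma_sk()`, and the read trylock is simulated by a `lock_busy` flag. All names are hypothetical; only the order of checks (found, anonymous, locked, still covers the address) follows the description above, and statistics updates are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VMA model for this sketch. */
struct fault_vma_sk {
	unsigned long start, end;
	bool anonymous;
	bool lock_busy; /* simulates a failed read-trylock */
};

/* Hypothetical lookup; real code walks the VMA tree under RCU. */
static struct fault_vma_sk *find_vma_sk(struct fault_vma_sk *v, size_t n,
					unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= v[i].start && addr < v[i].end)
			return &v[i];
	return NULL;
}

static bool vma_trylock_read_sk(struct fault_vma_sk *vma)
{
	return !vma->lock_busy;
}

/* Returns a read-locked VMA, or NULL so the caller falls back to mmap_lock. */
static struct fault_vma_sk *lock_vma_sketch(struct fault_vma_sk *v, size_t n,
					    unsigned long addr)
{
	/* rcu_read_lock() would go here, so the VMA cannot be freed under us. */
	struct fault_vma_sk *vma = find_vma_sk(v, n, addr);

	if (!vma || !vma->anonymous)	/* not found, or not supported yet */
		return NULL;
	if (!vma_trylock_read_sk(vma))	/* contended or write-locked */
		return NULL;
	/* Re-check after locking: the VMA may have changed before we got the lock. */
	if (addr < vma->start || addr >= vma->end)
		return NULL;		/* real code would drop the read lock first */
	return vma;			/* rcu_read_unlock() would go here */
}
```

Returning NULL on every doubtful path keeps the fast path simple: the fault handler just retries under the existing mmap_lock.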
Link: https://lkml.kernel.org/r/20230227173632.3292573-24-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The per-vma locking mechanism will search for a VMA under RCU protection
and then, after locking it, has to ensure it was not removed from the VMA
tree after we found it. To make this check efficient, introduce a
vma->detached flag to mark VMAs which were removed from the VMA tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-23-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Protect the VMA from a concurrent page fault handler while collapsing a huge
page. The page fault handler needs a stable PMD to use the PTL and relies on
the per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
not be detected by a page fault handler without proper locking.
Before this patch, page tables can be walked under any one of the
mmap_lock, the mapping lock, and the anon_vma lock; so when khugepaged
unlinks and frees page tables, it must ensure that all of those either are
locked or don't exist. This patch adds a fourth lock under which page
tables can be traversed, and so khugepaged must also lock out that one.
[surenb@google.com: vm_lock/i_mmap_rwsem inversion in retract_page_tables]
Link: https://lkml.kernel.org/r/20230303213250.3555716-1-surenb@google.com
[surenb@google.com: build fix]
Link: https://lkml.kernel.org/r/CAJuCfpFjWhtzRE1X=J+_JjgJzNKhq-=JT8yTBSTHthwp0pqWZw@mail.gmail.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-16-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Updates to vm_flags have to be done with the VMA marked as being written,
to prevent concurrent page faults or other modifications.
Link: https://lkml.kernel.org/r/20230227173632.3292573-14-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce per-VMA locking. The lock implementation relies on per-vma
and per-mm sequence counters to note exclusive locking:
- read lock - (implemented by vma_start_read) requires the vma
(vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ.
If they match then there must be a vma exclusive lock held somewhere.
- read unlock - (implemented by vma_end_read) is a trivial vma->lock
unlock.
- write lock - (vma_start_write) requires the mmap_lock to be held
exclusively and the current mm counter is assigned to the vma counter.
This will allow multiple vmas to be locked under a single mmap_lock
write lock (e.g. during vma merging). The vma counter is modified
under exclusive vma lock.
- write unlock - (vma_end_write_all) is a batch release of all vma
locks held. It doesn't pair with a specific vma_start_write! It is
done before exclusive mmap_lock is released by incrementing mm
sequence counter (mm_lock_seq).
- write downgrade - if the mmap_lock is downgraded to the read lock, all
vma write locks are released as well (effectively the same as write
unlock).
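The four operations in the list above can be modelled in user space as a rough sketch, assuming `pthread_rwlock_t` stands in for both the mmap_lock and the per-VMA rw_semaphore, and using plain `int` counters. A concurrent version would need atomic or `READ_ONCE`/`WRITE_ONCE`-style accesses and proper memory ordering, which this sketch deliberately omits. The `_sk` names are hypothetical; the initial VMA counter of -1 keeps it different from the mm counter until a writer marks it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct mm_sk {
	pthread_rwlock_t mmap_lock;
	int mm_lock_seq;
};

struct vma_sk {
	struct mm_sk *mm;
	pthread_rwlock_t lock;
	int vm_lock_seq; /* equal to mm_lock_seq => write-locked */
};

/* Read lock: fails (caller falls back to mmap_lock) if write-locked. */
static bool vma_start_read_sk(struct vma_sk *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return false;
	if (pthread_rwlock_tryrdlock(&vma->lock) != 0)
		return false;
	/* Re-check: a writer may have marked the VMA before we got the lock. */
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq) {
		pthread_rwlock_unlock(&vma->lock);
		return false;
	}
	return true;
}

/* Read unlock: a trivial unlock of the per-VMA lock. */
static void vma_end_read_sk(struct vma_sk *vma)
{
	pthread_rwlock_unlock(&vma->lock);
}

/* Write lock: caller must hold mm->mmap_lock for writing. */
static void vma_start_write_sk(struct vma_sk *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return;                           /* already marked in this section */
	pthread_rwlock_wrlock(&vma->lock);        /* wait out current readers */
	vma->vm_lock_seq = vma->mm->mm_lock_seq;  /* mark as write-locked */
	pthread_rwlock_unlock(&vma->lock);
}

/* Write unlock: releases every marked VMA at once, before mmap_lock is dropped. */
static void vma_end_write_all_sk(struct mm_sk *mm)
{
	mm->mm_lock_seq++;
}
```

Note that the writer does not keep the per-VMA rwlock held; the "lock" persists only as the matching counters, which is why a single increment of `mm_lock_seq` can release any number of VMAs locked during one mmap_lock write section.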
Link: https://lkml.kernel.org/r/20230227173632.3292573-13-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Move mmap_lock assert function definitions up so that they can be used by
other mmap_lock routines.
Link: https://lkml.kernel.org/r/20230227173632.3292573-12-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This prepares for page faults handling under VMA lock, looking up VMAs
under protection of an rcu read lock, instead of the usual mmap read lock.
Link: https://lkml.kernel.org/r/20230227173632.3292573-11-surenb@google.com
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Having previously laid the foundation for converting vread() to an
iterator function, pull the trigger and do so.
This patch attempts to provide minimal refactoring and to reflect the
existing logic as best we can, for example we continue to zero portions of
memory not read, as before.
Overall, there should be no functional difference other than a performance
improvement in /proc/kcore access to vmalloc regions.
Now that we have eliminated the need for a bounce buffer in
read_kcore_iter(), we dispense with it and try to write to user memory
optimistically, but with faults disabled, via copy_page_to_iter_nofault().
We already have preemption disabled by holding a spin lock. We continue
faulting in until the operation is complete.
Additionally, we must account for the fact that a copy may fail at any
point (most likely because a fault could not occur), in which case we exit
indicating that fewer bytes were retrieved than expected.
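The copy/fault-in loop described above can be simulated in user space. This is a hypothetical model, not the kcore code: `struct user_buf_sk` pretends that only the first `resident` bytes of the destination are faulted in and that nothing past `fault_limit` can ever be faulted in; `copy_nofault_sk()` plays the role of a no-fault copy that stops at the first non-resident byte, and `fault_in_sk()` plays the role of faulting pages in with the lock dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user buffer: only [0, resident) is currently faulted in. */
struct user_buf_sk {
	char *data;
	size_t size;
	size_t resident;
	size_t fault_limit; /* bytes at or beyond this can never be faulted in */
};

/* No-fault copy: copies only into resident memory, returns bytes copied. */
static size_t copy_nofault_sk(struct user_buf_sk *u, size_t off,
			      const char *src, size_t len)
{
	size_t avail = u->resident > off ? u->resident - off : 0;
	size_t n = len < avail ? len : avail;

	memcpy(u->data + off, src, n);
	return n;
}

/* Fault in from off onward; returns 0 on success, -1 if impossible. */
static int fault_in_sk(struct user_buf_sk *u, size_t off, size_t len)
{
	if (off >= u->fault_limit)
		return -1;
	size_t want = off + len;
	u->resident = want < u->fault_limit ? want : u->fault_limit;
	return 0;
}

/* Copy with faults disabled, fault in, retry; may return a short count. */
static size_t read_with_retry_sk(struct user_buf_sk *u, const char *src,
				 size_t len)
{
	size_t done = 0;

	while (done < len) {
		/* In the kernel this step would run with the spinlock held. */
		done += copy_nofault_sk(u, done, src + done, len - done);
		if (done == len)
			break;
		/* Lock dropped: try to make the rest of the buffer resident. */
		if (fault_in_sk(u, done, len - done) != 0)
			break; /* report fewer bytes than requested */
	}
	return done;
}
```

Each successful fault-in makes at least one more byte resident, so the loop always makes progress and terminates either with the full length or with a short count.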
[sfr@canb.auug.org.au: fix sparc64 warning]
Link: https://lkml.kernel.org/r/20230320144721.663280c3@canb.auug.org.au
[lstoakes@gmail.com: redo Stephen's sparc build fix]
Link: https://lkml.kernel.org/r/8506cbc667c39205e65a323f750ff9c11a463798.1679566220.git.lstoakes@gmail.com
[akpm@linux-foundation.org: unbreak uio.h includes]
Link: https://lkml.kernel.org/r/941f88bc5ab928e6656e1e2593b91bf0f8c81e1b.1679511146.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Provide a means to copy a page to user space from an iterator, aborting if
a page fault would occur. This supports compound pages, but may be passed
a tail page with an offset extending further into the compound page, so we
cannot pass a folio.
This allows the function to be called from atomic context and to _try_
to copy to user pages if they are faulted in, aborting if not.
The function does not use _copy_to_iter(), in order to avoid specifying
might_fault(); this is similar to copy_page_from_iter_atomic().
This is being added so that an iterable form of vread() can be
implemented while holding spinlocks.
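Why a page plus an arbitrary offset (rather than a folio) is accepted can be shown with a small sketch of the offset normalization and per-subpage copying. Assumptions: the compound page is modelled as a contiguous array of 4096-byte subpages, `copy_compound_sk()` is a hypothetical name, and fault detection and aborting are not modelled; the sketch only shows that the starting subpage is advanced by `offset / PAGE_SIZE` and that each copy step stays within one subpage, as a one-page-at-a-time mapping would require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SK 4096u

/* Copy `bytes` starting at (page_idx, offset) of a modelled compound page. */
static size_t copy_compound_sk(char *dst, const char *compound, size_t nr_pages,
			       size_t page_idx, size_t offset, size_t bytes)
{
	size_t copied = 0;

	/* The offset may run past the given subpage: normalise it first. */
	page_idx += offset / PAGE_SIZE_SK;
	offset %= PAGE_SIZE_SK;

	while (bytes && page_idx < nr_pages) {
		/* Copy at most to the end of the current subpage. */
		size_t n = PAGE_SIZE_SK - offset;

		if (n > bytes)
			n = bytes;
		memcpy(dst + copied, compound + page_idx * PAGE_SIZE_SK + offset, n);
		copied += n;
		bytes -= n;
		offset = 0;
		page_idx++;
	}
	return copied;
}
```

A caller holding a tail page with a large offset therefore lands on the right subpage without having to derive the folio first.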
Link: https://lkml.kernel.org/r/19734729defb0f498a76bdec1bef3ac48a3af3e8.1679511146.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|