Age | Commit message (Collapse) | Author |
|
The perf subsystem assumes that all counters are by default per-CPU. So
the user space tool reads a counter from each CPU. However, the IOMMU
counters are system-wide and can be read from any CPU. Here we use a CPU
mask to restrict counting to one CPU to handle the issue. (with CPU
hotplug notifier to choose a different CPU if the chosen one is taken
off-line).
The CPU is exposed to /sys/bus/event_source/devices/dmar*/cpumask for
the user space perf tool.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-6-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Implement the IOMMU performance monitor capability, which supports the
collection of information about key events occurring during operation of
the remapping hardware, to aid performance tuning and debug.
The IOMMU perfmon support is implemented as part of the IOMMU driver and
interfaces with the Linux perf subsystem.
The IOMMU PMU has the following unique features compared with the other
PMUs.
- Support counting. Not support sampling.
- Does not support per-thread counting. The scope is system-wide.
- Support per-counter capability register. The event constraints can be
enumerated.
- The available event and event group can also be enumerated.
- Extra Enhanced Commands are introduced to control the counters.
Add a new variable, struct iommu_pmu *pmu, to in the struct intel_iommu
to track the PMU related information.
Add iommu_pmu_register() and iommu_pmu_unregister() to register and
unregister a IOMMU PMU. The register function setup the IOMMU PMU ops
and invoke the standard perf_pmu_register() interface to register a PMU
in the perf subsystem. This patch only exposes the functions. The
following patch will enable them in the IOMMU driver.
The IOMMU PMUs can be found under /sys/bus/event_source/devices/dmar*
The available filters and event format can be found at the format folder
$ ls /sys/bus/event_source/devices/dmar1/format/
event event_group filter_ats filter_ats_en filter_page_table
filter_page_table_en
The supported events can be found at the events folder
$ ls /sys/bus/event_source/devices/dmar1/events/
ats_blocked fs_nonleaf_hit int_cache_hit_posted
iommu_mem_blocked iotlb_hit pasid_cache_lookup ss_nonleaf_hit
ctxt_cache_hit fs_nonleaf_lookup int_cache_lookup
iommu_mrds iotlb_lookup pg_req_posted ss_nonleaf_lookup
ctxt_cache_lookup int_cache_hit_nonposted iommu_clocks
iommu_requests pasid_cache_hit pw_occupancy
The command below illustrates filter usage with a simple example.
$ perf stat -e dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/
-a sleep 1
Performance counter stats for 'system wide':
368,947 dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/
1.002592074 seconds time elapsed
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-5-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The Enhanced Command Register is to submit command and operand of
enhanced commands to DMA Remapping hardware. It can supports up to 256
enhanced commands.
There is a HW register to indicate the availability of all 256 enhanced
commands. Each bit stands for each command. But there isn't an existing
interface to read/write all 256 bits. Introduce the u64 ecmdcap[4] to
store the existence of each enhanced command. Read 4 times to get all of
them in map_iommu().
Add a helper to facilitate an enhanced command launch. Make sure hardware
complete the command. Also add a helper to facilitate the check of PMU
essentials. These helpers will be used later.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-4-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The performance monitoring infrastructure, perfmon, is to support
collection of information about key events occurring during operation of
the remapping hardware, to aid performance tuning and debug. Each
remapping hardware unit has capability registers that indicate support
for performance monitoring features and enumerate the capabilities.
Add alloc_iommu_pmu() to retrieve IOMMU perfmon capability information
for each iommu unit. The information is stored in the iommu->pmu data
structure. Capability registers are read-only, so it's safe to prefetch
and store them in the pmu structure. This could avoid unnecessary VMEXIT
when this code is running in the virtualization environment.
Add free_iommu_pmu() to free the saved capability information when
freeing the iommu unit.
Add a kernel config option for the IOMMU perfmon feature. Unless a user
explicitly uses the perf tool to monitor the IOMMU perfmon event, there
isn't any impact for the existing IOMMU. Enable it by default.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-3-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
A new field, which indicates the size of the remapping hardware register
set for this remapping unit, is introduced in the DMA-remapping hardware
unit definition (DRHD) structure with the VT-d Spec 4.0. With this
information, SW doesn't need to 'guess' the size of the register set
anymore.
Update the struct acpi_dmar_hardware_unit to reflect the field. Store the
size of the register set in struct dmar_drhd_unit for each dmar device.
The 'size' information is ResvZ for the old BIOS and platforms. Fall back
to the old guessing method. There is nothing changed.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-2-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Setup No Execute Enable bit (Bit 133) of a scalable mode PASID entry.
This is to allow the use of XD bit of the first level page table.
Fixes: ddf09b6d43ec ("iommu/vt-d: Setup pasid entries for iova over first level")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230126095438.354205-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
After commit be51b1d6bbff ("iommu/sva: Refactoring
iommu_sva_bind/unbind_device()"), the iommu driver doesn't need to
return an iommu_sva pointer anymore. This removes the sva field
from intel_svm_dev and cleanups the code accordingly.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It was used as a reference counter of an existing bond between device
and user application memory address. Commit be51b1d6bbff ("iommu/sva:
Refactoring iommu_sva_bind/unbind_device()") has added this in iommu
core. Remove it to avoid duplicate code.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
They aren't used anywhere. Remove them to avoid dead code.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
There's no need to have a public header for Intel SVA implementation.
The device driver should interact with Intel SVA implementation via
the IOMMU generic APIs.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The 'acpiid' buffer in the parse_ivrs_acpihid function may overflow,
because the string specifier in the format string sscanf()
has no width limitation.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lore.kernel.org/r/20230202082719.1513849-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Merge tag phy-devm_of_phy_optional_get into next to bring in the new
devm_of_phy_optional_get() API and users
|
|
pci_device_group() can return an already existing IOMMU group if the PCI
device's pagetables have to be shared with another one due to bus
toplogy, isolation features and/or DMA alias quirks.
apple_dart_device_group() however assumes that the group has just been
created and overwrites its iommudata which will eventually lead to
apple_dart_release_group leaving stale entries in sid2group.
Fix that by merging the iommudata if the returned group already exists.
Fixes: f0b636804c7c ("iommu/dart: Clear sid2group entry when a group is freed")
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Link: https://lore.kernel.org/r/20230128113532.94651-1-sven@svenpeter.dev
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add set_platform_dma_ops() required for proper driver operation on ARM
32bit arch after recent changes in the IOMMU framework (detach ops
removal).
Fixes: c1fe9119ee70 ("iommu: Add set_platform_dma_ops callbacks")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20230123093102.12392-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add a driver for the motorcomm yt8531 gigabit ethernet phy. We have
verified the driver on AM335x platform with yt8531 board. On the
board, yt8531 gigabit ethernet phy works in utp mode, RGMII
interface, supports 1000M/100M/10M speeds, and wol(magic package).
Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add dts support for Motorcomm yt8531s gigabit ethernet phy.
Change yt8521_probe to support clk config of yt8531s. Becase
yt8521_probe does the things which yt8531s is needed, so
removed yt8531s function.
This patch has been verified on AM335x platform with yt8531s board.
Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add dts support for Motorcomm yt8521 gigabit ethernet phy.
Add ytphy_rgmii_clk_delay_config function to support dst config for
the delay of rgmii clk. This funciont is common for yt8521, yt8531s
and yt8531.
This patch has been verified on AM335x platform.
Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add BIT macro for Motorcomm yt8521/yt8531 gigabit ethernet phy.
This is a preparatory patch. Add BIT macro for 0xA012 reg, and
supplement for 0xA001 and 0xA003 reg. These will be used to support dts.
Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to offload connections in other states besides "established" the
driver offload callbacks need to have access to connection conntrack info.
Flow offload intermediate representation data structure already contains
that data encoded in 'cookie' field, so just reuse it in the drivers.
Reject offloading IP_CT_NEW connections for now by returning an error in
relevant driver callbacks based on value of ctinfo. Support for offloading
such connections will need to be added to the drivers afterwards.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to:
https://github.com/intel/ipu6-drivers/blob/master/patch/int3472-support-independent-clock-and-LED-gpios-5.17%2B.patch
Bits 31-24 of the _DSM pin entry integer value codes the active-value,
that is the actual physical signal (0 or 1) which needs to be output on
the pin to turn the sensor chip on (to make it active).
So if bits 31-24 are 0 for a reset pin, then the actual value of the reset
pin needs to be 0 to take the chip out of reset. IOW in this case the reset
signal is active-high rather then the default active-low.
And if bits 31-24 are 0 for a clk-en pin then the actual value of the clk
pin needs to be 0 to enable the clk. So in this case the clk-en signal
is active-low rather then the default active-high.
IOW if bits 31-24 are 0 for a pin, then the default polarity of the pin
is inverted.
Add a check for this and also propagate this new polarity to the clock
registration.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20230127203729.10205-6-hdegoede@redhat.com
|
|
skl_int3472_register_clock()
Move the requesting of the clk-enable GPIO to skl_int3472_register_clock()
(and move the gpiod_put to unregister).
This mirrors the GPIO handling in skl_int3472_register_regulator() and
allows removing skl_int3472_map_gpio_to_clk() from discrete.c.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20230127203729.10205-5-hdegoede@redhat.com
|
|
On some systems, e.g. the Lenovo ThinkPad X1 Yoga gen 7 and the ThinkPad
X1 Nano gen 2 there is no clock-enable pin, triggering the:
"No clk GPIO. The privacy LED won't work" warning and causing the privacy
LED to not work.
Fix this by modeling the privacy LED as a LED class device rather then
integrating it with the registered clock.
Note this relies on media subsys changes to actually turn the LED on/off
when the sensor's v4l2_subdev's s_stream() operand gets called.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20230127203729.10205-4-hdegoede@redhat.com
|
|
Add a helper function to map the type returned by the _DSM
method to a function name + the default polarity for that function.
And fold the INT3472_GPIO_TYPE_RESET and INT3472_GPIO_TYPE_POWERDOWN
cases into a single generic case.
This is a preparation patch for further GPIO mapping changes.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20230127203729.10205-3-hdegoede@redhat.com
|
|
present
Make v4l2_async_register_subdev_sensor() try to get a privacy LED
associated with the sensor and extend the call_s_stream() wrapper to
enable/disable the privacy LED if found.
This makes the core handle privacy LED control, rather then having to
duplicate this code in all the sensor drivers.
Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20230127203729.10205-2-hdegoede@redhat.com
|
|
Enable debugfs for vcap for lan966x. This will allow to print all the
entries in the VCAP and also the port information regarding which keys
are configured.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous code set the speed by the interface mode of PHY.
Also this hardware has a restriction which cannot change the speed
at runtime. To use other speed, add "max-speed" handling to set
each port's speed if needed.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some Ethernet PHYs (like marvell10g) will decide the host interface
mode by the media-side speed. So, the rswitch driver needs to
initialize one of the Ethernet SERDES (r8a779f0-eth-serdes) ports
after linked the Ethernet PHY up. The r8a779f0-eth-serdes driver has
.init() for initializing all ports and .power_on() for initializing
each port. So, add phy_power_{on,off} calling for it.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set phydev->host_interfaces before calling of_phy_connect() to
configure the PHY with the information of host_interfaces.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Intended to set phy_device->host_interfaces by phylink in the future.
But there is difficult to implement phylink properly, especially
supporting the in-band mode on this driver because extra initialization
is needed after linked the ethernet PHY up. So, convert to phy_device
from phylink.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify struct phy *serdes handling by keeping the valiable in
the struct rswitch_device.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some devices like ZBT WE1326 and ZBT WF3526-P and some Netgear models need
to delay phy port initialization after calling the mt7621_pcie_init_port()
driver function to get into reliable boots for both warm and hard resets.
The delay required to detect the ports seems to be in the range [75-100]
milliseconds.
If the ports are not detected the controller is not functional.
There is no datasheet or something similar to really understand why this
extra delay is needed only for these devices and it is not for most of
the boards that are built on mt7621 SoC.
This issue has been reported by openWRT community and the complete
discussion is in [0]. The 100 milliseconds delay has been tested in all
devices to validate it.
Add the extra 100 milliseconds delay to fix the issue.
[0]: https://github.com/openwrt/openwrt/pull/11220
Link: https://lore.kernel.org/r/20221231074041.264738-1-sergio.paracuellos@gmail.com
Fixes: 2bdd5238e756 ("PCI: mt7621: Add MediaTek MT7621 PCIe host controller driver")
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
|
|
After calling fwnode_phy_find_device(), the phy device refcount is
incremented. Then, when the phy device is attached to a netdev with
phy_attach_direct(), the refcount is also incremented but only
decremented in the caller if phy_attach_direct() fails. Move
phy_device_free() before the "if" to always release it correctly.
Indeed, either phy_attach_direct() failed and we don't want to keep a
reference to the phydev or it succeeded and a reference has been taken
internally.
Fixes: 25396f680dd6 ("net: phylink: introduce phylink_fwnode_phy_connect()")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Immutable branch from LEDs due for the v6.3 merge window
|
|
Simplify code by using min_t helper macro for logical evaluation
and value assignment. Use the _t variant of min macro since the
variable types are not same.
This issue is identified by coccicheck using the minmax.cocci file.
Signed-off-by: Deepak R Varma <drv@mailo.com>
Link: https://lore.kernel.org/r/Y9QupEMPFoZpWIiM@ubun2204.myguest.virtualbox.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Simplify code by using min_t helper macro for logical evaluation
and value assignment. Use the _t variant of min macro since the
variable types are not same.
This issue is identified by coccicheck using the minmax.cocci file.
Signed-off-by: Deepak R Varma <drv@mailo.com>
Acked-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/Y9P8debIztOZXazW@ubun2204.myguest.virtualbox.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Until now, the dell-wmi-ddv driver needs to be manually
patched and compiled to test compatibility with unknown
DDV WMI interface versions.
Add a module param to allow users to force loading even
when a unknown interface version was detected. Since this
might cause various unwanted side effects, the module param
is marked as unsafe.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230126194021.381092-5-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
When the ACPI WMI interface returns a valid ACPI object
which has the wrong type, then ENOMSG instead of EIO
should be returned, since the WMI method was still
successfully evaluated.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230126194021.381092-4-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
In several cases, the DDV WMI interface can return buffers
with a length of zero. Return -ENODATA in such a case for
proper error handling. Also replace some -EIO errors with
more specialized ones.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230126194021.381092-3-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
While trying to solve a bugreport on bugzilla, i learned that
some devices (for example the Dell XPS 17 9710) provide a more
recent DDV WMI interface (version 3).
Since the new interface version just adds an additional method,
no code changes are necessary apart from whitelisting the version.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230126194021.381092-2-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
USB_OHCI_SH is a dummy option that never builds any code, remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230113062339.1909087-3-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix typos and add the following to the scripts/spelling.txt:
exsits||exists
Link: https://lkml.kernel.org/r/20230126152205.959277-1-luca.ceresoli@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time. To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Fixes: d180e0a1be6c ("Drivers: hv: Create debugfs file with hyper-v balloon usage information")
Cc: stable <stable@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230202140918.2289522-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The ->rw_page method is a special purpose bypass of the usual bio handling
path that is limited to single-page reads and writes and synchronous which
causes a lot of extra code in the drivers, callers and the block layer.
The only remaining user is the MM swap code. Switch that swap code to
simply submit a single-vec on-stack bio an synchronously wait on it based
on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that
currently implement ->rw_page instead. While this touches one extra cache
line and executes extra code, it simplifies the block layer and drivers
and ensures that all feastures are properly supported by all drivers, e.g.
right now ->rw_page bypassed cgroup writeback entirely.
[akpm@linux-foundation.org: fix comment typo, per Dan]
Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Introduce per NUMA node memory error statistics", v2.
Background
==========
In the RFC for Kernel Support of Memory Error Detection [1], one advantage
of software-based scanning over hardware patrol scrubber is the ability to
make statistics visible to system administrators. The statistics include
2 categories:
* Memory error statistics, for example, how many memory error are
encountered, how many of them are recovered by the kernel. Note these
memory errors are non-fatal to kernel: during the machine check
exception (MCE) handling kernel already classified MCE's severity to be
unnecessary to panic (but either action required or optional).
* Scanner statistics, for example how many times the scanner have fully
scanned a NUMA node, how many errors are first detected by the scanner.
The memory error statistics are useful to userspace and actually not
specific to scanner detected memory errors, and are the focus of this
patchset.
Motivation
==========
Memory error stats are important to userspace but insufficient in kernel
today. Datacenter administrators can better monitor a machine's memory
health with the visible stats. For example, while memory errors are
inevitable on servers with 10+ TB memory, starting server maintenance when
there are only 1~2 recovered memory errors could be overreacting; in cloud
production environment maintenance usually means live migrate all the
workload running on the server and this usually causes nontrivial
disruption to the customer. Providing insight into the scope of memory
errors on a system helps to determine the appropriate follow-up action.
In addition, the kernel's existing memory error stats need to be
standardized so that userspace can reliably count on their usefulness.
Today kernel provides following memory error info to userspace, but they
are not sufficient or have disadvantages:
* HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
not per NUMA node stats though
* ras:memory_failure_event: only available after explicitly enabled
* /dev/mcelog provides many useful info about the MCEs, but doesn't
capture how memory_failure recovered memory MCEs
* kernel logs: userspace needs to process log text
Exposing memory error stats is also a good start for the in-kernel memory
error detector. Today the data source of memory error stats are either
direct memory error consumption, or hardware patrol scrubber detection
(either signaled as UCNA or SRAO). Once in-kernel memory scanner is
implemented, it will be the main source as it is usually configured to
scan memory DIMMs constantly and faster than hardware patrol scrubber.
How Implemented
===============
As Naoya pointed out [2], exposing memory error statistics to userspace is
useful independent of software or hardware scanner. Therefore we
implement the memory error statistics independent of the in-kernel memory
error detector. It exposes the following per NUMA node memory error
counters:
/sys/devices/system/node/node${X}/memory_failure/total
/sys/devices/system/node/node${X}/memory_failure/recovered
/sys/devices/system/node/node${X}/memory_failure/ignored
/sys/devices/system/node/node${X}/memory_failure/failed
/sys/devices/system/node/node${X}/memory_failure/delayed
These counters describe how many raw pages are poisoned and after the
attempted recoveries by the kernel, their resolutions: how many are
recovered, ignored, failed, or delayed respectively. This approach can be
easier to extend for future use cases than /proc/meminfo, trace event, and
log. The following math holds for the statistics:
* total = recovered + ignored + failed + delayed
These memory error stats are reset during machine boot.
The 1st commit introduces these sysfs entries. The 2nd commit populates
memory error stats every time memory_failure attempts memory error
recovery. The 3rd commit adds documentations for introduced stats.
[1] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#mc22959244f5388891c523882e61163c6e4d703af
[2] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#m52d8d7a333d8536bd7ce74253298858b1c0c0ac6
This patch (of 3):
Today kernel provides following memory error info to userspace, but each
has its own disadvantage
* HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
not per NUMA node stats though
* ras:memory_failure_event: only available after explicitly enabled
* /dev/mcelog provides many useful info about the MCEs, but
doesn't capture how memory_failure recovered memory MCEs
* kernel logs: userspace needs to process log text
Exposes per NUMA node memory error stats as sysfs entries:
/sys/devices/system/node/node${X}/memory_failure/total
/sys/devices/system/node/node${X}/memory_failure/recovered
/sys/devices/system/node/node${X}/memory_failure/ignored
/sys/devices/system/node/node${X}/memory_failure/failed
/sys/devices/system/node/node${X}/memory_failure/delayed
These counters describe how many raw pages are poisoned and after the
attempted recoveries by the kernel, their resolutions: how many are
recovered, ignored, failed, or delayed respectively. The following math
holds for the statistics:
* total = recovered + ignored + failed + delayed
Link: https://lkml.kernel.org/r/20230120034622.2698268-1-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20230120034622.2698268-2-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
__GFP_ATOMIC serves little purpose. Its main effect is to set
ALLOC_HARDER which adds a few little boosts to increase the chance of an
allocation succeeding, one of which is to lower the water-mark at which it
will succeed.
It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
adjusts this watermark. It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.
__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this. We already get a might_sleep() warning if
__GFP_DIRECT_RECLAIM is set.
__GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is
probable that testing ALLOC_HARDER is a better fit here.
__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep. This should test __GFP_DIRECT_RECLAIM instead.
This patch:
- removes __GFP_ATOMIC
- allows __GFP_HIGH allocations to ignore watermark boosting as well
as GFP_ATOMIC requests.
- makes other adjustments as suggested by the above.
The net result is not change to GFP_ATOMIC allocations. Other
allocations that use __GFP_HIGH will benefit from a few different extra
privileges. This affects:
xen, dm, md, ntfs3
the vermillion frame buffer
hibernation
ksm
swap
all of which likely produce more benefit than cost if these selected
allocation are more likely to succeed quickly.
[mgorman: Minor adjustments to rework on top of a series]
Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Link: https://lkml.kernel.org/r/20230113111217.14134-7-mgorman@techsingularity.net
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is only a single user of uuid_le_cmp() API, let's make it private
to that user.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230202145412.87569-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
1st set of IIO new device support, features and cleanups for the 6.3 cycle
The usual mixed bag. So far this has been a quiet cycle for IIO.
New device support
* adi,ad8686
- Add support for the AD5337 DAC - ID and 8 bit channel support.
* maxim,max5522
- New driver for this 2 channel DAC.
* nxp,imx93-adc
- New driver for this SoC ADC which is a fresh IP that will probably
turn up in additional SoCs going forwards.
* st,magn
- Add support for magnetometer part of LSM303C which is very similar
to standalone LIS3MDL already supported.
* ti,ads7924
- New driver for this 4 channel, 12-bit I2C ADC.
* ti,lmp92064
- New driver for this 12 bit SPI ADC.
* ti,tmag5273
- New driver for this 3D Hall-Effect Sensor.
Features
* core
- Add a standard structure for the value pairs in IIO_VAL_INT_PLUS_MICRO
available attributes and similar.
* cirrus,ep93xx
- Add DT binding docs and convert driver to DT based probing.
- Enable testing building with CONFIG_COMPILE_TEST.
* st,stm32-dfsdm
- Enable ID register support for discovery of hardware capabilities on
some devices.
Cleanups and minor fixes
* core
- Drop the custom iio_sysfs_match_string_with_gaps().
The special ability of this function to skip gaps in an array
was never used by any upstream driver.
- Sort headers whilst touching this file.
* tools
- Fix memory leak in iio_utils.c
* various
- leftover i2c probe_new() conversions.
- scnprintf() -> sysfs_emit() cleanups.
- hand rolled devm enables -> devm_regulator[_bulk]_get_enable()
- typo fixes
- dt-binding cleanup (whitespace, excess quotes and similar)
* adi,ad7746
- Set variable without pointless conditional.
* fsl,mma9551
- Squash false positives about use of uninitialized variable where
garbage undergoes an endian conversion before being ignored.
* measspec,ms5611
- Switch to fully devm_ managed probe() and so drop explicit remove()
* qcom,spmi-adc
- Use dev_err_probe() to suppress deferred print.
* qcom,spmi-adc5
- Define a missing channel used for battery identification.
* qcom,spmi-iadc
- Document a compatible seen in wild.
* semtech,sx9360
- Fix units on semtech,resolution dt-binding.
* sensiron,scd30
- dev_err_probe() usage to simplify error paths a little.
* st,lsm6dsx
- Add missing mount matrix for the gyro IIO device.
* taos,tsl2563
- Respect firmware configured interrupt polarity if present.
- Use i2c_smbus_write_word_data() in a few cases not previously covered.
- Factor out duplicated interrupt configuration.
- Switch to GENMASK() / BIT() from hand coded equivalents.
- Tidy up unused definitions.
- Use dev_err_probe() as appropriate.
- Drop platform_data as no in kernel users and there are better ways to
do equivalent if any are added.
- Add local struct device variable to tidy up code.
- Avoid dance via i2c_client to get the drvdata.
- Tidy up headers ordering and Makefile ordering.
* ti,adc128s052
- Use new spi_get_device_match_data().
- Drop ACPI_PTR() protection.
- Sort headers whilst here.
- Use asm instead of incorrect include of asm-generic/unaligned.h
* vishay,vcn4000
- Interrupt support for vcnl4040 (lots of refactoring needed)
* xilinx,ams
- Use fwnode_device_is_compatible() instead of open coding it.
* tag 'iio-for-6.3a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (71 commits)
iio: adc: ad7291: Fix indentation error by adding extra spaces
iio: accel: mma9551_core: Prevent uninitialized variable in mma9551_read_config_word()
iio: accel: mma9551_core: Prevent uninitialized variable in mma9551_read_status_word()
dt-bindings: iio/proximity: semtech,sx9360: Fix 'semtech,resolution' type
iio: imu: fix spdx format
iio: adc: imx93: Fix spelling mistake "geting" -> "getting"
dt-bindings: iio: cleanup examples - indentation
dt-bindings: iio: use lowercase hex in examples
dt-bindings: iio: correct node names in examples
dt-bindings: iio: minor whitespace cleanups
dt-bindings: iio: drop unneeded quotes
dt-bindings: iio: adc: Add NXP IMX93 ADC
iio: adc: add imx93 adc support
dt-bindings: iio: adc: add Texas Instruments ADS7924
iio: adc: ti-ads7924: add Texas Instruments ADS7924 driver
iio: imu: st_lsm6dsx: add 'mount_matrix' sysfs entry to gyro channel.
iio: imu: st_lsm6dsx: fix naming of 'struct iio_info' in st_lsm6dsx_shub.c.
iio: light: vcnl4000: Add interrupt support for vcnl4040
iio: light: vcnl4000: Make irq handling more generic
iio: light: vcnl4000: Prepare for more generic setup
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next
Manivannan writes:
MHI Host
========
- Fixed the module description
MHI Endpoint
============
- Powered down the MHI EP stack completely during MHI RESET instead of just
doing transfer abort as the MMIO register access will be prohibited
afterwards. EP stack will also be powered on again in case the RESET
happened due to SYS_ERR.
- Added a sanity check before processing the command ring to make sure that
the channel is supported by the controller.
- Added a check to make sure the xfer_cb is available for the channel
before trying to send the error status to the client drivers. This
helps in avoiding a potential null pointer dereference.
- Fixed the debug log of RESET command
- Modified the channel ring handler lock to protect the whole handler
instead of locking it partially. This helps in avoiding a race that may
happen if a channel STOP/RESET command is issued by the host parallely.
- Saved the MHI state locally during suspend and resume. Otherwise, the MHI
EP stack will not be aware of a channel that got disabled and may try to
access it later.
- Changed the MHI state_lock to mutex instead of spinlock. This helps in
avoiding the sleeping in atomic bug reported by Dan Carpenter and also
allows the lock to be held throughout the state change.
- Fixed the off by one error while doing the MHI channel check during
command ring processing.
MHI Generic
===========
- Updated the MHI toplevel Makefile to use Kconfig flags for building the
host and endpoint sub-directories conditionally.
* tag 'mhi-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: ep: Fix off by one in mhi_ep_process_cmd_ring()
bus: mhi: ep: Change state_lock to mutex
bus: mhi: ep: Save channel state locally during suspend and resume
bus: mhi: ep: Move chan->lock to the start of processing queued ch ring
bus: mhi: ep: Fix the debug message for MHI_PKT_TYPE_RESET_CHAN_CMD cmd
bus: mhi: ep: Only send -ENOTCONN status if client driver is available
bus: mhi: ep: Check if the channel is supported by the controller
bus: mhi: ep: Power up/down MHI stack during MHI RESET
bus: mhi: host: Update mhi driver description
bus: mhi: Update Makefile to used Kconfig flags
|
|
Use the new devm_of_phy_optional_get() helper instead of open-coding the
same operation.
As devm_of_phy_optional_get() returns NULL if either the PHY cannot be
found, or if support for the PHY framework is not enabled, it is no
longer needed to check for -ENODEV or -ENOSYS.
This lets us drop several checks for IS_ERR(), as phy_power_{on,off}()
handle NULL parameters fine.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/3adc5dd1149a17ea7daf4463549feab886c6b145.1674584626.git.geert+renesas@glider.be
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Use the new devm_of_phy_optional_get() helper instead of open-coding the
same operation.
As devm_of_phy_optional_get() returns NULL if either the PHY cannot be
found, or if support for the PHY framework is not enabled, it is no
longer needed to check for -ENODEV or -ENOSYS.
This lets us drop several checks for IS_ERR(), as phy_power_{on,off}()
handle NULL parameters fine.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/a28baf4e07e464c43aff9e52263b5a902f5da9a0.1674584626.git.geert+renesas@glider.be
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|