2020-01-14soundwire: cadence: remove useless variable incrementationPierre-Louis Bossart
Fix cppcheck warning:

  drivers/soundwire/cadence_master.c:992:9: style: Variable 'offset' is assigned a value that is never used. [unreadVariable]
   offset += stream->num_out;
   ^

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200113211025.27973-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
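A minimal sketch of the dead-store pattern that cppcheck flags; the consumer function is hypothetical, and the fix is simply deleting the last assignment:

    static void pdi_offsets(struct sdw_cdns_streams *stream)
    {
            int offset = 0;

            offset += stream->num_in;          /* consumed below */
            cdns_pdi_setup(stream, offset);    /* hypothetical consumer */
            offset += stream->num_out;         /* assigned, never read: drop */
    }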
2020-01-14soundwire: cadence: update kernel-doc parameter descriptionsPierre-Louis Bossart
make W=1 reports inconsistencies with the kernel-doc parameter descriptions; fix them. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200113211025.27973-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14soundwire: qcom: add support for SoundWire controllerSrinivas Kandagatla
The Qualcomm SoundWire Master controller is present in most Qualcomm SoCs, either integrated into WCD audio codecs (reached via SLIMbus) or as part of SoC I/O. This patchset adds support for a very basic controller, which has been tested with the WCD934x SoundWire controller connected to WSA881x smart speaker amplifiers. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20200113132153.27239-3-srinivas.kandagatla@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14dt-bindings: soundwire: add bindings for Qcom controllerSrinivas Kandagatla
This patch adds bindings for the Qualcomm SoundWire controller. The Qualcomm SoundWire Master controller is present in most Qualcomm SoCs, either integrated into WCD audio codecs (reached via SLIMbus) or as part of SoC I/O. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20200113132153.27239-2-srinivas.kandagatla@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14soundwire: bus: check first if Slaves become UNATTACHEDPierre-Louis Bossart
Before checking for the presence of Device0, we first need to clean up the internal state of Slaves that are no longer attached. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110215731.30747-7-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14soundwire: cadence_master: handle multiple status reports per SlavePierre-Louis Bossart
When a Slave reports multiple statuses in the sticky bits, find the latest configuration from the mirror of the PING frame status and update the status directly. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110215731.30747-6-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14soundwire: cadence_master: remove config update for interrupt settingRander Wang
The config only needs to be updated when setting MCP_Config, MCP_Control and MCP_CmdCtrl, to make these register settings effective. When the config is updated on the master, the master communicates with the slaves to update their status. This communication fails when masters and slaves are in the clock-stop state, and the unnecessary config update then makes the interrupt setting fail. Tested on Comet Lake with SoundWire enabled. Signed-off-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110215731.30747-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14soundwire: cadence_master: log more useful information during timeoutsPierre-Louis Bossart
Add the type of command, device number, register offset and length to the timeout log, to help reverse-engineer what caused the issue. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110215731.30747-4-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14soundwire: cadence_master: clear interrupt status before enabling interruptRander Wang
Make sure all interrupt statuses are cleared before enabling interrupts, so that no unexpected interrupt is triggered. Signed-off-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110215731.30747-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
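The usual shape of this pattern, with illustrative register offsets (not the actual Cadence register map):

    #include <linux/io.h>

    #define CDNS_MCP_INTSTAT  0x40   /* write-1-to-clear status, illustrative */
    #define CDNS_MCP_INTMASK  0x48   /* interrupt enable mask, illustrative   */

    static void cdns_enable_interrupts(void __iomem *base, u32 mask)
    {
            /* Ack anything stale first, so that unmasking cannot fire
             * an interrupt for an event that predates the enable. */
            writel(readl(base + CDNS_MCP_INTSTAT), base + CDNS_MCP_INTSTAT);
            writel(mask, base + CDNS_MCP_INTMASK);
    }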
2020-01-14soundwire: cadence_master: filter out bad interruptsPierre-Louis Bossart
If somehow we read the interrupt status while the IP is not powered, the result is probably undefined, or 0xffffffff. We do know that some of the bits are reserved and read as zero, so use them as a filter to discard invalid configurations. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110215731.30747-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
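A hedged sketch of the filtering idea; the mask value, register offset and context struct are illustrative only:

    #include <linux/interrupt.h>
    #include <linux/io.h>

    #define CDNS_MCP_INTSTAT         0x40          /* illustrative offset */
    #define CDNS_VALID_INTSTAT_MASK  0x7fffffffU   /* illustrative mask   */

    struct cdns_ctx { void __iomem *base; };       /* hypothetical */

    static irqreturn_t cdns_irq_handler(int irq, void *dev_id)
    {
            struct cdns_ctx *cdns = dev_id;
            u32 status = readl(cdns->base + CDNS_MCP_INTSTAT);

            /* An unpowered IP typically reads back as 0xffffffff;
             * reserved bits read as zero when the IP is alive, so any
             * set reserved bit marks the whole read as garbage. */
            if (status & ~CDNS_VALID_INTSTAT_MASK)
                    return IRQ_NONE;

            /* ... handle the valid status bits ... */
            return IRQ_HANDLED;
    }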
2020-01-14dt-bindings: usb: Convert Allwinner A80 USB PHY controller to a schemaMaxime Ripard
The Allwinner A80 SoCs have a USB PHY controller that is used by Linux, with a matching Device Tree binding. Now that we have the DT validation in place, let's convert the device tree bindings for that controller over to a YAML schema. Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: intel-lgm-emmc: Fix warning by adding missing MODULE_LICENSERamuthevar Vadivel Murugan
commit 95f1061f715e ("phy: intel-lgm-emmc: Add support for eMMC PHY") introduced the warning below:

  WARNING: modpost: missing MODULE_LICENSE() in drivers/phy/intel/phy-intel-emmc.o

Fix it by adding the missing MODULE_LICENSE. Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: ti: j721e-wiz: Manage typec-gpio-dirRoger Quadros
Based on this GPIO state we need to configure the LN10 bit to swap lane0 and lane1 if required (flipped connector). Type-C companions typically need some time after the cable is plugged in before they reflect the correct status of Type-C plug orientation on the DIR line. The Type-C spec specifies a CC attachment debounce time (tCCDebounce) of 100 ms (min) to 200 ms (max). Use the DT property to figure out whether we need to add a delay before sampling the Type-C DIR line. Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Reviewed-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
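A sketch of the sampling logic under these assumptions (the helper name, and using a plain msleep() for the debounce, are illustrative; only the "typec-dir" GPIO name follows the binding):

    #include <linux/delay.h>
    #include <linux/err.h>
    #include <linux/gpio/consumer.h>
    #include <linux/kernel.h>

    /* tCCDebounce per the Type-C spec: 100 ms (min) to 200 ms (max). */
    #define TYPEC_DIR_DEBOUNCE_MIN_MS  100U
    #define TYPEC_DIR_DEBOUNCE_MAX_MS  200U

    /* Give the Type-C companion time to debounce CC attachment, then
     * sample DIR to decide the lane order. Returns 1 when flipped. */
    static int wiz_typec_dir_flipped(struct device *dev, u32 debounce_ms)
    {
            struct gpio_desc *dir;

            dir = devm_gpiod_get(dev, "typec-dir", GPIOD_IN);
            if (IS_ERR(dir))
                    return PTR_ERR(dir);

            if (debounce_ms)
                    msleep(clamp(debounce_ms, TYPEC_DIR_DEBOUNCE_MIN_MS,
                                 TYPEC_DIR_DEBOUNCE_MAX_MS));

            /* DIR high => flipped connector => swap lane0/lane1 (LN10). */
            return gpiod_get_value_cansleep(dir);
    }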
2020-01-14dt-bindings: phy: ti,phy-j721e-wiz: Add Type-C dir GPIORoger Quadros
This is an optional GPIO; if specified, it will be used to swap lane 0 and lane 1 based on the GPIO state. This is required to achieve plug-flip support for USB Type-C. Type-C companions typically need some time after the cable is plugged in before they reflect the correct status of Type-C plug orientation on the DIR line. The Type-C spec specifies a CC attachment debounce time (tCCDebounce) of 100 ms (min) to 200 ms (max). Allow the DT node to specify the time (in ms) that we need to wait before sampling the DIR line. Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: cadence: Sierra: add phy_reset hookRoger Quadros
Some platforms, e.g. J721E, need the lane-swap register to be programmed before reset is deasserted. This patch ensures that we propagate phy_reset back to the reset controller driver. Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Reviewed-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: cadence: Sierra: remove redundant initialization of pointer regmapColin Ian King
The pointer regmap is being initialized with a value that is never read and it is being updated later with a new value from phy->regmap_common_cdb. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: Add DisplayPort configuration optionsYuti Amonkar
Allow DisplayPort PHYs to be configured through the generic configure functions, via a custom structure added to the generic union. The configuration structure is used for reconfiguration of DisplayPort PHYs during the link training operation. The parameters added here are the ones defined in the DisplayPort spec v1.4, which include link rate, number of lanes, voltage swing and pre-emphasis. Also add the DisplayPort PHY mode to the generic phy_mode enum. Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
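The added configuration branch has roughly this shape (a sketch based on the parameters listed above; include/linux/phy/phy-dp.h in the tree carrying this patch is authoritative):

    #include <linux/types.h>

    struct phy_configure_opts_dp {
            unsigned int link_rate;   /* main-link rate in Mb/s
                                       * (e.g. 1620, 2700, 5400, 8100) */
            unsigned int lanes;       /* 1, 2 or 4 lanes */
            unsigned int voltage[4];  /* voltage swing, per lane */
            unsigned int pre[4];      /* pre-emphasis, per lane */
            u8 ssc : 1;               /* spread-spectrum clocking */
            u8 set_rate : 1;          /* which of the above to apply */
            u8 set_lanes : 1;
            u8 set_voltages : 1;
    };

    /* ...and the new member of the generic configuration union: */
    union phy_configure_opts {
            struct phy_configure_opts_mipi_dphy mipi_dphy;
            struct phy_configure_opts_dp dp;
    };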
2020-01-14phy: Enable compile testing for some of driversKrzysztof Kozlowski
Some of the phy drivers can be compile tested to increase build coverage. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: mediatek: Fix Kconfig indentationKrzysztof Kozlowski
Adjust indentation from spaces to tabs (+ optional two spaces) as per the coding style. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: intel-lgm-emmc: Add support for eMMC PHYRamuthevar Vadivel Murugan
Add support for eMMC PHY on Intel's Lightning Mountain SoC. Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14dt-bindings: phy: intel-emmc-phy: Add YAML schema for LGM eMMC PHYRamuthevar Vadivel Murugan
Add a YAML schema to use the host controller driver with the eMMC PHY on Intel's Lightning Mountain SoC. Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14phy: ti: j721e-wiz: Add support for WIZ module present in TI J721E SoCKishon Vijay Abraham I
Add support for WIZ module present in TI's J721E SoC. WIZ is a SERDES wrapper used to configure some of the input signals to the SERDES. It is used with both Sierra(16G) and Torrent(10G) SERDES. This driver configures three clock selects (pll0, pll1, dig), two divider clocks and supports resets for each of the lanes. [jsarha@ti.com: Add support for Torrent(10G) SERDES wrapper] Signed-off-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-14dt-bindings: phy: Document WIZ (SERDES wrapper) bindingsKishon Vijay Abraham I
Add DT binding documentation for WIZ (SERDES wrapper). WIZ is *NOT* a PHY but a wrapper used to configure some of the input signals to the SERDES. It is used with both Sierra(16G) and Torrent(10G) serdes. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> [jsarha@ti.com: Add separate compatible for Sierra(16G) and Torrent(10G) SERDES] Signed-off-by: Jyri Sarha <jsarha@ti.com> Reviewed-by: Rob Herring <robh@kernel.org>
2020-01-13NFC: pn533: fix bulk-message timeoutJohan Hovold
The driver was doing a synchronous uninterruptible bulk-transfer without using a timeout. This could lead to the driver hanging on probe due to a malfunctioning (or malicious) device until the device is physically disconnected. While sleeping in probe the driver prevents other devices connected to the same hub from being added to (or removed from) the bus. An arbitrary limit of five seconds should be more than enough. Fixes: dbafc28955fa ("NFC: pn533: don't send USB data off of the stack") Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
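A sketch of the bounded transfer; the wrapper and the constant name are hypothetical, the usb_bulk_msg() call and the five-second value follow the description above:

    #include <linux/usb.h>

    #define PN533_USB_BULK_TIMEOUT_MS  5000   /* arbitrary but generous */

    static int pn533_usb_bulk(struct usb_device *udev, unsigned int pipe,
                              void *buf, int len)
    {
            int transferred;

            /* A timeout of 0 would mean "wait forever", which is what
             * let a misbehaving device hang probe indefinitely. */
            return usb_bulk_msg(udev, pipe, buf, len, &transferred,
                                PN533_USB_BULK_TIMEOUT_MS);
    }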
2020-01-13qmi_wwan: Add support for Quectel RM500QKristian Evensen
RM500Q is a 5G module from Quectel, supporting both standalone and non-standalone modes. The normal Quectel quirks apply (DTR and dynamic interface numbers). Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13Merge tag 'Intel-CVE-2019-14615' from bundle by Akeem Abodunrin.Linus Torvalds
Merge Intel Gen9 graphics fix from Akeem Abodunrin: "Insufficient control flow in certain data structures for some Intel Processors with Intel Processor Graphics may allow an unauthenticated user to potentially enable information disclosure via local access This provides mitigation for Gen9 hardware. Note that Gen8 is not impacted due to a previously implemented workaround. The mitigation involves using an existing hardware feature to forcibly clear down all EU state at each context switch" * tag 'Intel-CVE-2019-14615' of emailed bundle from Akeem G Abodunrin <akeem.g.abodunrin@intel.com>: drm/i915/gen9: Clear residual context state on context switch
2020-01-13net: macb: fix for fixed-link modeMilind Parab
This patch fixes the issue with fixed-link mode. With fixed-link, device open fails because macb_phylink_connect does not handle fixed-link mode: in that case no MAC-PHY connection is needed and the phylink connect call should return success (0), but the current driver nevertheless attempts to search for and connect to a PHY. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Milind Parab <mparab@cadence.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
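A hedged sketch of the shape of the fix, not the literal macb patch; the helper and its parameters are illustrative:

    #include <linux/errno.h>
    #include <linux/of.h>
    #include <linux/phy.h>
    #include <linux/phylink.h>

    static int macb_connect(struct phylink *pl, struct device_node *dn,
                            struct mii_bus *bus)
    {
            struct phy_device *phydev;

            /* phylink_of_phy_connect() understands fixed-link subnodes
             * and succeeds without attaching a PHY in that case. */
            if (dn)
                    return phylink_of_phy_connect(pl, dn, 0);

            /* No DT node at all: fall back to scanning the MDIO bus. */
            phydev = phy_find_first(bus);
            if (!phydev)
                    return -ENXIO;

            return phylink_connect_phy(pl, phydev);
    }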
2020-01-13Merge branch 'stmmac-ETF-support'Jakub Kicinski
Jose Abreu says: ==================== net: stmmac: ETF support This series adds the support for the ETF scheduler in stmmac. 1) Starts adding the support by implementing Enhanced Descriptors in the stmmac main core. This is needed for the ETF feature in XGMAC and QoS cores. 2) Integrates the ETF logic into the stmmac TC core. 3) and 4) add the HW-specific support for ETF in XGMAC and QoS cores. The IP feature is called TBS (Time Based Scheduling). 5) Enables ETF in the GMAC5 IPK PCI entry for all Queues except Queue 0. 6) Adds the new TBS feature and even more information into the debugFS HW features file. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: selftests: Add a test for TBS featureJose Abreu
Add a new test for the TBS feature, which is used by the ETF scheduler. In this test, we send a packet with a launch time specified as now + 500 ms and check whether the packet was transmitted in that time frame. Changes from v2: - Use the TBS bitfield - Remove debug message Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: selftests: Switch to dev_direct_xmit()Jose Abreu
In the upcoming commit for the TBS selftest we will need to send a packet on a specific queue. As stmmac falls back to netdev_pick_tx() in its select-queue callback, we need to switch all selftest logic to dev_direct_xmit() so that we can send a given SKB on a specific queue. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
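A minimal illustration (the wrapper name is hypothetical; dev_direct_xmit() is the real API):

    #include <linux/netdevice.h>

    /* dev_direct_xmit() bypasses the qdisc layer and the driver's
     * ndo_select_queue() entirely, so the skb goes out on exactly the
     * queue we name, which is what a per-queue TBS selftest needs.
     * Returns a NET_XMIT_* code. */
    static int stmmac_test_xmit_on(struct sk_buff *skb, u16 queue)
    {
            return dev_direct_xmit(skb, queue);
    }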
2020-01-13net: stmmac: Add missing information in DebugFS capabilities fileJose Abreu
Adds more information regarding HW Capabilities in the corresponding DebugFS file. Changes from v2: - Remove the TX/RX queues in use (Jakub) Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: pci: Enable TBS on GMAC5 IPK PCI entryJose Abreu
Enable TBS support on GMAC5 PCI entry for all Queues except Queue 0. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: gmac4+: Add TBS supportJose Abreu
Adds all the necessary HW hooks to support TBS feature in QoS cores. Changes from v1: - Remove unneeded LT shift as the IP already does this. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: xgmac: Add TBS supportJose Abreu
Adds all the necessary HW hooks to support TBS feature in XGMAC cores. Changes from v1: - Remove unneeded LT shift as the IP already does this. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: tc: Add support for ETF Scheduler using TBSJose Abreu
Adds the support for ETF scheduler using TBS feature which is available in XGMAC and QoS IPs. Changes from v2: - Fix checkpatch issues (Jakub) - Use the TBS bitfield Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13net: stmmac: Initial support for TBSJose Abreu
Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now. Changes from v2: - Use bitfield for TBS status / support (Jakub) - Remove unneeded cache alignment (Jakub) - Fix checkpatch issues Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13io_uring: don't setup async context for read/write fixedJens Axboe
We don't need it, and if we have it, then the retry handler will attempt to copy the non-existent iovec with the inline iovec, with a segment count that doesn't make sense. Fixes: f67676d160c6 ("io_uring: ensure async punted read/write requests copy iovec") Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-13amd-xgbe: remove unnecessary conversion to boolChen Zhou
The conversion to bool is not needed, remove it. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATEYang Shi
Commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") introduced a new khugepaged scan result: SCAN_PAGE_HAS_PRIVATE, but the corresponding description for trace events were not added. Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is validAdrian Huang
When booting with amd_iommu=off, the following WARNING message appears:

  AMD-Vi: AMD IOMMU disabled on kernel command-line
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2772 flush_workqueue+0x42e/0x450
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc3-amd-iommu #6
  Hardware name: Lenovo ThinkSystem SR655-2S/7D2WRCZ000, BIOS D8E101L-1.00 12/05/2019
  RIP: 0010:flush_workqueue+0x42e/0x450
  Code: ff 0f 0b e9 7a fd ff ff 4d 89 ef e9 33 fe ff ff 0f 0b e9 7f fd ff ff 0f 0b e9 bc fd ff ff 0f 0b e9 a8 fd ff ff e8 52 2c fe ff <0f> 0b 31 d2 48 c7 c6 e0 88 c5 95 48 c7 c7 d8 ad f0 95 e8 19 f5 04
  Call Trace:
   kmem_cache_destroy+0x69/0x260
   iommu_go_to_state+0x40c/0x5ab
   amd_iommu_prepare+0x16/0x2a
   irq_remapping_prepare+0x36/0x5f
   enable_IR_x2apic+0x21/0x172
   default_setup_apic_routing+0x12/0x6f
   apic_intr_mode_init+0x1a1/0x1f1
   x86_late_time_init+0x17/0x1c
   start_kernel+0x480/0x53f
   secondary_startup_64+0xb6/0xc0
  ---[ end trace 30894107c3749449 ]---
  x2apic: IRQ remapping doesn't support X2APIC mode
  x2apic disabled

The warning is caused by the call to kmem_cache_destroy() in free_iommu_resources(). Here is the call path:

  free_iommu_resources
    kmem_cache_destroy
      flush_memcg_workqueue
        flush_workqueue

The root cause is that the IOMMU subsystem runs before the workqueue subsystem, at which point the variable 'wq_online' is still 'false'. This makes the check 'if (WARN_ON(!wq_online))' in flush_workqueue() trigger. Since the variable 'memcg_kmem_cache_wq' has not been allocated by that time, it is unnecessary to call flush_memcg_workqueue(); skipping the call prevents the WARNING triggered by flush_workqueue().

Link: http://lkml.kernel.org/r/20200103085503.1665-1-ahuang12@lenovo.com
Fixes: 92ee383f6daab ("mm: fix race between kmem_cache destroy, create and deactivate")
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reported-by: Xiaochun Lee <lixc17@lenovo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
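A sketch of the resulting guard, assuming the names used in mm/slab_common.c of this era; the exact placement in the real patch may differ:

    static void flush_memcg_workqueue(struct kmem_cache *s)
    {
            /* memcg_kmem_cache_wq is only created once the workqueue
             * subsystem is up; early-boot callers such as the
             * amd_iommu=off teardown path have nothing to flush. */
            if (!memcg_kmem_cache_wq)
                    return;

            flush_workqueue(memcg_kmem_cache_wq);
    }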
2020-01-13mm/page-writeback.c: improve arithmetic divisionsWen Yang
Use div64_ul() instead of do_div() if the divisor is unsigned long, to avoid truncation to 32-bit on 64-bit platforms. Link: http://lkml.kernel.org/r/20200102081442.8273-4-wenyang@linux.alibaba.com Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divideWen Yang
The two variables 'numerator' and 'denominator', though declared as long, should actually be unsigned long (according to the implementation of the fprop_fraction_percpu() function). And do_div() does a 64-by-32 division, while the divisor 'denominator' is unsigned long and thus 64-bit on 64-bit platforms. Hence the proper function to call is div64_ul(). Link: http://lkml.kernel.org/r/20200102081442.8273-3-wenyang@linux.alibaba.com Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()Wen Yang
Patch series "use div64_ul() instead of div_u64() if the divisor is unsigned long". We were first inspired by commit b0ab99e7736a ("sched: Fix possible divide by zero in avg_atom () calculation"), then refer to the recently analyzed mm code, we found this suspicious place. 201 if (min) { 202 min *= this_bw; 203 do_div(min, tot_bw); 204 } And we also disassembled and confirmed it: /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 201 0xffffffff811c37da <__wb_calc_thresh+234>: xor %r10d,%r10d 0xffffffff811c37dd <__wb_calc_thresh+237>: test %rax,%rax 0xffffffff811c37e0 <__wb_calc_thresh+240>: je 0xffffffff811c3800 <__wb_calc_thresh+272> /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 202 0xffffffff811c37e2 <__wb_calc_thresh+242>: imul %r8,%rax /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 203 0xffffffff811c37e6 <__wb_calc_thresh+246>: mov %r9d,%r10d ---> truncates it to 32 bits here 0xffffffff811c37e9 <__wb_calc_thresh+249>: xor %edx,%edx 0xffffffff811c37eb <__wb_calc_thresh+251>: div %r10 0xffffffff811c37ee <__wb_calc_thresh+254>: imul %rbx,%rax 0xffffffff811c37f2 <__wb_calc_thresh+258>: shr $0x2,%rax 0xffffffff811c37f6 <__wb_calc_thresh+262>: mul %rcx 0xffffffff811c37f9 <__wb_calc_thresh+265>: shr $0x2,%rdx 0xffffffff811c37fd <__wb_calc_thresh+269>: mov %rdx,%r10 This series uses div64_ul() instead of div_u64() if the divisor is unsigned long, to avoid truncation to 32-bit on 64-bit platforms. This patch (of 3): The variables 'min' and 'max' are unsigned long and do_div truncates them to 32 bits, which means it can test non-zero and be truncated to zero for division. Fix this issue by using div64_ul() instead. Link: http://lkml.kernel.org/r/20200102081442.8273-2-wenyang@linux.alibaba.com Fixes: 693108a8a667 ("writeback: make bdi->min/max_ratio handling cgroup writeback aware") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm, debug_pagealloc: don't rely on static keys too earlyVlastimil Babka
Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param(), as in start_kernel(), so that when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key.

However, it turns out multiple architectures call parse_early_param() earlier, from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but the same is not true for e.g. ppc64 and s390, where the kernel would not boot with debug_pagealloc=on, as found by our QA.

To fix this without tricky changes to the init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled at a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture.

[sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
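A sketch of the two-stage scheme, close to the shape of the patch (the tree's include/linux/mm.h is authoritative):

    #include <linux/jump_label.h>

    /* Safe to set and test at any point during early boot. */
    extern bool _debug_pagealloc_enabled_early;
    /* Only flipped on in mm_init(), after jump_label_init() has run. */
    DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);

    /* Init-time and non-fastpath callers (e.g. arch code): */
    static inline bool debug_pagealloc_enabled(void)
    {
            return _debug_pagealloc_enabled_early;
    }

    /* Fastpath mm code, valid only once mm_init() enabled the key: */
    static inline bool debug_pagealloc_enabled_static(void)
    {
            return static_branch_unlikely(&_debug_pagealloc_enabled);
    }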
2020-01-13mm: memcg/slab: fix percpu slab vmstats flushingRoman Gushchin
Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up the cgroup tree.

The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value crosses some predefined threshold, it spills over to atomic values on the local and each ascendant level. It means that without flushing, some numbers cached in percpu variables will be dropped on the floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable.

The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may remain in the dying state for a long time. After kmem_cache reparenting, which is performed during the offlining, slab counters of the dying cgroup don't have any chance to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea, flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level.

The problem is that percpu counters are not zeroed after the first flushing, so every cached percpu value is summed twice. This creates a small error (up to 32 pages per cpu, but usually less) which accumulates on the parent cgroup level. After creating and destroying thousands of child cgroups, the slab counter on the parent level can be way off the real value.

For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work item on each cpu: reading and zeroing a counter during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath.

With this change, slab counters on the parent level will become eventually consistent. Once all dying children are gone, the values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup.

It's not perfect, as slabs are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on the child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. This makes slab counters different from the others, and may prompt us to implement flushing in a correct form again. But it's also a question of performance: scheduling a work item on each cpu isn't free, and it's an open question whether the benefit of having more accurate counters is worth it. We might also consider flushing all counters on offlining, not only the slab counters.

So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely, with the number of created and destroyed cgroups). And think about the accuracy of counters separately.

Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com
Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignmentKirill A. Shutemov
Shmem/tmpfs tries to provide THP-friendly mappings if huge pages are enabled. But it doesn't work well with above-47bit hint addresses.

Normally, the kernel doesn't create userspace mappings above 47-bit, even if the machine allows this (such as with 5-level paging on x86-64). Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information.

Userspace can ask for allocation from the full address space by specifying a hint address (with or without MAP_FIXED) above 47 bits. If the application doesn't need a particular address, but wants to allocate from the whole address space, it can specify -1 as a hint address.

Unfortunately, this trick breaks THP alignment in shmem/tmpfs: shmem_get_unmapped_area() would not try to allocate a PMD-aligned area if *any* hint address is specified.

This can be fixed by requesting the aligned area if we failed to allocate at the user-specified hint address. The request with inflated length will also take the user-specified hint address. This way we will not lose an allocation request from the full address space.

[kirill@shutemov.name: fold in a fixup]
Link: http://lkml.kernel.org/r/20191223231309.t6bh5hkbmokihpfu@box
Link: http://lkml.kernel.org/r/20191220142548.7118-3-kirill.shutemov@linux.intel.com
Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Willhalm, Thomas" <thomas.willhalm@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignmentKirill A. Shutemov
Patch series "Fix two above-47bit hint address vs. THP bugs".

The two get_unmapped_area() implementations have to be fixed to provide THP-friendly mappings if an above-47bit hint address is specified.

This patch (of 2):

Filesystems use thp_get_unmapped_area() to provide THP-friendly mappings, for DAX in particular.

Normally, the kernel doesn't create userspace mappings above 47-bit, even if the machine allows this (such as with 5-level paging on x86-64). Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information.

Userspace can ask for allocation from the full address space by specifying a hint address (with or without MAP_FIXED) above 47 bits. If the application doesn't need a particular address, but wants to allocate from the whole address space, it can specify -1 as a hint address.

Unfortunately, this trick breaks thp_get_unmapped_area(): the function would not try to allocate a PMD-aligned area if *any* hint address is specified.

Modify the routine to handle it correctly, as sketched after the list:

 - Try to allocate the space at the specified hint address with the length padding required for PMD alignment.
 - If that fails, retry without length padding (but with the same hint address).
 - If the returned address matches the hint address, return it.
 - Otherwise, align the address as required for THP and return.

The user-specified hint address is passed down to get_unmapped_area(), so an above-47bit hint address will be taken into account without breaking alignment requirements.

Link: http://lkml.kernel.org/r/20191220142548.7118-2-kirill.shutemov@linux.intel.com
Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Thomas Willhalm <thomas.willhalm@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
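A simplified sketch of the four steps above (shared in spirit with the shmem fix); the helper name is illustrative, and the real function's file-offset alignment and bounds checks are elided:

    #include <linux/err.h>
    #include <linux/mm.h>
    #include <linux/sched.h>

    static unsigned long thp_hint_area(struct file *filp, unsigned long addr,
                                       unsigned long len, unsigned long off,
                                       unsigned long flags)
    {
            const unsigned long pad = PMD_SIZE - PAGE_SIZE;
            unsigned long ret;

            /* 1. Try the user's hint with padding for PMD alignment. */
            ret = current->mm->get_unmapped_area(filp, addr, len + pad,
                                                 off, flags);
            if (IS_ERR_VALUE(ret))
                    /* 2. Retry without padding, same hint address. */
                    return current->mm->get_unmapped_area(filp, addr, len,
                                                          off, flags);

            /* 3. Exact hit on the hint: hand it back untouched. */
            if (ret == addr)
                    return ret;

            /* 4. Otherwise align up to a PMD boundary, which the
             *    padded length guarantees still fits the window. */
            return round_up(ret, PMD_SIZE);
    }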
2020-01-13mm/memory_hotplug: don't free usage map when removing a re-added early sectionDavid Hildenbrand
When we remove an early section, we don't free the usage map, as the usage maps of other sections are placed into the same page. Once the section is removed, it is no longer an early section (in particular, the memmap is freed). When we re-add that section, the usage map is reused; however, it is no longer an early section. When removing that section again, we try to kfree() a usage map that was allocated during early boot - bad.

Let's check against PageReserved() to see if we are dealing with a usage map that was allocated during boot. We could also check against !(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is cleaner.

Can be triggered using memtrace under ppc64/powernv:

  $ mount -t debugfs none /sys/kernel/debug/
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
  ------------[ cut here ]------------
  kernel BUG at mm/slub.c:3969!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
  NIP kfree+0x338/0x3b0
  LR section_deactivate+0x138/0x200
  Call Trace:
    section_deactivate+0x138/0x200
    __remove_pages+0x114/0x150
    arch_remove_memory+0x3c/0x160
    try_remove_memory+0x114/0x1a0
    __remove_memory+0x20/0x40
    memtrace_enable_set+0x254/0x850
    simple_attr_write+0x138/0x160
    full_proxy_write+0x8c/0x110
    __vfs_write+0x38/0x70
    vfs_write+0x11c/0x2a0
    ksys_write+0x84/0x140
    system_call+0x5c/0x68
  ---[ end trace 4b053cbd84e0db62 ]---

The first invocation will offline+remove memory blocks. The second invocation will first add+online them again, in order to offline+remove them again (usually we are lucky and the exact same memory blocks will get "reallocated").

Tested on powernv with boot memory: the usage map will not get freed. Tested on x86-64 with DIMMs: the usage map will get freed.

Using Dynamic Memory under a Power DLPAR can trigger it easily. Triggering removal (I assume after previously removed+re-added) of memory from the HMC GUI can crash the kernel with the same call trace and is fixed by this patch.

Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
Fixes: 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
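A sketch of the distinction (the type follows the SECTION_IS_EARLY rework; the helper name is illustrative):

    #include <linux/mm.h>
    #include <linux/slab.h>

    static void section_free_usage(struct mem_section_usage *usage)
    {
            struct page *usage_page = virt_to_page(usage);

            /* Boot-time usage maps live in reserved pages and must
             * never be freed; only slab-allocated ones may be. */
            if (PageReserved(usage_page))
                    return;

            kfree(usage);
    }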
2020-01-13mm, thp: tweak reclaim/compaction effort of local-only and all-node allocationsVlastimil Babka
THP page faults now attempt a __GFP_THISNODE allocation first, which should only compact existing free memory, followed by another attempt that can allocate from any node using reclaim/compaction effort specified by the global defrag setting and madvise.

This patch makes the following changes to the scheme:

 - Before the patch, the first allocation relies on a check for pageblock order and __GFP_IO to prevent excessive reclaim. This however also affects the second attempt, which is not limited to a single node. Instead of that, reuse the existing check for costly order __GFP_NORETRY allocations, and make sure the first THP attempt uses __GFP_NORETRY. As a side-effect, all costly order __GFP_NORETRY allocations will bail out if compaction needs reclaim, while previously they only bailed out when compaction was deferred due to previous failures. This should still be acceptable within the __GFP_NORETRY semantics.

 - Before the patch, the second allocation attempt (on all nodes) was passing __GFP_NORETRY. This is redundant as the check for pageblock order (discussed above) was stronger. It's also contrary to madvise(MADV_HUGEPAGE), which means some effort to allocate THP is requested. After this patch, the second attempt passes neither __GFP_THISNODE nor __GFP_NORETRY.

To sum up, THP page faults now try the following attempts:

 1. local node only THP allocation with no reclaim, just compaction.
 2. for madvised VMAs, or when synchronous compaction is always enabled: THP allocation from any node, with effort determined by the global defrag setting and VMA madvise.
 3. fallback to base pages on any node.

Link: http://lkml.kernel.org/r/08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz
Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
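An illustrative condensation of the attempt order; the gfp flags are real, but the helper and its 'policy_gfp' parameter are not part of the kernel:

    #include <linux/gfp.h>

    static struct page *thp_fault_alloc(gfp_t policy_gfp, int order,
                                        bool try_other_nodes)
    {
            struct page *page;

            /* Attempt 1: local node, compaction but no reclaim.
             * __GFP_NORETRY makes a costly-order allocation bail out
             * as soon as compaction would need reclaim. */
            page = alloc_pages(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE |
                               __GFP_NORETRY, order);
            if (page || !try_other_nodes)
                    return page;

            /* Attempt 2: any node, effort per defrag/madvise policy;
             * note: neither __GFP_THISNODE nor __GFP_NORETRY here.
             * (Attempt 3, base pages, happens in the caller.) */
            return alloc_pages(policy_gfp, order);
    }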
2020-01-13ptr_ring: add include of linux/mm.hJesper Dangaard Brouer
Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") started to use kvmalloc_array and kvfree, which are defined in mm.h, in place of the previous kcalloc and kfree, which are defined in slab.h. Add the missing include of linux/mm.h. This went unnoticed because other include files happened to include mm.h. Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
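The rule the fix follows is to include the header that declares what you call instead of relying on it arriving indirectly. The helper below mirrors in spirit what ptr_ring's queue allocation does (the function names are illustrative):

    #include <linux/mm.h>   /* kvmalloc_array(), kvfree() */

    static void **ring_queue_alloc(unsigned int size, gfp_t gfp)
    {
            /* Falls back to vmalloc() when a large, physically
             * contiguous kmalloc() would fail. */
            return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
    }

    static void ring_queue_free(void **queue)
    {
            kvfree(queue);
    }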