summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-06-14docs: cgroup-v1: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
Convert the cgroup-v1 files to ReST format, in order to allow a later addition to the admin-guide. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-06-14docs: fault-injection: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-14Merge tag 'v5.2-rc4' into mauroJonathan Corbet
We need to pick up post-rc1 changes to various document files so they don't get lost in Mauro's massive RST conversion push.
2019-06-14Merge tag 'mac80211-next-for-davem-2019-06-14' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Many changes all over: * HE (802.11ax) work continues * WPA3 offloads * work on extended key ID handling continues * fixes to honour AP supported rates with auth/assoc frames * nl80211 netlink policy improvements to fix some issues with strict validation on new commands with old attrs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14net: phylink: further mac_config documentation improvementsRussell King - ARM Linux admin
While reviewing the DPAA2 work, it has become apparent that we need better documentation about which members of the phylink link state structure are valid in the mac_config call. Improve this documentation. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14LSM: switch to blocking policy update notifiersJanne Karhunen
Atomic policy updaters are not very useful as they cannot usually perform the policy updates on their own. Since it seems that there is no strict need for the atomicity, switch to the blocking variant. While doing so, rename the functions accordingly. Signed-off-by: Janne Karhunen <janne.karhunen@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-06-14ieee80211: Add a missing extended capability flag definitionIlan Peer
Add the "OBSS Narrow Bandwidth RU In OFDMA Tolerance Support" flag definition to the definitions of the flags covered by the Extended Capability IE. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14nl80211: add support for SAE authentication offloadChung-Hsien Hsu
Let drivers advertise support for station-mode SAE authentication offload with a new NL80211_EXT_FEATURE_SAE_OFFLOAD flag. Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com> Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-06-14usb: chipidea: imx: add imx7ulp supportPeter Chen
In this commit, we add CI_HDRC_PMQOS to avoid system entering idle, at imx7ulp, if the system enters idle, the DMA will stop, so the USB transfer can't work at this case. Signed-off-by: Peter Chen <peter.chen@nxp.com>
2019-06-14dma-buf: add DMA_BUF_SET_NAME ioctlsGreg Hackmann
This patch adds complimentary DMA_BUF_SET_NAME ioctls, which lets userspace processes attach a free-form name to each buffer. This information can be extremely helpful for tracking and accounting shared buffers. For example, on Android, we know what each buffer will be used for at allocation time: GL, multimedia, camera, etc. The userspace allocator can use DMA_BUF_SET_NAME to associate that information with the buffer, so we can later give developers a breakdown of how much memory they're allocating for graphics, camera, etc. Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190613223408.139221-3-fengc@google.com
2019-06-14PM: hibernate: powerpc: Expose pfn_is_nosave() prototypeMathieu Malaterre
The declaration for pfn_is_nosave is only available in kernel/power/power.h. Since this function can be override in arch, expose it globally. Having a prototype will make sure to avoid warning (sometime treated as error with W=1) such as: arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes] This moves the declaration into a globally visible header file and add missing include to avoid a warning on powerpc. Also remove the duplicated prototypes since not required anymore. Signed-off-by: Mathieu Malaterre <malat@debian.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-14gpio: Drop the parent_irq from gpio_irq_chipLinus Walleij
We already have an array named "parents" so instead of letting one point to the other, simply allocate a dynamic array to hold the parents, just one if desired and drop the number of members in gpio_irq_chip by 1. Rename gpiochip to gc in the process. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-06-13ptp: ptp_clock: Publish scaled_ppm_to_ppbShalom Toledo
Publish scaled_ppm_to_ppb to allow drivers to use it. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13mm/devm_memremap_pages: fix final page put raceDan Williams
Logan noticed that devm_memremap_pages_release() kills the percpu_ref drops all the page references that were acquired at init and then immediately proceeds to unplug, arch_remove_memory(), the backing pages for the pagemap. If for some reason device shutdown actually collides with a busy / elevated-ref-count page then arch_remove_memory() should be deferred until after that reference is dropped. As it stands the "wait for last page ref drop" happens *after* devm_memremap_pages_release() returns, which is obviously too late and can lead to crashes. Fix this situation by assigning the responsibility to wait for the percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup() callback. Implement the new cleanup callback for all devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma. Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 41e94a851304 ("add devm_memremap_pages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13lib/genalloc: introduce chunk ownersDan Williams
The p2pdma facility enables a provider to publish a pool of dma addresses for a consumer to allocate. A genpool is used internally by p2pdma to collect dma resources, 'chunks', to be handed out to consumers. Whenever a consumer allocates a resource it needs to pin the 'struct dev_pagemap' instance that backs the chunk selected by pci_alloc_p2pmem(). Currently that reference is taken globally on the entire provider device. That sets up a lifetime mismatch whereby the p2pdma core needs to maintain hacks to make sure the percpu_ref is not released twice. This lifetime mismatch also stands in the way of a fix to devm_memremap_pages() whereby devm_memremap_pages_release() must wait for the percpu_ref ->release() callback to complete before it can proceed to teardown pages. So, towards fixing this situation, introduce the ability to store a 'chunk owner' at gen_pool_add() time, and a facility to retrieve the owner at gen_pool_{alloc,free}() time. For p2pdma this will be used to store and recall individual dev_pagemap reference counter instances per-chunk. Link: http://lkml.kernel.org/r/155727338118.292046.13407378933221579644.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13mm/devm_memremap_pages: introduce devm_memunmap_pagesDan Williams
Use the new devm_release_action() facility to allow devm_memremap_pages_release() to be manually triggered. Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13drivers/base/devres: introduce devm_release_action()Dan Williams
Patch series "mm/devm_memremap_pages: Fix page release race", v2. Logan audited the devm_memremap_pages() shutdown path and noticed that it was possible to proceed to arch_remove_memory() before all potential page references have been reaped. Introduce a new ->cleanup() callback to do the work of waiting for any straggling page references and then perform the percpu_ref_exit() in devm_memremap_pages_release() context. For p2pdma this involves some deeper reworks to reference count resources on a per-instance basis rather than a per pci-device basis. A modified genalloc api is introduced to convey a driver-private pointer through gen_pool_{alloc,free}() interfaces. Also, a devm_memunmap_pages() api is introduced since p2pdma does not auto-release resources on a setup failure. The dax and pmem changes pass the nvdimm unit tests, and the p2pdma changes should now pass testing with the pci_p2pdma_release() fix. Jrme, how does this look for HMM? This patch (of 6): The devm_add_action() facility allows a resource allocation routine to add custom devm semantics. One such user is devm_memremap_pages(). There is now a need to manually trigger devm_memremap_pages_release(). Introduce devm_release_action() so the release action can be triggered via a new devm_memunmap_pages() api in a follow-on change. Link: http://lkml.kernel.org/r/155727336530.292046.2926860263201336366.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13coredump: fix race condition between collapse_huge_page() and core dumpingAndrea Arcangeli
When fixing the race conditions between the coredump and the mmap_sem holders outside the context of the process, we focused on mmget_not_zero()/get_task_mm() callers in 04f5866e41fb70 ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping"), but those aren't the only cases where the mmap_sem can be taken outside of the context of the process as Michal Hocko noticed while backporting that commit to older -stable kernels. If mmgrab() is called in the context of the process, but then the mm_count reference is transferred outside the context of the process, that can also be a problem if the mmap_sem has to be taken for writing through that mm_count reference. khugepaged registration calls mmgrab() in the context of the process, but the mmap_sem for writing is taken later in the context of the khugepaged kernel thread. collapse_huge_page() after taking the mmap_sem for writing doesn't modify any vma, so it's not obvious that it could cause a problem to the coredump, but it happens to modify the pmd in a way that breaks an invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page() needs the mmap_sem for writing just to block concurrent page faults that call pmd_trans_huge_lock(). Specifically the invariant that "!pmd_trans_huge()" cannot become a "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs. The coredump will call __get_user_pages() without mmap_sem for reading, which eventually can invoke a lockless page fault which will need a functional pmd_trans_huge_lock(). So collapse_huge_page() needs to use mmget_still_valid() to check it's not running concurrently with the coredump... as long as the coredump can invoke page faults without holding the mmap_sem for reading. This has "Fixes: khugepaged" to facilitate backporting, but in my view it's more a bug in the coredump code that will eventually have to be rewritten to stop invoking page faults without the mmap_sem for reading. So the long term plan is still to drop all mmget_still_valid(). Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com Fixes: ba76149f47d8 ("thp: khugepaged") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13mm: memcontrol: don't batch updates of local VM stats and eventsJohannes Weiner
The kernel test robot noticed a 26% will-it-scale pagefault regression from commit 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty"). This appears to be caused by bouncing the additional cachelines from the new hierarchical statistics counters. We can fix this by getting rid of the batched local counters instead. Originally, there were *only* group-local counters, and they were fully maintained per cpu. A reader of a stats file high up in the cgroup tree would have to walk the entire subtree and collect each level's per-cpu counters to get the recursive view. This was prohibitively expensive, and so we switched to per-cpu batched updates of the local counters during a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting"), reducing the complexity from nr_subgroups * nr_cpus to nr_subgroups. With growing machines and cgroup trees, the tree walk itself became too expensive for monitoring top-level groups, and this is when the culprit patch added hierarchy counters on each cgroup level. When the per-cpu batch size would be reached, both the local and the hierarchy counters would get batch-updated from the per-cpu delta simultaneously. This makes local and hierarchical counter reads blazingly fast, but it unfortunately makes the write-side too cache line intense. Since local counter reads were never a problem - we only centralized them to accelerate the hierarchy walk - and use of the local counters are becoming rarer due to replacement with hierarchical views (ongoing rework in the page reclaim and workingset code), we can make those local counters unbatched per-cpu counters again. The scheme will then be as such: when a memcg statistic changes, the writer will: - update the local counter (per-cpu) - update the batch counter (per-cpu). If the batch is full: - spill the batch into the group's atomic_t - spill the batch into all ancestors' atomic_ts - empty out the batch counter (per-cpu) when a local memcg counter is read, the reader will: - collect the local counter from all cpus when a hiearchy memcg counter is read, the reader will: - read the atomic_t We might be able to simplify this further and make the recursive counters unbatched per-cpu counters as well (batch upward propagation, but leave per-cpu collection to the readers), but that will require a more in-depth analysis and testing of all the callsites. Deal with the immediate regression for now. Link: http://lkml.kernel.org/r/20190521151647.GB2870@cmpxchg.org Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: kernel test robot <rong.a.chen@intel.com> Tested-by: kernel test robot <rong.a.chen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13rcu: Don't return a value from rcu_assign_pointer()Andrea Parri
Quoting Paul [1]: "Given that a quick (and perhaps error-prone) search of the uses of rcu_assign_pointer() in v5.1 didn't find a single use of the return value, let's please instead change the documentation and implementation to eliminate the return value." [1] https://lkml.kernel.org/r/20190523135013.GL28207@linux.ibm.com Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: rcu@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-06-13rcu: Force inlining of rcu_read_lock()Waiman Long
When debugging options are turned on, the rcu_read_lock() function might not be inlined. This results in lockdep's print_lock() function printing "rcu_read_lock+0x0/0x70" instead of rcu_read_lock()'s caller. For example: [ 10.579995] ============================= [ 10.584033] WARNING: suspicious RCU usage [ 10.588074] 4.18.0.memcg_v2+ #1 Not tainted [ 10.593162] ----------------------------- [ 10.597203] include/linux/rcupdate.h:281 Illegal context switch in RCU read-side critical section! [ 10.606220] [ 10.606220] other info that might help us debug this: [ 10.606220] [ 10.614280] [ 10.614280] rcu_scheduler_active = 2, debug_locks = 1 [ 10.620853] 3 locks held by systemd/1: [ 10.624632] #0: (____ptrval____) (&type->i_mutex_dir_key#5){.+.+}, at: lookup_slow+0x42/0x70 [ 10.633232] #1: (____ptrval____) (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70 [ 10.640954] #2: (____ptrval____) (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70 These "rcu_read_lock+0x0/0x70" strings are not providing any useful information. This commit therefore forces inlining of the rcu_read_lock() function so that rcu_read_lock()'s caller is instead shown. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-06-13rcu: Fix irritating whitespace error in rcu_assign_pointer()Paul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-06-13PCI: Decode PCIe 32 GT/s link speedGustavo Pimentel
PCIe r5.0, sec 7.5.3.18, defines a new 32.0 GT/s bit in the Supported Link Speeds Vector of Link Capabilities 2. Decode this new speed. This does not affect the speed of the link, which should be negotiated automatically by the hardware; it only adds decoding when showing the speed to the user. Previously, reading the speed of a link operating at this speed showed "Unknown speed" instead of "32.0 GT/s". Link: https://lore.kernel.org/lkml/92365e3caf0fc559f9ab14bcd053bfc92d4f661c.1559664969.git.gustavo.pimentel@synopsys.com Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-06-13device property: Add helpers to count items in an arrayAndy Shevchenko
The usual pattern to allocate the necessary space for an array of properties is to count them first by calling: count = device_property_read_uXX_array(dev, propname, NULL, 0); if (count < 0) return count; Introduce helpers device_property_count_uXX() to count items by supplying hard coded last two parameters to device_property_readXX_array(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-13net/mlx5: Report devlink health on FW fatal issuesMoshe Shemesh
Report devlink health on FW fatal issues via fw_fatal_reporter. The driver recover flow for FW fatal error is now being handled by the devlink health. Having the recovery controlled by devlink health, the user has the ability to cancel the auto-recovery for debug session and run it manually. Call mlx5_enter_error_state() before calling devlink_health_report() to ensure entering device error state even if auto-recovery is off. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Add fw fatal devlink_health_reporterMoshe Shemesh
Create mlx5_devlink_health_reporter for fw fatal reporter. The fw fatal reporter is added in addition to the fw reporter and implements the recover callback. The point of having two reporters for FW issues, is that we don't want to run FW recover on any issue, but only fatal ones. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Report devlink health on FW issuesMoshe Shemesh
Use devlink_health_report() to report any symptom of FW issue as FW counter miss or new health syndrome. The FW issues detected in mlx5 during poll_health which is called in timer atomic context and so health work queue is used to schedule the reports. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Create FW devlink_health_reporterMoshe Shemesh
Create mlx5_devlink_health_reporter for FW reporter. The FW reporter implements devlink_health_reporter diagnose callback. The fw reporter diagnose command can be triggered any time by the user to check current fw status. In healthy status, it will return clear syndrome. Otherwise it will return the syndrome and description of the error type. Command example and output on healthy status: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 0 Command example and output on non healthy status: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 8 Description: unrecoverable hardware error Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Issue SW reset on FW assertFeras Daoud
If a FW assert is considered fatal, indicated by a new bit in the health buffer, reset the FW. After the reset go through the normal recovery flow. Only one PF needs to issue the reset, so an attempt is made to prevent the 2nd function from also issuing the reset. It's not an error if that happens, it just slows recovery. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Handle SW reset of FW in error flowFeras Daoud
New mlx5 adapters allow the driver to reset the FW in the event of an error, this action called "SW Reset". When an SW reset is issued on any PF all PFs enter reset state which is a recoverable condition. The existing recovery flow was designed to allow the recovery of a VF after a PF driver reload. This patch adds the sw reset to the NIC states as a preparation for sw reset handling. When a software reset is issued the following occurs: 1. The NIC interface mode is set to 7 while the reset is in progress. 2. Once the reset completes the NIC interface mode is set to 1. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Add Crdump supportAlex Vesker
Crdump allows the driver to retrieve a dump of the FW PCI crspace. This is useful in case of catastrophic issues which may require FW reset. The crspace dump can be used for later debug. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Add Vendor Specific Capability access gatewayAlex Vesker
The Vendor Specific Capability (VSC) is used to activate a gateway interfacing with the device. The gateway is used to read or write device configurations, which are organized in different domains (spaces). A configuration access may result in multiple actions, reads, writes. Example usages are accessing the Crspace domain to read the crspace or locking a device semaphore using the Semaphore domain. The configuration access use pci_cfg_access to prevent parallel access to the VSC space by the driver and userspace calls. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Add EQ enable/disable APIYuval Avnery
Previously, EQ joined the chain notifier on creation. This forced the caller to be ready to handle events before creating the EQ through eq_create_generic interface. To help the caller control when the created EQ will be attached to the IRQ, add enable/disable API. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Use a single IRQ for all async EQsAriel Levkovich
The patch modifies the IRQ allocation so that all async EQs are assigned to the same IRQ resulting in more available IRQs for completion EQs. The changes are using the support for IRQ sharing and EQ polling budget that was introduced in previous patches so when the shared interrupt is triggered, the kernel will serially call the handler of each of the sharing EQs with a certain budget of EQEs to poll in order to prevent starvation. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Separate IRQ data from EQ table dataYuval Avnery
IRQ table should only exist for mlx5_core_dev for PF and VF only. EQ table of mediated devices should hold a pointer to the IRQ table of the parent PCI device. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Separate IRQ request/free from EQ life cycleYuval Avnery
Instead of requesting IRQ with eq creation, IRQs will be requested before EQ table creation. Instead of freeing the IRQs after EQ destroy, free IRQs after eq table destroy. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Change interrupt handler to call chain notifierYuval Avnery
Multiple EQs may share the same IRQ in subsequent patches. Instead of calling the IRQ handler directly, the EQ will register to an atomic chain notfier. The Linux built-in shared IRQ is not used because it forces the caller to disable the IRQ and clear affinity before free_irq() can be called. This patch is the first step in the separation of IRQ and EQ logic. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13net/mlx5: Support querying max VFs from deviceBodong Wang
For ECPF with eswitch manager privilege, query the host max VF count by querying the device using query_functions command. With this enhancement: 1. flow steering entries are created only for valid vports based on the max VF count of the PF. 2. Driver only queries cap of valid vport. Eswitch requires the max VFs when doing initialization, so do sr-iov init before eswitch init. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13regulator: max8952: Convert to use GPIO descriptorsLinus Walleij
This finalizes the descriptor conversion of the MAX8952 driver by letting the VID0 and VID1 GPIOs be fetched from descriptors. Both VID0 and VID1 must be supplied for the VID selection to work, I add some code to preserve the semantics that if only one of the two VID gpios is supplied, it will be initialized to low. This might be a bit overzealous, but I want to preserve any implicit semantics. This is currently only used by device tree in-kernel but it is still also possible to supply the same GPIOs using a machine descriptor table if a board file is used. Ideally this should be phased over to using gpio-regulator.c that does the same thing, but it might require some refactoring and needs testing on real hardware. Cc: Tomasz Figa <tfiga@chromium.org> Cc: MyungJoo Ham <myungjoo.ham@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-06-13NTB: Introduce MSI libraryLogan Gunthorpe
The NTB MSI library allows passing MSI interrupts across a memory window. This offers similar functionality to doorbells or messages except will often have much better latency and the client can potentially use significantly more remote interrupts than typical hardware provides for doorbells. (Which can be important in high-multiport setups.) The library utilizes one memory window per peer and uses the highest index memory windows. Before any ntb_msi function may be used, the user must call ntb_msi_init(). It may then setup and tear down the memory windows when the link state changes using ntb_msi_setup_mws() and ntb_msi_clear_mws(). The peer which receives the interrupt must call ntb_msim_request_irq() to assign the interrupt handler (this function is functionally similar to devm_request_irq()) and the returned descriptor must be transferred to the peer which can use it to trigger the interrupt. The triggering peer, once having received the descriptor, can trigger the interrupt by calling ntb_msi_peer_trigger(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2019-06-13NTB: Introduce functions to calculate multi-port resource indexLogan Gunthorpe
When using multi-ports each port uses resources (dbs, msgs, mws, etc) on every other port. Creating a mapping for these resources such that each port has a corresponding resource on every other port is a bit tricky. Introduce the ntb_peer_resource_idx() function for this purpose. It returns the peer resource number that will correspond with the local peer index on the remote peer. Also, introduce ntb_peer_highest_mw_idx() which will use ntb_peer_resource_idx() but return the MW index starting with the highest index and working down. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2019-06-13NTB: Introduce helper functions to calculate logical port numberLogan Gunthorpe
This patch introduces the "Logical Port Number" which is similar to the "Port Number" in that it enumerates the ports in the system. The original (or Physical) "Port Number" can be any number used by the hardware to uniquely identify a port in the system. The "Logical Port Number" enumerates all ports in the system from 0 to the number of ports minus one. For example a system with 5 ports might have the following port numbers which would be enumerated thusly: Port Number: 1 2 5 7 116 Logical Port Number: 0 1 2 3 4 The logical port number is useful when calculating which resources to use for which peers. So we thus define two helper functions: ntb_logical_port_number() and ntb_peer_logical_port_number() which provide the "Logical Port Number" for the local port and any peer respectively. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Cc: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2019-06-13PCI/MSI: Support allocating virtual MSI interruptsLogan Gunthorpe
For NTB devices, we want to be able to trigger MSI interrupts through a memory window. In these cases we may want to use more interrupts than the NTB PCI device has available in its MSI-X table. We allow for this by creating a new 'virtual' interrupt. These interrupts are allocated as usual but are not programmed into the MSI-X table (as there may not be space for them). The MSI address and data will then handled through an NTB MSI library introduced later in this series. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2019-06-13NTB: correct ntb_dev_ops and ntb_dev comment typosWesley Sheng
The comment for ntb_dev_ops and ntb_dev incorrectly referred to ntb_ctx_ops and ntb_device. Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2019-06-12firmware: ti_sci: Add support for processor controlSuman Anna
Texas Instrument's System Control Interface (TI-SCI) Message Protocol is used in Texas Instrument's System on Chip (SoC) such as those in K3 family AM654 SoC to communicate between various compute processors with a central system controller entity. The system controller provides various services including the control of other compute processors within the SoC. Extend the TI-SCI protocol support to add various TI-SCI commands to invoke services associated with power and reset control, and boot vector management of the various compute processors from the Linux kernel. Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-06-12firmware: ti_sci: Add resource management APIs for ringacc, psi-l and udmaPeter Ujfalusi
Configuration of NAVSS resource, like rings, UDMAP channels, flows and PSI-L thread management need to be done via TISCI. Add the needed structures and functions for NAVSS resource configuration of the following: Rings from Ring Accelerator PSI-L thread management UDMAP tchan, rchan and rflow configuration. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-06-13gpio: Fix build warnings on undefined struct pinctrl_devEnrico Weigelt
This fixes the warnings: * include/linux/gpio.h:254:11: warning: 'struct pinctrl_dev' declared inside parameter list will not be visible outside of this definition or declaration * include/linux/gpio/driver.h:602:11: warning: 'struct pinctrl_dev' declared inside parameter list will not be visible outside of this definition or declaration Fixes: 78b99577b393 ("pinctrl: remove unused pin_is_valid()") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-06-12tcp: add optional per socket transmit delayEric Dumazet
Adding delays to TCP flows is crucial for studying behavior of TCP stacks, including congestion control modules. Linux offers netem module, but it has unpractical constraints : - Need root access to change qdisc - Hard to setup on egress if combined with non trivial qdisc like FQ - Single delay for all flows. EDT (Earliest Departure Time) adoption in TCP stack allows us to enable a per socket delay at a very small cost. Networking tools can now establish thousands of flows, each of them with a different delay, simulating real world conditions. This requires FQ packet scheduler or a EDT-enabled NIC. This patchs adds TCP_TX_DELAY socket option, to set a delay in usec units. unsigned int tx_delay = 10000; /* 10 msec */ setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay)); Note that FQ packet scheduler limits might need some tweaking : man tc-fq PARAMETERS limit Hard limit on the real queue size. When this limit is reached, new packets are dropped. If the value is lowered, packets are dropped so that the new limit is met. Default is 10000 packets. flow_limit Hard limit on the maximum number of packets queued per flow. Default value is 100. Use of TCP_TX_DELAY option will increase number of skbs in FQ qdisc, so packets would be dropped if any of the previous limit is hit. Use of a jump label makes this support runtime-free, for hosts never using the option. Also note that TSQ (TCP Small Queues) limits are slightly changed with this patch : we need to account that skbs artificially delayed wont stop us providind more skbs to feed the pipe (netem uses skb_orphan_partial() for this purpose, but FQ can not use this trick) Because of that, using big delays might very well trigger old bugs in TSO auto defer logic and/or sndbuf limited detection. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12fbcon: Call con2fb_map functions directlyDaniel Vetter
These are actually fbcon ioctls which just happen to be exposed through /dev/fb*. They completely ignore which fb_info they're called on, and I think the userspace tool even hardcodes to /dev/fb0. Hence just forward the entire thing to fbcon.c wholesale. Note that this patch drops the fb_lock/unlock on the set side. Since the ioctl can operate on any fb (as passed in through con2fb.framebuffer) this is bogus. Also note that fbcon.c in general never calls fb_lock on anything, so this has been badly broken already. With this the last user of the fbcon notifier callback is gone, and we can garbage collect that too. v2: add missing uaccess.h include (alpha fails to compile otherwise), reported by kbuild. v3: Remember to also drop the #defines (Maarten) v4: Add the static inline to dummy functions. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Yisheng Xie <ysxie@foxmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: Peter Rosin <peda@axentia.se> Cc: Mikulas Patocka <mpatocka@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528090304.9388-31-daniel.vetter@ffwll.ch
2019-06-12vgaswitcheroo: call fbcon_remap_all directlyDaniel Vetter
While at it, clean up the interface a bit and push the console locking into fbcon.c. v2: Remove now outdated comment (Lukas). v3: Forgot to add static inline to the dummy function. Acked-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Lukas Wunner <lukas@wunner.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <maxime.ripard@bootlin.com> Cc: Sean Paul <sean@poorly.run> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Yisheng Xie <ysxie@foxmail.com> Cc: linux-fbdev@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20190528090304.9388-30-daniel.vetter@ffwll.ch