summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-04-04net: dsa: mt7530: Use NULL instead of plain integerFlorian Fainelli
We would be passing 0 instead of NULL as the rsp argument to mt7530_fdb_cmd(), fix that. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04net: dsa: b53: Fix sparse warnings in b53_mmap.cFlorian Fainelli
sparse complains about the following warnings: drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in initializer (different address spaces) drivers/net/dsa/b53/b53_mmap.c:33:31: expected unsigned char [noderef] [usertype] <asn:2>*regs drivers/net/dsa/b53/b53_mmap.c:33:31: got void *priv and indeed, while what we are doing is functional, we are dereferencing a void * pointer into a void __iomem * which is not great. Just use the defined b53_mmap_priv structure which holds our register base and use that. Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04af_unix: remove redundant lockdep classCong Wang
After commit 581319c58600 ("net/socket: use per af lockdep classes for sk queues") sock queue locks now have per-af lockdep classes, including unix socket. It is no longer necessary to workaround it. I noticed this while looking at a syzbot deadlock report, this patch itself doesn't fix it (this is why I don't add Reported-by). Fixes: 581319c58600 ("net/socket: use per af lockdep classes for sk queues") Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04Merge branch 'net-Broadcom-drivers-sparse-fixes'David S. Miller
Florian Fainelli says: ==================== net: Broadcom drivers sparse fixes This patch series fixes the same warning reported by sparse in bcmsysport and bcmgenet in the code that deals with inserting the TX checksum pointers: drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16 drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types) drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: expected unsigned short [unsigned] [usertype] val drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: got restricted __be16 [usertype] protocol This patch fixes both issues by using the same construct and not swapping skb->protocol but instead the values we are checking against. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()Florian Fainelli
skb->protocol is a __be16 which we would be calling htons() against, while this is not wrong per-se as it correctly results in swapping the value on LE hosts, this still upsets sparse. Adopt a similar pattern to what other drivers do and just assign ip_ver to skb->protocol, and then use htons() against the different constants such that the compiler can resolve the values at build time. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()Florian Fainelli
skb->protocol is a __be16 which we would be calling htons() against, while this is not wrong per-se as it correctly results in swapping the value on LE hosts, this still upsets sparse. Adopt a similar pattern to what other drivers do and just assign ip_ver to skb->protocol, and then use htons() against the different constants such that the compiler can resolve the values at build time. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04rxrpc: Fix undefined packet handlingDavid Howells
By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11 should just be discarded rather than being aborted like other undefined packet types. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driverJohn Garry
Add John Garry as maintainer for drivers/bus/hisi_lpc.c, the HiSilicon LPC driver. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-04-04HISI LPC: Add ACPI supportJohn Garry
Based on the previous patches, this patch supports the LPC host on Hip06/Hip07 for ACPI FW. It is the responsibility of the LPC host driver to enumerate the child devices, as the ACPI scan code will not enumerate children of "indirect IO" hosts. The ACPI table for the LPC host controller and the child devices is in the following format: Device (LPC0) { Name (_HID, "HISI0191") // HiSi LPC Name (_CRS, ResourceTemplate () { Memory32Fixed (ReadWrite, 0xa01b0000, 0x1000) }) } Device (LPC0.IPMI) { Name (_HID, "IPI0001") Name (LORS, ResourceTemplate() { QWordIO ( ResourceConsumer, MinNotFixed, // _MIF MaxNotFixed, // _MAF PosDecode, EntireRange, 0x0, // _GRA 0xe4, // _MIN 0x3fff, // _MAX 0x0, // _TRA 0x04, // _LEN , , BTIO ) }) Since the IO resources of the child devices need to be translated from LPC bus addresses to logical PIO addresses, and we shouldn't modify the resources of the devices generated in the FW scan, a per-child MFD is created as a substitute. The MFD IO resources will be the translated bus addresses of the ACPI child. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2018-04-04ACPI / scan: Do not enumerate Indirect IO host childrenJohn Garry
Through the logical PIO framework, systems which otherwise have no IO space access to legacy ISA/LPC devices may access these devices through so-called "indirect IO" method. In this, IO space accesses for non-PCI hosts are redirected to a host LLDD to manually generate the IO space (bus) accesses. Hosts are able to register a region in logical PIO space to map to its bus address range. Indirect IO child devices have an associated host-specific bus address. Special translation is required to map between a logical PIO address for a device and its host bus address. Since in the ACPI tables the child device IO resources would be the host-specific values, it is required the ACPI scan code should not enumerate these devices, and that this should be the responsibility of the host driver so that it can "fixup" the resources so that they map to the appropriate logical PIO addresses. To avoid enumerating these child devices, add a check from acpi_device_enumeration_by_parent() as to whether the parent for a device is a member of a known list of "indirect IO" hosts. For now, the HiSilicon LPC host controller ID is added. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-04ACPI / scan: Rename acpi_is_serial_bus_slave() for more general useJohn Garry
Currently the ACPI scan has special handling for serial bus slaves, in that it makes it the responsibility of the slave device's parent to enumerate the device. To support other types of slave devices which require the same special handling but where the bus is not strictly a serial bus, such as devices on the HiSilicon LPC controller bus, rename acpi_is_serial_bus_slave() to acpi_device_enumeration_by_parent(), so that the name can fit the wider purpose. Also rename the associated device flag acpi_device_flags.serial_bus_slave to .enumeration_by_parent. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-04HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindingsZhichang Yuan
The low-pin-count (LPC) interface of Hip06/Hip07 accesses I/O port space of peripherals. Implement the LPC host controller driver which performs the I/O operations on the underlying hardware. We don't want to touch existing drivers such as ipmi-bt, so this driver applies the indirect-IO introduced in the previous patch after registering an indirect-IO node to the indirect-IO devices list which will be searched by the I/O accessors to retrieve the host-local I/O port. The driver config is set as a bool instead of a tristate. The reason here is that, by the very nature of the driver providing a logical PIO range, it does not make sense to have this driver as a loadable module. Another more specific reason is that the Huawei D03 board which includes Hip06 SoC requires the LPC bus for UART console, so should be built in. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Zou Rongrong <zourongrong@huawei.com> Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Rob Herring <robh@kernel.org> # dts part
2018-04-04of: Add missing I/O range exception for indirect-IO devicesZhichang Yuan
There are some special ISA/LPC devices that work on a specific I/O range where it is not correct to specify a 'ranges' property in the DTS parent node as CPU addresses translated from DTS node are only for memory space on some architectures, such as ARM64. Without the parent 'ranges' property, of_translate_address() returns an error. Here we add special handling for this case. During the OF address translation, some checking will be performed to identify whether the device node is registered as indirect-IO. If it is, the I/O translation will be done in a different way from that one of PCI MMIO. In this way, the I/O 'reg' property of the special ISA/LPC devices will be parsed correctly. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> # earlier draft Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Rob Herring <robh@kernel.org>
2018-04-04PCI: Apply the new generic I/O management on PCI IO hostsZhichang Yuan
After introducing the new generic I/O space management (Logical PIO), the original PCI MMIO relevant helpers need to be updated based on the new interfaces defined in logical PIO. Adapt the corresponding code to match the changes introduced by logical PIO. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> # earlier draft Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2018-04-04PCI: Add fwnode handler as input param of pci_register_io_range()Gabriele Paoloni
In preparation for having the PCI MMIO helpers use the new generic I/O space management (logical PIO) we need to add the fwnode handler as an extra input parameter. Changes the signature of pci_register_io_range() and its callers as needed. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Rob Herring <robh@kernel.org>
2018-04-04PCI: Remove __weak tag from pci_register_io_range()Gabriele Paoloni
pci_register_io_range() has only one definition, so there is no need for the __weak attribute. Remove it. Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2018-04-04fscache: Attach the index key and aux data to the cookieDavid Howells
Attach copies of the index key and auxiliary data to the fscache cookie so that: (1) The callbacks to the netfs for this stuff can be eliminated. This can simplify things in the cache as the information is still available, even after the cache has relinquished the cookie. (2) Simplifies the locking requirements of accessing the information as we don't have to worry about the netfs object going away on us. (3) The cache can do lazy updating of the coherency information on disk. As long as the cache is flushed before reboot/poweroff, there's no need to update the coherency info on disk every time it changes. (4) Cookies can be hashed or put in a tree as the index key is easily available. This allows: (a) Checks for duplicate cookies can be made at the top fscache layer rather than down in the bowels of the cache backend. (b) Caching can be added to a netfs object that has a cookie if the cache is brought online after the netfs object is allocated. A certain amount of space is made in the cookie for inline copies of the data, but if it won't fit there, extra memory will be allocated for it. The downside of this is that live cache operation requires more memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anna Schumaker <anna.schumaker@netapp.com> Tested-by: Steve Dickson <steved@redhat.com>
2018-04-04fscache: Add more tracepointsDavid Howells
Add more tracepoints to fscache, including: (*) fscache_page - Tracks netfs pages known to fscache. (*) fscache_check_page - Tracks the netfs querying whether a page is pending storage. (*) fscache_wake_cookie - Tracks cookies being woken up after a page completes/aborts storage in the cache. (*) fscache_op - Tracks operations being initialised. (*) fscache_wrote_page - Tracks return of the backend write_page op. (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write operation. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04fscache: Add tracepointsDavid Howells
Add some tracepoints to fscache: (*) fscache_cookie - Tracks a cookie's usage count. (*) fscache_netfs - Logs registration of a network filesystem, including the pointer to the cookie allocated. (*) fscache_acquire - Logs cookie acquisition. (*) fscache_relinquish - Logs cookie relinquishment. (*) fscache_enable - Logs enablement of a cookie. (*) fscache_disable - Logs disablement of a cookie. (*) fscache_osm - Tracks execution of states in the object state machine. and cachefiles: (*) cachefiles_ref - Tracks a cachefiles object's usage count. (*) cachefiles_lookup - Logs result of lookup_one_len(). (*) cachefiles_mkdir - Logs result of vfs_mkdir(). (*) cachefiles_create - Logs result of vfs_create(). (*) cachefiles_unlink - Logs calls to vfs_unlink(). (*) cachefiles_rename - Logs calls to vfs_rename(). (*) cachefiles_mark_active - Logs an object becoming active. (*) cachefiles_wait_active - Logs a wait for an old object to be destroyed. (*) cachefiles_mark_inactive - Logs an object becoming inactive. (*) cachefiles_mark_buried - Logs the burial of an object. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04fscache: Fix hanging wait on page discarded by writebackDavid Howells
If the fscache asynchronous write operation elects to discard a page that's pending storage to the cache because the page would be over the store limit then it needs to wake the page as someone may be waiting on completion of the write. The problem is that the store limit may be updated by a different asynchronous operation - and so may miss the write - and that the store limit may not even get updated until later by the netfs. Fix the kernel hang by making fscache_write_op() mark as written any pages that are over the limit. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04fscache: Detect multiple relinquishment of a cookieDavid Howells
Report if an fscache cookie is relinquished multiple times by the netfs. Signed-off-by: David <dhowells@redhat.com>
2018-04-04fscache: Pass the correct cancelled indications to fscache_op_complete()David Howells
The last parameter to fscache_op_complete() is a bool indicating whether or not the operation was cancelled. A lot of the time the inverse value is given or no differentiation is made. Fix this. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04fscache, cachefiles: Fix checker warningsDavid Howells
Fix a couple of checker warnings in fscache and cachefiles: (1) fscache_n_op_requeue is never used, so get rid of it. (2) cachefiles_uncache_page() is passed in a lock that it releases, so this needs annotating. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04afs: Be more aggressive in retiring cached vnodesDavid Howells
When relinquishing cookies, either due to iget failure or to inode eviction, retire a cookie if we think the corresponding vnode got deleted on the server rather than just letting it lie in the cache. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04afs: Use the vnode ID uniquifier in the cache key not the aux dataDavid Howells
AFS vnodes (files) are referenced by a triplet of { volume ID, vnode ID, uniquifier }. Currently, kafs is only using the vnode ID as the file key in the volume fscache index and checking the uniquifier on cookie acquisition against the contents of the auxiliary data stored in the cache. Unfortunately, this is subject to a race in which an FS.RemoveFile or FS.RemoveDir op is issued against the server but the local afs inode isn't torn down and disposed off before another thread issues something like FS.CreateFile. The latter then gets given the vnode ID that just got removed, but with a new uniquifier and a cookie collision occurs in the cache because the cookie is only keyed on the vnode ID whereas the inode is keyed on the vnode ID plus the uniquifier. Fix this by keying the cookie on the uniquifier in addition to the vnode ID and dropping the uniquifier from the auxiliary data supplied. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04afs: Invalidate cache on server data changeDavid Howells
Invalidate any data stored in fscache for a vnode that changes on the server so that we don't end up with the cache in a bad state locally. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04cxl: Fix possible deadlock when processing page faults from cxllibFrederic Barrat
cxllib_handle_fault() is called by an external driver when it needs to have the host resolve page faults for a buffer. The buffer can cover several pages and VMAs. The function iterates over all the pages used by the buffer, based on the page size of the VMA. To ensure some stability while processing the faults, the thread T1 grabs the mm->mmap_sem semaphore with read access (R1). However, when processing a page fault for a single page, one of the underlying functions, copro_handle_mm_fault(), also grabs the same semaphore with read access (R2). So the thread T1 takes the semaphore twice. If another thread T2 tries to access the semaphore in write mode W1 (say, because it wants to allocate memory and calls 'brk'), then that thread T2 will have to wait because there's a reader (R1). If the thread T1 is processing a new page at that time, it won't get an automatic grant at R2, because there's now a writer thread waiting (T2). And we have a deadlock. The timeline is: 1. thread T1 owns the semaphore with read access R1 2. thread T2 requests write access W1 and waits 3. thread T1 requests read access R2 and waits The fix is for the thread T1 to release the semaphore R1 once it got the information it needs from the current VMA. The address space/VMAs could evolve while T1 iterates over the full buffer, but in the unlikely case where T1 misses a page, the external driver will raise a new page fault when retrying the memory access. Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL") Cc: stable@vger.kernel.org # 4.13+ Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports itNaveen N. Rao
We get the below warning if we try to use kexec on P9: kexec_core: Starting new kernel WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140 [snip] NIP __set_breakpoint+0xb4/0x140 LR kexec_prepare_cpus_wait+0x58/0x150 Call Trace: 0xc0000000ee70fb20 (unreliable) 0xc0000000ee70fb20 default_machine_kexec+0x234/0x2c0 machine_kexec+0x84/0x90 kernel_kexec+0xd8/0xe0 SyS_reboot+0x214/0x2c0 system_call+0x58/0x6c This happens since we are trying to clear hw breakpoint on POWER9, though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint() within hw_breakpoint_disable() with ppc_breakpoint_available() to address this. Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04openrisc: Set CONFIG_MULTI_IRQ_HANDLERPalmer Dabbelt
arm has an optional MULTI_IRQ_HANDLER, which openrisc copied but didn't make optional. The multi irq handler infrastructure has been copied to generic code selectable with a new config symbol. That symbol can be selected by randconfig builds and can cause build breakage. Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents the core config symbol from being selected. The openrisc local config symbol will be removed once openrisc gets converted to the generic code. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20180404043130.31277-3-palmer@sifive.com
2018-04-04arm64: Set CONFIG_MULTI_IRQ_HANDLERPalmer Dabbelt
arm has an optional MULTI_IRQ_HANDLER, which arm64 copied but didn't make optional. The multi irq handler infrastructure has been copied to generic code selectable with a new config symbol. That symbol can be selected by randconfig builds and can cause build breakage. Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents the core config symbol from being selected. The arm64 local config symbol will be removed once arm64 gets converted to the generic code. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20180404043130.31277-2-palmer@sifive.com
2018-04-04genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLERPalmer Dabbelt
These config switches enable the same code in the core and the not yet converted architecture code. They can be selected both by randconfig builds and cause linker error because the same symbols are defined twice. Make the new GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER to prevent that. The dependency will be removed once all architectures are converted over. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20180404043130.31277-4-palmer@sifive.com
2018-04-04powerpc/mm/radix: Update command line parsing for disable_radixAneesh Kumar K.V
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. prom_init parsing is not supporting these options. Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/radix: Parse disable_radix commandline correctly.Aneesh Kumar K.V
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. When using the later format we get below error. `Malformed early option 'disable_radix'` Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlbAneesh Kumar K.V
With 64k page size, we have hugetlb pte entries at the pmd and pud level for book3s64. We don't need to create a separate page table cache for that. With 4k we need to make sure hugepd page table cache for 16M is placed at PUD level and 16G at the PGD level. Simplify all these by not using HUGEPD_PD_SHIFT which is confusing for book3s64. Without this patch, with 64k page size we create pagetable caches with shift value 10 and 7 which are not used at all. Fixes: 419df06eea5b ("powerpc: Reduce the PTE_INDEX_SIZE") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/radix: Update pte fragment count from 16 to 256 on radixAneesh Kumar K.V
With split PTL (page table lock) config, we allocate the level 4 (leaf) page table using pte fragment framework instead of slab cache like other levels. This was done to enable us to have split page table lock at the level 4 of the page table. We use page->plt backing the all the level 4 pte fragment for the lock. Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes which means we use only 4k out of the allocated 64K page wasting 60k of the allocated memory. This was done earlier to keep it closer to hash. This patch update the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests shows really low impact even with THP disabled. With THP disabled we will be contenting further less on level 4 ptl and hence the impact should be further low. 256 threads: without patch (10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100) median = 15678.5 stdev = 42.1209 with patch: median = 15354 stdev = 194.743 This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/keys: Update documentation and remove unnecessary checkAneesh Kumar K.V
Adds more code comments. We also remove an unnecessary pkey check after we check for pkey error in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04Merge branch 'old.dcache' into work.dcacheAl Viro
2018-04-03RDMA: Use ib_gid_attr during GID modificationParav Pandit
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/providers: Avoid null netdev check for RoCEParav Pandit
Now that IB core GID cache ensures that all RoCE entries have an associated netdev remove null checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/providers: Avoid zero GID check for RoCEParav Pandit
Now that the IB core GID cache ensures that a zero GID doesn't exist in the GID table remove zero GID checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Refactor GID modify code for RoCEParav Pandit
Code is refactored to prepare separate functions for RoCE which can do more complex operations related to reference counting, while still maintainining code readability. This includes (a) Simplification to not perform netdevice checks and modifications for IB link layer. (b) Do not add RoCE GID entry which has NULL netdevice; instead return an error. (c) If GID addition fails at provider level add_gid(), do not add the entry in the cache and keep the entry marked as INVALID. (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they can be used even for modifying default GIDs. This avoid some code duplication in modifying default GIDs. (e) find_gid() routine refers to the data entry flags to qualify a GID as valid or invalid GID rather than depending on attributes and zeroness of the GID content. (f) gid_table_reserve_default() sets the GID default attribute at beginning while setting up the GID table. There is no need to use default_gid flag in low level functions such as write_gid(), add_gid(), del_gid(), as they never need to update the DEFAULT property of the GID entry while during GID table update. As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer searchable as described below. A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB spec version 1.3 section 4.1.1, point (6) whose snippet is below. "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as the Reserved GID. It shall never be assigned to any endport. It shall not be used as a destination address or in a global routing header (GRH)." GID table cache now only stores valid GID entries. Before this patch, Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using ib_find_cached_gid_by_port() and other similar find routines. Zero GID is no longer searchable as it shall not to be present in GRH or path recored entry as described in IB spec version 1.3 section 4.1.1, point (6), section 12.7.10 and section 12.7.20. ib_cache_update() is simplified to check link layer once, use unified locking scheme for all link layers, removed temporary gid table allocation/free logic. Additionally, (a) Expand ib_gid_attr to store port and index so that GID query routines can get port and index information from the attribute structure. (b) Expand ib_gid_attr to store device as well so that in future code when GID reference counting is done, device is used to reach back to the GID table entry. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Simplify ib_query_gid to always refer to cacheParav Pandit
Currently following inconsistencies exist. 1. ib_query_gid() returns GID from the software cache for a RoCE port and returns GID from the HCA for an IB port. This is incorrect because software GID cache is maintained regardless of HCA port type. 2. GID is queries from the HCA via ib_query_gid and updated in the software cache for IB link layer. Both of them might not be in sync. ULPs such as SRP initiator, SRP target, IPoIB driver have historically used ib_query_gid() API to query the GID. However CM used cached version during CM processing, When software cache was introduced, this inconsitency remained. In order to simplify, improve readability and avoid link layer specific above inconsistencies, this patch brings following changes. 1. ib_query_gid() always refers to the cache layer regardless of link layer. 2. cache module who reads the GID entry from HCA and builds the cache, directly invokes the HCA provider verb's query_gid() callback function. 3. ib_query_port() is being called in early stage where GID cache is not yet build while reading port immutable property. Therefore it needs to read the default GID from the HCA for IB link layer to publish the subnet prefix. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/core: Update query_gid documentation for HCA driversParav Pandit
query_gid() should return right GID value for iWarp and IB link layers. It is a no-op for RoCE link layer. Update the documentation to reflect this. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA deviceRoland Dreier
Check to make sure that ctx->cm_id->device is set before we use it. Otherwise userspace can trigger a NULL dereference by doing RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device. Cc: <stable@vger.kernel.org> Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03Merge branch 'userns-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "There was a lot of work this cycle fixing bugs that were discovered after the merge window and getting everything ready where we can reasonably support fully unprivileged fuse. The bug fixes you already have and much of the unprivileged fuse work is coming in via other trees. Still left for fully unprivileged fuse is figuring out how to cleanly handle .set_acl and .get_acl in the legacy case, and properly handling of evm xattrs on unprivileged mounts. Included in the tree is a cleanup from Alexely that replaced a linked list with a statically allocated fix sized array for the pid caches, which simplifies and speeds things up. Then there is are some cleanups and fixes for the ipc namespace. The motivation was that in reviewing other code it was discovered that access ipc objects from different pid namespaces recorded pids in such a way that when asked the wrong pids were returned. In the worst case there has been a measured 30% performance impact for sysvipc semaphores. Other test cases showed no measurable performance impact. Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc performance both gave the nod that this is good enough. Casey Schaufler and James Morris have given their approval to the LSM side of the changes. I simplified the types and the code dealing with sysvipc to pass just kern_ipc_perm for all three types of ipc. Which reduced the header dependencies throughout the kernel and simplified the lsm code. Which let me work on the pid fixes without having to worry about trivial changes causing complete kernel recompiles" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ipc/shm: Fix pid freeing. ipc/shm: fix up for struct file no longer being available in shm.h ipc/smack: Tidy up from the change in type of the ipc security hooks ipc: Directly call the security hook in ipc_ops.associate ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces. ipc/util: Helpers for making the sysvipc operations pid namespace aware ipc: Move IPCMNI from include/ipc.h into ipc/util.h msg: Move struct msg_queue into ipc/msg.c shm: Move struct shmid_kernel into ipc/shm.c sem: Move struct sem and struct sem_array into ipc/sem.c msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks pidns: simpler allocation of pid_* caches
2018-04-03orangefs: document package install and xfstests procedureMartin Brandenburg
Unless one is working on the userspace code, there's no need to compile OrangeFS. The package works just fine. (But leave the documentation for building from source since not everyone uses distributions which include the package.) Also document the process to run xfstests against OrangeFS. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03orangefs: remove unused codeMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03orangefs: make several *_operations structs staticMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03orangefs: implement vm_ops->faultMartin Brandenburg
Must retrieve size before running filemap_fault so the kernel has an up-to-date size. This should have been caught by xfstests generic/246, but it was masked by orangefs_new_inode, which set i_size to PAGE_SIZE. When nothing caused a getattr prior to a pagefault, i_size was still PAGE_SIZE. Since xfstests only read 10 bytes, it did not catch this bug. When orangefs_new_inode was modified to perform a getattr instead, i_size was set to zero, as it was a newly created file. Then orangefs_file_write_iter did NOT set i_size. Instead it invalidated the attribute cache, which should have caused the next caller to retrieve i_size. But the fault handler did not know it was supposed to retrieve i_size. So during xfstests, i_size was still zero, and filemap_fault returned VM_FAULT_SIGBUS. Fixes xfstests generic/452. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>