summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-11-14Merge branch 'pci/virtualization' into nextBjorn Helgaas
* pci/virtualization: PCI: Document reset method return values PCI: Detach driver before procfs & sysfs teardown on device remove PCI: Apply Cavium ThunderX ACS quirk to more Root Ports PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF PCI: Restore ARI Capable Hierarchy before setting numVFs PCI: Create SR-IOV virtfn/physfn links before attaching driver PCI: Expose SR-IOV offset, stride, and VF device ID via sysfs PCI: Cache the VF device ID in the SR-IOV structure PCI: Add Kconfig PCI_IOV dependency for PCI_REALLOC_ENABLE_AUTO PCI: Remove unused function __pci_reset_function() PCI: Remove reset argument from pci_iov_{add,remove}_virtfn()
2017-11-14Merge branch 'pci/resource' into nextBjorn Helgaas
* pci/resource: PCI: Fail pci_map_rom() if the option ROM is invalid PCI: Move pci_map_rom() error path x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f) PCI: Add pci_resize_resource() for resizing BARs PCI: Add resizable BAR infrastructure PCI: Add PCI resource type mask #define
2017-11-14Merge branch 'pci/msi' into nextBjorn Helgaas
* pci/msi: PCI/portdrv: Compute MSI/MSI-X IRQ vectors after final allocation PCI/portdrv: Factor out Interrupt Message Number lookup PCI/portdrv: Consolidate comments PCI/portdrv: Add #defines for AER and DPC Interrupt Message Number masks
2017-11-14Merge branch 'pci/hotplug' into nextBjorn Helgaas
* pci/hotplug: PCI: pciehp: Do not clear Presence Detect Changed during initialization PCI: pciehp: Fix race condition handling surprise link down PCI: Distribute available resources to hotplug-capable bridges PCI: Distribute available buses to hotplug-capable bridges PCI: Do not allocate more buses than available in parent PCI: Open-code the two pass loop when scanning bridges PCI: Move pci_hp_add_bridge() to drivers/pci/probe.c PCI: Add for_each_pci_bridge() helper PCI: shpchp: Convert timers to use timer_setup() PCI: cpqphp: Convert timers to use timer_setup() PCI: pciehp: Convert timers to use timer_setup() PCI: ibmphp: Use common error handling code in unconfigure_boot_device()
2017-11-14PCI/ASPM: Add L1 Substates definitionsBjorn Helgaas
Add and use #defines for L1 Substate register fields instead of hard-coding the masks. Also update comments to use names from the spec. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
2017-11-14PCI/ASPM: Reformat ASPM register definitionsBjorn Helgaas
Reformat register field definitions in the style used elsewhere and align comments with names used in the spec. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
2017-11-06PCI: Add for_each_pci_bridge() helperAndy Shevchenko
The following pattern is often used: list_for_each_entry(dev, &bus->devices, bus_list) { if (pci_is_bridge(dev)) { ... } } Add a for_each_pci_bridge() helper to make that code easier to write and read by reducing indentation level. It also saves one or few lines of code in each occurrence. Convert PCI core parts here at the same time. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [bhelgaas: fold in http://lkml.kernel.org/r/20171013165352.25550-1-andriy.shevchenko@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-10-25PCI: Add pci_resize_resource() for resizing BARsChristian König
Add a pci_resize_resource() interface to allow device drivers to resize BARs of their devices. This is useful for devices with large local storage, e.g., graphics devices. These devices often only expose 256MB BARs initially to be compatible with 32-bit systems. This function only tries to reprogram the windows of the bridge directly above the requesting device and only the BAR of the same type (usually mem, 64bit, prefetchable). This is done to avoid disturbing other drivers by changing the BARs of their devices. Drivers should use the following sequence to resize their BARs: 1. Disable memory decoding of the device using the PCI cfg dword. 2. Use pci_release_resource() to release all BARs which can move during the resize, including the one you want to resize. 3. Call pci_resize_resource() for each BAR you want to resize. 4. Call pci_assign_unassigned_bus_resources() to reassign new locations for all BARs which are not resized, but could move. 5. If everything worked as expected, enable memory decoding in the device again using the PCI cfg dword. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-10-24PCI: Add resizable BAR infrastructureChristian König
Add resizable BAR infrastructure, including defines and helper functions to read the possible sizes of a BAR and update its size. See PCIe r3.1, sec 7.22. Link: https://pcisig.com/sites/default/files/specification_documents/ECN_Resizable-BAR_24Apr2008.pdf Signed-off-by: Christian König <christian.koenig@amd.com> [bhelgaas: rename to functions with "rebar" (to match #defines), drop shift #defines, drop "_MASK" suffixes, fix typos, fix kerneldoc] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2017-10-19PCI/portdrv: Add #defines for AER and DPC Interrupt Message Number masksDongdong Liu
In the AER case, the mask isn't strictly necessary because there are no higher-order bits above the Interrupt Message Number, but using a #define will make it possible to grep for it. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-10-05PCI: Remove unused function __pci_reset_function()Jan H. Schönherr
The last caller of __pci_reset_function() has been removed. Remove the function as well. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-10-05PCI: Remove reset argument from pci_iov_{add,remove}_virtfn()Jan H. Schönherr
The "reset" argument passed to pci_iov_add_virtfn() and pci_iov_remove_virtfn() is always zero since 46cb7b1bd86f ("PCI: Remove unused SR-IOV VF Migration support") Remove the argument together with the associated code. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Russell Currey <ruscur@russell.cc>
2017-10-01Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "This adds a new timer wheel function which is required for the conversion of the timer callback function from the 'unsigned long data' argument to 'struct timer_list *timer'. This conversion has two benefits: 1) It makes struct timer_list smaller 2) Many callers hand in a pointer to the timer or to the structure containing the timer, which happens via type casting both at setup and in the callback. This change gets rid of the typecasts. Once the conversion is complete, which is planned for 4.15, the old setup function and the intermediate typecast in the new setup function go away along with the data field in struct timer_list. Merging this now into mainline allows a smooth queueing of the actual conversion in the affected maintainer trees without creating dependencies" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: um/time: Fixup namespace collision timer: Prepare to change timer callback argument type
2017-10-01Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp/hotplug fixes from Thomas Gleixner: "This addresses the fallout of the new lockdep mechanism which covers completions in the CPU hotplug code. The lockdep splats are false positives, but there is no way to annotate that reliably. The solution is to split the completions for CPU up and down, which requires some reshuffling of the failure rollback handling as well" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp/hotplug: Hotplug state fail injection smp/hotplug: Differentiate the AP completion between up and down smp/hotplug: Differentiate the AP-work lockdep class between up and down smp/hotplug: Callback vs state-machine consistency smp/hotplug: Rewrite AP state machine core smp/hotplug: Allow external multi-instance rollback smp/hotplug: Add state diagram
2017-10-01Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "The scheduler pull request comes with the following updates: - Prevent a divide by zero issue by validating the input value of sysctl_sched_time_avg - Make task state printing consistent all over the place and have explicit state characters for IDLE and PARKED so they wont be displayed as 'D' state which confuses tools" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/sysctl: Check user input value of sysctl_sched_time_avg sched/debug: Add explicit TASK_PARKED printing sched/debug: Ignore TASK_IDLE for SysRq-W sched/debug: Add explicit TASK_IDLE printing sched/tracing: Use common task-state helpers sched/tracing: Fix trace_sched_switch task-state printing sched/debug: Remove unused variable sched/debug: Convert TASK_state to hex sched/debug: Implement consistent task-state printing
2017-09-29Merge tag 'pci-v4.14-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - fix CONFIG_PCI=n build error (introduced in v4.14-rc1) (Geert Uytterhoeven) - fix a race in sysfs driver_override store/show (Nicolai Stange) * tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix race condition with driver_override PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
2017-09-29Merge tag 'iommu-fixes-v4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - A comment fix for 'struct iommu_ops' - Format string fixes for AMD IOMMU, unfortunatly I missed that during review. - Limit mediatek physical addresses to 32 bit for v7s to fix a warning triggered in io-page-table code. - Fix dma-sync in io-pgtable-arm-v7s code * tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Fix comment for iommu_ops.map_sg iommu/amd: pr_err() strings should end with newlines iommu/mediatek: Limit the physical address in 32bit for v7s iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
2017-09-29Merge branch 'fixes-v4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys fixes from James Morris: "Notable here is a rewrite of big_key crypto by Jason Donenfeld to address some issues in the original code. From Jason's commit log: "This started out as just replacing the use of crypto/rng with get_random_bytes_wait, so that we wouldn't use bad randomness at boot time. But, upon looking further, it appears that there were even deeper underlying cryptographic problems, and that this seems to have been committed with very little crypto review. So, I rewrote the whole thing, trying to keep to the conventions introduced by the previous author, to fix these cryptographic flaws." There has been positive review of the new code by Eric Biggers and Herbert Xu, and it passes basic testing via the keyutils test suite. Eric also manually tested it. Generally speaking, we likely need to improve the amount of crypto review for kernel crypto users including keys (I'll post a note separately to ksummit-discuss)" * 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security/keys: rewrite all of big_key crypto security/keys: properly zero out sensitive key material in big_key KEYS: use kmemdup() in request_key_auth_new() KEYS: restrict /proc/keys by credentials at open time KEYS: reset parent each time before searching key_user_tree KEYS: prevent KEYCTL_READ on negative key KEYS: prevent creating a different user's keyrings KEYS: fix writing past end of user-supplied buffer in keyring_read() KEYS: fix key refcount leak in keyctl_read_key() KEYS: fix key refcount leak in keyctl_assume_authority() KEYS: don't revoke uninstantiated key in request_key_auth_new() KEYS: fix cred refcount leak in request_key_auth_new()
2017-09-29sched/debug: Add explicit TASK_PARKED printingPeter Zijlstra
Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it its own print state because it will not in fact get woken by regular wakeups and is a long-term state. This requires moving TASK_PARKED into the TASK_REPORT mask, and since that latter needs to be a contiguous bitmask, we need to shuffle the bits around a bit. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29sched/debug: Add explicit TASK_IDLE printingPeter Zijlstra
Markus reported that kthreads that idle using TASK_IDLE instead of TASK_INTERRUPTIBLE are reported in as TASK_UNINTERRUPTIBLE and things like htop mark those red. This is undesirable, so add an explicit state for TASK_IDLE. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29sched/tracing: Use common task-state helpersPeter Zijlstra
Remove yet another task-state char instance. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29sched/tracing: Fix trace_sched_switch task-state printingPeter Zijlstra
Convert trace_sched_switch to use the common task-state helpers and fix the "X" and "Z" order, possibly they ended up in the wrong order because TASK_REPORT has them in the wrong order too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29sched/debug: Convert TASK_state to hexPeter Zijlstra
Bit patterns are easier in hex. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29sched/debug: Implement consistent task-state printingPeter Zijlstra
Currently get_task_state() and task_state_to_char() report different states, create a number of common helpers and unify the reported state space. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Second -rc update for 4.14. Both Mellanox and Intel had a series of -rc fixes that landed this week. The Mellanox bunch is spread throughout the stack and not just in their driver, where as the Intel bunch was mostly in the hfi1 driver. And, several of the fixes in the hfi1 driver were more than just simple 5 line fixes. As a result, the hfi1 driver fixes has a sizable LOC count. Everything else is as one would expect in an RC cycle in terms of LOC count. One item that might jump out and make you think "That's not an rc item" is the fix that corrects a typo. But, that change fixes a typo in a user visible API that was just added in this merge window, so if we fix it now, we can fix it. If we don't, the typo is in the API forever. Another that might not appear to be a fix at first glance is the Simplify mlx5_ib_cont_pages patch, but the simplification allows them to fix a bug in the existing function whenever the length of an SGE exceeded page size. We also had to revert one patch from the merge window that was wrong. Summary: - a few core fixes - a few ipoib fixes - a few mlx5 fixes - a 7-patch hfi1 related series" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/hfi1: Unsuccessful PCIe caps tuning should not fail driver load IB/hfi1: On error, fix use after free during user context setup Revert "IB/ipoib: Update broadcast object if PKey value was changed in index 0" IB/hfi1: Return correct value in general interrupt handler IB/hfi1: Check eeprom config partition validity IB/hfi1: Only reset QSFP after link up and turn off AOC TX IB/hfi1: Turn off AOC TX after offline substates IB/mlx5: Fix NULL deference on mlx5_ib_update_xlt failure IB/mlx5: Simplify mlx5_ib_cont_pages IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdev IB/ipoib: Fix sysfs Pkey create<->remove possible deadlock IB: Correct MR length field to be 64-bit IB/core: Fix qp_sec use after free access IB/core: Fix typo in the name of the tag-matching cap struct
2017-09-28timer: Prepare to change timer callback argument typeKees Cook
Modern kernel callback systems pass the structure associated with a given callback to the callback function. The timer callback remains one of the legacy cases where an arbitrary unsigned long argument continues to be passed as the callback argument. This has several problems: - This bloats the timer_list structure with a normally redundant .data field. - No type checking is being performed, forcing callbacks to do explicit type casts of the unsigned long argument into the object that was passed, rather than using container_of(), as done in most of the other callback infrastructure. - Neighboring buffer overflows can overwrite both the .function and the .data field, providing attackers with a way to elevate from a buffer overflow into a simplistic ROP-like mechanism that allows calling arbitrary functions with a controlled first argument. - For future Control Flow Integrity work, this creates a unique function prototype for timer callbacks, instead of allowing them to continue to be clustered with other void functions that take a single unsigned long argument. This adds a new timer initialization API, which will ultimately replace the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family, named timer_setup() (to mirror hrtimer_setup(), making instances of its use much easier to grep for). In order to support the migration of existing timers into the new callback arguments, timer_setup() casts its arguments to the existing legacy types, and explicitly passes the timer pointer as the legacy data argument. Once all setup_*timer() callers have been replaced with timer_setup(), the casts can be removed, and the data argument can be dropped with the timer expiration code changed to just pass the timer to the callback directly. Since the regular pattern of using container_of() during local variable declaration repeats the need for the variable type declaration to be included, this adds a helper modeled after other from_*() helpers that wrap container_of(), named from_timer(). This helper uses typeof(*variable), removing the type redundancy and minimizing the need for line wraps in forthcoming conversions from "unsigned data long" to "struct timer_list *" in the timer callbacks: -void callback(unsigned long data) +void callback(struct timer_list *t) { - struct some_data_structure *local = (struct some_data_structure *)data; + struct some_data_structure *local = from_timer(local, t, timer); Finally, in order to support the handful of timer users that perform open-coded assignments of the .function (and .data) fields, provide cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used temporarily. Once conversion has been completed, these can be globally trivially removed. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
2017-09-28Merge commit 'keys-fixes-20170927' into fixes-v4.14-rc3James Morris
From David Howells: "There are two sets of patches here: (1) A bunch of core keyrings bug fixes from Eric Biggers. (2) Fixing big_key to use safe crypto from Jason A. Donenfeld."
2017-09-27iommu: Fix comment for iommu_ops.map_sgJean-Philippe Brucker
The definition of map_sg was split during a recent addition to iommu_ops. Put it back together. Fixes: add02cfdc9bc ("iommu: Introduce Interface for IOMMU TLB Flushing") Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Two sets of NVMe pull requests from Christoph: - Fixes for the Fibre Channel host/target to fix spec compliance - Allow a zero keep alive timeout - Make the debug printk for broken SGLs work better - Fix queue zeroing during initialization - Set of RDMA and FC fixes - Target div-by-zero fix - bsg double-free fix. - ndb unknown ioctl fix from Josef. - Buffered vs O_DIRECT page cache inconsistency fix. Has been floating around for a long time, well reviewed. From Lukas. - brd overflow fix from Mikulas. - Fix for a loop regression in this merge window, where using a union for two members of the loop_cmd turned out to be a really bad idea. From Omar. - Fix for an iostat regression fix in this series, using the wrong API to get at the block queue. From Shaohua. - Fix for a potential blktrace delection deadlock. From Waiman. * 'for-linus' of git://git.kernel.dk/linux-block: (30 commits) nvme-fcloop: fix port deletes and callbacks nvmet-fc: sync header templates with comments nvmet-fc: ensure target queue id within range. nvmet-fc: on port remove call put outside lock nvme-rdma: don't fully stop the controller in error recovery nvme-rdma: give up reconnect if state change fails nvme-core: Use nvme_wq to queue async events and fw activation nvme: fix sqhd reference when admin queue connect fails block: fix a crash caused by wrong API fs: Fix page cache inconsistency when mixing buffered and AIO DIO nvmet: implement valid sqhd values in completions nvme-fabrics: Allow 0 as KATO value nvme: allow timed-out ios to retry nvme: stop aer posting if controller state not live nvme-pci: Print invalid SGL only once nvme-pci: initialize queue memory before interrupts nvmet-fc: fix failing max io queue connections nvme-fc: use transport-specific sgl format nvme: add transport SGL definitions nvme.h: remove FC transport-specific error values ...
2017-09-25smp/hotplug: Hotplug state fail injectionPeter Zijlstra
Add a sysfs file to one-time fail a specific state. This can be used to test the state rollback code paths. Something like this (hotplug-up.sh): #!/bin/bash echo 0 > /debug/sched_debug echo 1 > /debug/tracing/events/cpuhp/enable ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1` STATES=${1:-$ALL_STATES} for state in $STATES do echo 0 > /sys/devices/system/cpu/cpu1/online echo 0 > /debug/tracing/trace echo Fail state: $state echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail cat /sys/devices/system/cpu/cpu1/hotplug/fail echo 1 > /sys/devices/system/cpu/cpu1/online cat /debug/tracing/trace > hotfail-${state}.trace sleep 1 done Can be used to test for all possible rollback (barring multi-instance) scenarios on CPU-up, CPU-down is a trivial modification of the above. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
2017-09-25smp/hotplug: Add state diagramPeter Zijlstra
Add a state diagram to clarify when which states are ran where. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.661598270@infradead.org
2017-09-25nvmet-fc: sync header templates with commentsJames Smart
Comments were incorrect: - defer_rcv was in host port template. moved to target port template - Added Mandatory statements for target port template items Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n buildGeert Uytterhoeven
If CONFIG_PCI=n and gcc (e.g. 4.1.2) decides not to inline get_pci_function_alias_group(), the build fails with: drivers/iommu/iommu.o: In function `get_pci_function_alias_group': iommu.c:(.text+0xfdc): undefined reference to `pci_acs_enabled' Due to the various dummies for PCI calls in the CONFIG_PCI=n case, pci_acs_enabled() never called, but not all versions of gcc are smart enough to realize that. While explicitly marking get_pci_function_alias_group() inline would fix the build, this would inflate the code for the CONFIG_PCI=y case, as get_pci_function_alias_group() is a not-so-small function called from two places. Hence fix the issue by introducing a dummy for pci_acs_enabled() instead. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2017-09-25IB: Correct MR length field to be 64-bitParav Pandit
The ib_mr->length represents the length of the MR in bytes as per the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION). Currently ib_mr->length field is defined as only 32-bits field. This might result into truncation and failed WRs of consumers who registers more than 4GB bytes memory regions and whose WRs accessing such MRs. This patch makes the length 64-bit to avoid such truncation. Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Faisal Latif <faisal.latif@intel.com> Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/core: Fix typo in the name of the tag-matching cap structLeon Romanovsky
The tag matching functionality is implemented by mlx5 driver by extending XRQ, however this internal kernel information was exposed to user space applications with *xrq* name instead of *tm*. This patch renames *xrq* to *tm* to handle that. Fixes: 8d50505ada72 ("IB/uverbs: Expose XRQ capabilities") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25nvme: add transport SGL definitionsJames Smart
Add transport SGL defintions from NVMe TP 4008, required for the final NVMe-FC standard. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme.h: remove FC transport-specific error valuesJames Smart
The NVM express group recinded the reserved range for the transport. Remove the FC-centric values that had been defined. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25blktrace: Fix potential deadlock between delete & sysfs opsWaiman Long
The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opended sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, a new blk_trace_mutex is now added to the request_queue structure to protect access to the blk_trace structure. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Fix typo in patch subject line, and prune a comment detailing how the code used to work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25KEYS: prevent creating a different user's keyringsEric Biggers
It was possible for an unprivileged user to create the user and user session keyrings for another user. For example: sudo -u '#3000' sh -c 'keyctl add keyring _uid.4000 "" @u keyctl add keyring _uid_ses.4000 "" @u sleep 15' & sleep 1 sudo -u '#4000' keyctl describe @u sudo -u '#4000' keyctl describe @us This is problematic because these "fake" keyrings won't have the right permissions. In particular, the user who created them first will own them and will have full access to them via the possessor permissions, which can be used to compromise the security of a user's keys: -4: alswrv-----v------------ 3000 0 keyring: _uid.4000 -5: alswrv-----v------------ 3000 0 keyring: _uid_ses.4000 Fix it by marking user and user session keyrings with a flag KEY_FLAG_UID_KEYRING. Then, when searching for a user or user session keyring by name, skip all keyrings that don't have the flag set. Fixes: 69664cf16af4 ("keys: don't generate user and user session keyrings unless they're accessed") Cc: <stable@vger.kernel.org> [v2.6.26+] Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-09-24Merge tag 'devicetree-fixes-for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring: - fix build for !OF providing empty of_find_device_by_node - fix Abracon vendor prefix - sync dtx_diff include paths (again) - a stm32h7 clock binding doc fix * tag 'devicetree-fixes-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: clk: stm32h7: fix clock-cell size scripts/dtc: dtx_diff - 2nd update of include dts paths to match build dt-bindings: fix vendor prefix for Abracon of: provide inline helper for of_find_device_by_node
2017-09-24Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Three irqchip driver fixes, and an affinity mask helper function bug fix affecting x86" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "genirq: Restrict effective affinity to interrupts actually using it" irqchip.mips-gic: Fix shared interrupt mask writes irqchip/gic-v4: Fix building with ancient gcc irqchip/gic-v3: Iterate over possible CPUs by for_each_possible_cpu()
2017-09-24Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull address-limit checking fixes from Ingo Molnar: "This fixes a number of bugs in the address-limit (USER_DS) checks that got introduced in the merge window, (mostly) affecting the ARM and ARM64 platforms" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/syscalls: Move address limit check in loop arm/syscalls: Optimize address limit check Revert "arm/syscalls: Check address limit on user-mode return" syscalls: Use CHECK_DATA_CORRUPTION for addr_limit_user_check
2017-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix NAPI poll list corruption in enic driver, from Christian Lamparter. 2) Fix route use after free, from Eric Dumazet. 3) Fix regression in reuseaddr handling, from Josef Bacik. 4) Assert the size of control messages in compat handling since we copy it in from userspace twice. From Meng Xu. 5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.) from Ursula Braun. 6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn. 7) Don't use ARRAY_SIZE on spinlock array which might have zero entries, from Geert Uytterhoeven. 8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov Kasiviswanathan. 9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from Xin Long. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits) inet: fix improper empty comparison net: use inet6_rcv_saddr to compare sockets net: set tb->fast_sk_family net: orphan frags on stand-alone ptype in dev_queue_xmit_nit MAINTAINERS: update git tree locations for ieee802154 subsystem net: prevent dst uses after free net: phy: Fix truncation of large IRQ numbers in phy_attached_print() net/smc: no close wait in case of process shut down net/smc: introduce a delay net/smc: terminate link group if out-of-sync is received net/smc: longer delay for client link group removal net/smc: adapt send request completion notification net/smc: adjust net_device refcount net/smc: take RCU read lock for routing cache lookup net/smc: add receive timeout check net/smc: add missing dev_put net: stmmac: Cocci spatch "of_table" lan78xx: Use default values loaded from EEPROM/OTP after reset lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE lan78xx: Fix for eeprom read/write when device auto suspend ...
2017-09-22Merge tag 'acpi-4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix the initialization of resources in the ACPI WDAT watchdog driver, a recent regression in the ACPI device properties handling, a recent change in behavior causing the ACPI_HANDLE() macro to only work for GPL code and create a MAINTAINERS entry for ACPI PMIC drivers in order to specify the official reviewers for that code. Specifics: - Fix the initialization of resources in the ACPI WDAT watchdog driver that uses unititialized memory which causes compiler warnings to be triggered (Arnd Bergmann). - Fix a recent regression in the ACPI device properties handling that causes some device properties data to be skipped during enumeration (Sakari Ailus). - Fix a recent change in behavior that caused the ACPI_HANDLE() macro to stop working for non-GPL code which is a problem for the NVidia binary graphics driver, for example (John Hubbard). - Add a MAINTAINERS entry for the ACPI PMIC drivers to specify the official reviewers for that code (Rafael Wysocki)" * tag 'acpi-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: properties: Return _DSD hierarchical extension (data) sub-nodes correctly ACPI / bus: Make ACPI_HANDLE() work for non-GPL code again ACPI / watchdog: properly initialize resources ACPI / PMIC: Add code reviewers to MAINTAINERS
2017-09-22Merge tag 'pm-4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a cpufreq regression introduced by recent changes related to the generic DT driver, an initialization time memory leak in cpuidle on ARM, a PM core bug that may cause system suspend/resume to fail on some systems, a request type validation issue in the PM QoS framework and two documentation-related issues. Specifics: - Fix a regression in cpufreq on systems using DT as the source of CPU configuration information where two different code paths attempt to create the cpufreq-dt device object (there can be only one) and fix up the "compatible" matching for some TI platforms on top of that (Viresh Kumar, Dave Gerlach). - Fix an initialization time memory leak in cpuidle on ARM which occurs if the cpuidle driver initialization fails (Stefan Wahren). - Fix a PM core function that checks whether or not there are any system suspend/resume callbacks for a device, but forgets to check legacy callbacks which then may be skipped incorrectly and the system may crash and/or the device may become unusable after a suspend-resume cycle (Rafael Wysocki). - Fix request type validation for latency tolerance PM QoS requests which may lead to unexpected behavior (Jan Schönherr). - Fix a broken link to PM documentation from a header file and a typo in a PM document (Geert Uytterhoeven, Rafael Wysocki)" * tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: ti-cpufreq: Support additional am43xx platforms ARM: cpuidle: Avoid memleak if init fail cpufreq: dt-platdev: Add some missing platforms to the blacklist PM: core: Fix device_pm_check_callbacks() PM: docs: Drop an excess character from devices.rst PM / QoS: Use the correct variable to check the QoS request type driver core: Fix link to device power management documentation
2017-09-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - fixes for two long standing issues (lock up and a crash) in force feedback handling in uinput driver - tweak to firmware update timing in Elan I2C touchpad driver. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - extend Flash-Write delay Input: uinput - avoid crash when sending FF request to device going away Input: uinput - avoid FF flush when destroying device
2017-09-22Merge tag 'seccomp-v4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "Major additions: - sysctl and seccomp operation to discover available actions (tyhicks) - new per-filter configurable logging infrastructure and sysctl (tyhicks) - SECCOMP_RET_LOG to log allowed syscalls (tyhicks) - SECCOMP_RET_KILL_PROCESS as the new strictest possible action - self-tests for new behaviors" [ This is the seccomp part of the security pull request during the merge window that was nixed due to unrelated problems - Linus ] * tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: samples: Unrename SECCOMP_RET_KILL selftests/seccomp: Test thread vs process killing seccomp: Implement SECCOMP_RET_KILL_PROCESS action seccomp: Introduce SECCOMP_RET_KILL_PROCESS seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD seccomp: Action to log before allowing seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW seccomp: Selftest for detection of filter flag support seccomp: Sysctl to configure actions that are allowed to be logged seccomp: Operation for checking if an action is available seccomp: Sysctl to display available actions seccomp: Provide matching filter for introspection selftests/seccomp: Refactor RET_ERRNO tests selftests/seccomp: Add simple seccomp overhead benchmark selftests/seccomp: Add tests for basic ptrace actions
2017-09-22Merge tag 'for-linus-4.14b-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "A fix for a missing __init annotation and two cleanup patches" * tag 'for-linus-4.14b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen, arm64: drop dummy lookup_address() xen: don't compile pv-specific parts if XEN_PV isn't configured xen: x86: mark xen_find_pt_base as __init
2017-09-21net: prevent dst uses after freeEric Dumazet
In linux-4.13, Wei worked hard to convert dst to a traditional refcounted model, removing GC. We now want to make sure a dst refcount can not transition from 0 back to 1. The problem here is that input path attached a not refcounted dst to an skb. Then later, because packet is forwarded and hits skb_dst_force() before exiting RCU section, we might try to take a refcount on one dst that is about to be freed, if another cpu saw 1 -> 0 transition in dst_release() and queued the dst for freeing after one RCU grace period. Lets unify skb_dst_force() and skb_dst_force_safe(), since we should always perform the complete check against dst refcount, and not assume it is not zero. Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005 [ 989.919496] skb_dst_force+0x32/0x34 [ 989.919498] __dev_queue_xmit+0x1ad/0x482 [ 989.919501] ? eth_header+0x28/0xc6 [ 989.919502] dev_queue_xmit+0xb/0xd [ 989.919504] neigh_connected_output+0x9b/0xb4 [ 989.919507] ip_finish_output2+0x234/0x294 [ 989.919509] ? ipt_do_table+0x369/0x388 [ 989.919510] ip_finish_output+0x12c/0x13f [ 989.919512] ip_output+0x53/0x87 [ 989.919513] ip_forward_finish+0x53/0x5a [ 989.919515] ip_forward+0x2cb/0x3e6 [ 989.919516] ? pskb_trim_rcsum.part.9+0x4b/0x4b [ 989.919518] ip_rcv_finish+0x2e2/0x321 [ 989.919519] ip_rcv+0x26f/0x2eb [ 989.919522] ? vlan_do_receive+0x4f/0x289 [ 989.919523] __netif_receive_skb_core+0x467/0x50b [ 989.919526] ? tcp_gro_receive+0x239/0x239 [ 989.919529] ? inet_gro_receive+0x226/0x238 [ 989.919530] __netif_receive_skb+0x4d/0x5f [ 989.919532] netif_receive_skb_internal+0x5c/0xaf [ 989.919533] napi_gro_receive+0x45/0x81 [ 989.919536] ixgbe_poll+0xc8a/0xf09 [ 989.919539] ? kmem_cache_free_bulk+0x1b6/0x1f7 [ 989.919540] net_rx_action+0xf4/0x266 [ 989.919543] __do_softirq+0xa8/0x19d [ 989.919545] irq_exit+0x5d/0x6b [ 989.919546] do_IRQ+0x9c/0xb5 [ 989.919548] common_interrupt+0x93/0x93 [ 989.919548] </IRQ> Similarly dst_clone() can use dst_hold() helper to have additional debugging, as a follow up to commit 44ebe79149ff ("net: add debug atomic_inc_not_zero() in dst_hold()") In net-next we will convert dst atomic_t to refcount_t for peace of mind. Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21Input: uinput - avoid FF flush when destroying deviceDmitry Torokhov
Normally, when input device supporting force feedback effects is being destroyed, we try to "flush" currently playing effects, so that the physical device does not continue vibrating (or executing other effects). Unfortunately this does not work well for uinput as flushing of the effects deadlocks with the destroy action: - if device is being destroyed because the file descriptor is being closed, then there is noone to even service FF requests; - if device is being destroyed because userspace sent UI_DEV_DESTROY, while theoretically it could be possible to service FF requests, userspace is unlikely to do so (they'd need to make sure FF handling happens on a separate thread) even if kernel solves the issue with FF ioctls deadlocking with UI_DEV_DESTROY ioctl on udev->mutex. To avoid lockups like the one below, let's install a custom input device flush handler, and avoid trying to flush force feedback effects when we destroying the device, and instead rely on uinput to shut off the device properly. NMI watchdog: Watchdog detected hard LOCKUP on cpu 3 ... <<EOE>> [<ffffffff817a0307>] _raw_spin_lock_irqsave+0x37/0x40 [<ffffffff810e633d>] complete+0x1d/0x50 [<ffffffffa00ba08c>] uinput_request_done+0x3c/0x40 [uinput] [<ffffffffa00ba587>] uinput_request_submit.part.7+0x47/0xb0 [uinput] [<ffffffffa00bb62b>] uinput_dev_erase_effect+0x5b/0x76 [uinput] [<ffffffff815d91ad>] erase_effect+0xad/0xf0 [<ffffffff815d929d>] flush_effects+0x4d/0x90 [<ffffffff815d4cc0>] input_flush_device+0x40/0x60 [<ffffffff815daf1c>] evdev_cleanup+0xac/0xc0 [<ffffffff815daf5b>] evdev_disconnect+0x2b/0x60 [<ffffffff815d74ac>] __input_unregister_device+0xac/0x150 [<ffffffff815d75f7>] input_unregister_device+0x47/0x70 [<ffffffffa00bac45>] uinput_destroy_device+0xb5/0xc0 [uinput] [<ffffffffa00bb2de>] uinput_ioctl_handler.isra.9+0x65e/0x740 [uinput] [<ffffffff811231ab>] ? do_futex+0x12b/0xad0 [<ffffffffa00bb3f8>] uinput_ioctl+0x18/0x20 [uinput] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff81337553>] ? security_file_ioctl+0x43/0x60 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 Reported-by: Rodrigo Rivas Costa <rodrigorivascosta@gmail.com> Reported-by: Clément VUCHENER <clement.vuchener@gmail.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=193741 Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>