summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-28PCI/ASPM: Reduce severity of common clock config messageChris Packham
When the UEFI/BIOS or bootloader has not initialised a PCIe device we would get the following message: kern.warning: pci 0000:00:01.0: ASPM: current common clock configuration is broken, reconfiguring "warning" and "broken" are slightly misleading. On an embedded system it is quite possible for the bootloader to avoid configuring PCIe devices if they are not needed. Downgrade the message to pci_info() and change "broken" to "inconsistent" since we fix up the inconsistency in the code immediately following the message (and emit an error if that fails). Link: https://lore.kernel.org/r/20200323035530.11569-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/AER: Rationalize error status register clearingKuppuswamy Sathyanarayanan
The AER interfaces to clear error status registers were a confusing mess: - pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors from the Uncorrectable Error Status register. - pci_aer_clear_fatal_status() cleared fatal errors from the Uncorrectable Error Status register. - pci_cleanup_aer_error_status_regs() cleared the Root Error Status register (for Root Ports), the Uncorrectable Error Status register, and the Correctable Error Status register. Rename them to make them consistent: From To ---------------------------------------- ------------------------------- pci_cleanup_aer_uncorrect_error_status() pci_aer_clear_nonfatal_status() pci_aer_clear_fatal_status() pci_aer_clear_fatal_status() pci_cleanup_aer_error_status_regs() pci_aer_clear_status() Since pci_cleanup_aer_error_status_regs() (renamed to pci_aer_clear_status()) is only used within drivers/pci/, move the declaration from <linux/aer.h> to drivers/pci/pci.h. [bhelgaas: commit log, add renames] Link: https://lore.kernel.org/r/d1310a75dc3d28f7e8da4e99c45fbd3e60fe238e.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/DPC: Add Error Disconnect Recover (EDR) supportKuppuswamy Sathyanarayanan
Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to notify OSPM that a device has been disconnected due to an error condition (ACPI v6.3, sec 5.6.6). OSPM advertises its support for EDR on PCI devices via _OSC (see [1], sec 4.5.1, table 4-4). The OSPM EDR notify handler should invalidate software state associated with disconnected devices and may attempt to recover them. OSPM communicates the status of recovery to the firmware via _OST (sec 6.3.5.2). For PCIe, firmware may use Downstream Port Containment (DPC) to support EDR. Per [1], sec 4.5.1, table 4-6, even if firmware has retained control of DPC, OSPM may read/write DPC control and status registers during the EDR notification processing window, i.e., from the time it receives an EDR notification until it clears the DPC Trigger Status. Note that per [1], sec 4.5.1 and 4.5.2.4, 1. If the OS supports EDR, it should advertise that to firmware by setting OSC_PCI_EDR_SUPPORT in _OSC Support. 2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in _OSC Support. Add an EDR notify handler to attempt recovery. [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/12888 [bhelgaas: squash add/enable patches into one] Link: https://lore.kernel.org/r/90f91fe6d25c13f9d2255d2ce97ca15be307e1bb.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org>
2020-03-28PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDRKuppuswamy Sathyanarayanan
If firmware controls DPC, it is generally responsible for managing the DPC capability and events, and the OS should not access the DPC capability. However, if firmware controls DPC and both the OS and the platform support Error Disconnect Recover (EDR) notifications, the OS EDR notify handler is responsible for recovery, and the notify handler may read/write the DPC capability until it clears the DPC Trigger Status bit. See [1], sec 4.5.1, table 4-6. Expose some DPC error handling functions so they can be used by the EDR notify handler. [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/12888 Link: https://lore.kernel.org/r/e9000bb15b3a4293e81d98bb29ead7c84a6393c9.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error StatusKuppuswamy Sathyanarayanan
Per the SFI _OSC and DPC Updates ECN [1] implementation note flowchart, the OS seems to be expected to clear AER status even if it doesn't have ownership of the AER capability. Unlike the DPC capability, where a DPC ECN [2] specifies a window when the OS is allowed to access DPC registers even if it doesn't have ownership, there is no clear model for AER. Add pci_aer_raw_clear_status() to clear the AER error status registers unconditionally. This is intended for use only by the EDR path (see [2]). [1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24, 2020, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/14076 [2] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/12888 [bhelgaas: changelog] Link: https://lore.kernel.org/r/c19ad28f3633cce67448609e89a75635da0da07d.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/DPC: Cache DPC capabilities in pci_init_capabilities()Kuppuswamy Sathyanarayanan
Since Error Disconnect Recover needs to use DPC error handling routines even if the OS doesn't have control of DPC, move the initalization and caching of DPC capabilities from the DPC driver to pci_init_capabilities(). Link: https://lore.kernel.org/r/5888380657c8b9551675b5dbd48e370e4fd2703d.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/ERR: Return status of pcie_do_recovery()Kuppuswamy Sathyanarayanan
As per the DPC Enhancements ECN [1], sec 4.5.1, table 4-4, if the OS supports Error Disconnect Recover (EDR), it must invalidate the software state associated with child devices of the port without attempting to access the child device hardware. In addition, if the OS supports DPC, it must attempt to recover the child devices if the port implements the DPC Capability. If the OS continues operation, the OS must inform the firmware of the status of the recovery operation via the _OST method. Return the result of pcie_do_recovery() so we can report it to firmware via _OST. [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/12888 Link: https://lore.kernel.org/r/eb60ec89448769349c6722954ffbf2de163155b5.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/ERR: Remove service dependency in pcie_do_recovery()Kuppuswamy Sathyanarayanan
Previously we passed the PCIe service type parameter to pcie_do_recovery(), where reset_link() looked up the underlying pci_port_service_driver and its .reset_link() function pointer. Instead of using this roundabout way, we can just pass the driver-specific .reset_link() callback function when calling pcie_do_recovery() function. This allows us to call pcie_do_recovery() from code that is not a PCIe port service driver, e.g., Error Disconnect Recover (EDR) support. Remove pcie_port_find_service() and pcie_port_service_driver.reset_link since they are now unused. Link: https://lore.kernel.org/r/60e02b87b526cdf2930400059d98704bf0a147d1.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28bpf: Fix build warning regarding missing prototypesJean-Philippe Menil
Fix build warnings when building net/bpf/test_run.o with W=1 due to missing prototype for bpf_fentry_test{1..6}. Instead of declaring prototypes, turn off warnings with __diag_{push,ignore,pop} as pointed out by Alexei. Signed-off-by: Jean-Philippe Menil <jpmenil@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200327204713.28050-1-jpmenil@gmail.com
2020-03-28PCI/DPC: Move DPC data into struct pci_devBjorn Helgaas
We only need 25 bits of data for DPC, so I don't think it's worth the complexity of allocating and keeping track of the struct dpc_dev separately from the pci_dev. Move that data into the struct pci_dev. Link: https://lore.kernel.org/r/98323eaa18080adbe5bb30846862f09f8722d4b3.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/ERR: Update error status after reset_link()Kuppuswamy Sathyanarayanan
Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses reset_link() to recover from fatal errors. But during fatal error recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using reset_link()) pcie_do_recovery() will report the recovery result as failure. Update the status of error after reset_link(). You can reproduce this issue by triggering a SW DPC using "DPC Software Trigger" bit in "DPC Control Register". You should see recovery failed dmesg log as below: pcieport 0000:00:16.0: DPC: containment event, status:0x1f27 source:0x0000 pcieport 0000:00:16.0: DPC: software trigger detected pci 0000:04:00.0: AER: can't recover (no error_detected callback) pcieport 0000:00:16.0: AER: device recovery failed Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com [bhelgaas: split pci_channel_io_frozen simplification to separate patch] Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Keith Busch <keith.busch@intel.com> Cc: Ashok Raj <ashok.raj@intel.com>
2020-03-28PCI/ERR: Combine pci_channel_io_frozen casesKuppuswamy Sathyanarayanan
pcie_do_recovery() had two "if (state == pci_channel_io_frozen)" cases right after each other. Combine them to make this easier to read. No functional change intended. Link: https://lore.kernel.org/r/20200317170654.GA23125@infradead.org [bhelgaas: split from https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com] Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28MIPS: ralink: mt7621: Fix soc_device introductionThomas Bogendoerfer
Depending on selected SMP config options soc_device didn't get initialised at all. With UP config vmlinux didn't link because of missing soc bus. Fixes: 71b9b5e0130d ("MIPS: ralink: mt7621: introduce 'soc_device' initialization") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Tested-by: René van Dorst <opensource@vdorst.com>
2020-03-28Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes: one in drivers (qla2xxx), and one in the core (sd) to try to cope with USB enclosures that silently change reported parameters" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Fix optimal I/O size for devices that change reported values scsi: qla2xxx: Fix I/Os being passed down when FC device is being deleted
2020-03-28libbpf, xsk: Init all ring members in xsk_umem__create and xsk_socket__createFletcher Dunn
Fix a sharp edge in xsk_umem__create and xsk_socket__create. Almost all of the members of the ring buffer structs are initialized, but the "cached_xxx" variables are not all initialized. The caller is required to zero them. This is needlessly dangerous. The results if you don't do it can be very bad. For example, they can cause xsk_prod_nb_free and xsk_cons_nb_avail to return values greater than the size of the queue. xsk_ring_cons__peek can return an index that does not refer to an item that has been queued. I have confirmed that without this change, my program misbehaves unless I memset the ring buffers to zero before calling the function. Afterwards, my program works without (or with) the memset. Signed-off-by: Fletcher Dunn <fletcherd@valvesoftware.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/85f12913cde94b19bfcb598344701c38@valvesoftware.com
2020-03-28Merge tag 'nfs-rdma-for-5.7-1' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA Client Updates for Linux 5.7 New Features: - Allow one active connection and several zombie connections to prevent blocking if the remote server is unresponsive. Bugfixes and Cleanups: - Enhance MR-related trace points - Refactor connection set-up and disconnect functions - Make Protection Domains per-connection instead of per-transport - Merge struct rpcrdma_ia into rpcrdma_ep
2020-03-28NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()Trond Myklebust
If the FLUSH_SYNC flag is set, nfs_initiate_pgio() will currently wait for completion, and then return the status of the I/O operation. What we actually want to report in nfs_pageio_doio() is whether or not the RPC call was launched successfully, whereas actual I/O status is intended handled in the reply callbacks. Since FLUSH_SYNC is never set by any of the callers anyway, let's just remove that code altogether. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-28ACPI: CPPC: clean up acpi_get_psd_map()Liguang Zhang
In acpi_get_psd_map() variable all_cpu_data[] can't be NULL and variable match_cpc_ptr has been checked before, no need check again at the end of the funchtion. Some additional optimizations can be made on top of that. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-28fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_tThomas Gleixner
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they disable preemption, which is undesired for latency reasons and breaks when regular spinlocks are taken within the bit_spinlock locked region because regular spinlocks are converted to 'sleeping spinlocks' on RT. PREEMPT_RT replaced the bit spinlocks with regular spinlocks to avoid this problem. The replacement was done conditionaly at compile time, but Christoph requested to do an unconditional conversion. Jan suggested to move the spinlock into a existing padding hole which avoids a size increase of struct buffer_head on production kernels. As a benefit the lock gains lockdep coverage. [ bigeasy: Remove the wrapper and use always spinlock_t and move it into the padding hole ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Link: https://lkml.kernel.org/r/20191118132824.rclhrbujqh4b4g4d@linutronix.de
2020-03-28thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_tClark Williams
The pkg_temp_lock spinlock is acquired in the thermal vector handler which is truly atomic context even on PREEMPT_RT kernels. The critical sections are tiny, so change it to a raw spinlock. Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191008110021.2j44ayunal7fkb7i@linutronix.de
2020-03-28Documentation/locking/locktypes: Minor copy editor fixesRandy Dunlap
Minor editorial fixes: - remove 'enabled' from PREEMPT_RT enabled kernels for consistency - add some periods for consistency - add "'" for possessive CPU's - spell out interrupts [ tglx: Picked up Paul's suggestions ] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/ac615f36-0b44-408d-aeab-d76e4241add4@infradead.org
2020-03-28Documentation/locking/locktypes: Further clarifications and wordsmithingThomas Gleixner
The documentation of rw_semaphores is wrong as it claims that the non-owner reader release is not supported by RT. That's just history biased memory distortion. Split the 'Owner semantics' section up and add separate sections for semaphore and rw_semaphore to reflect reality. Aside of that the following updates are done: - Add pseudo code to document the spinlock state preserving mechanism on PREEMPT_RT - Wordsmith the bitspinlock and lock nesting sections Co-developed-by: Paul McKenney <paulmck@kernel.org> Signed-off-by: Paul McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/87wo78y5yy.fsf@nanos.tec.linutronix.de
2020-03-28s390/mm: cleanup init_new_context() callbackAlexander Gordeev
The set of values asce_limit may be assigned with is TASK_SIZE_MAX, _REGION1_SIZE, _REGION2_SIZE and 0 as a special case if the callback was called from execve(). Do VM_BUG_ON() if asce_limit is something else. Save few CPU cycles by removing unnecessary asce_limit re-assignment in case of 3-level task and redundant PGD entry type reconstruction. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-28s390/mm: cleanup virtual memory constants usageAlexander Gordeev
Remove duplicate definitions and consolidate usage of virutal and address translation constants. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-28s390/mm: remove page table downgrade supportAlexander Gordeev
This update consolidates page table handling code. Because there are hardly any 31-bit binaries left we do not need to optimize for that. No extra efforts are needed to ensure that a compat task does not map anything above 2GB. The TASK_SIZE limit for 31-bit tasks is 2GB already and the generic code does check that a resulting map address would not surpass that limit. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-28x86/boot/compressed: Fix debug_puthex() parameter typeJoerg Roedel
In the CONFIG_X86_VERBOSE_BOOTUP=Y case, the debug_puthex() macro just turns into __puthex(), which takes 'unsigned long' as parameter. But in the CONFIG_X86_VERBOSE_BOOTUP=N case, it is a function which takes 'unsigned char *', causing compile warnings when the function is used. Fix the parameter type to get rid of the warnings. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200319091407.1481-11-joro@8bytes.org
2020-03-28Merge branch 'uaccess.futex' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into locking/core Pull uaccess futex cleanups for Al Viro: Consolidate access_ok() usage and the futex uaccess function zoo.
2020-03-28Merge branch 'next.uaccess-2' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/cleanups Pull uaccess cleanups from Al Viro: Consolidate the user access areas and get rid of uaccess_try(), user_ex() and other warts.
2020-03-28m68knommu: Remove mm.h include from uaccess_no.hThomas Gleixner
In file included from include/linux/huge_mm.h:8, from include/linux/mm.h:567, from arch/m68k/include/asm/uaccess_no.h:8, from arch/m68k/include/asm/uaccess.h:3, from include/linux/uaccess.h:11, from include/linux/sched/task.h:11, from include/linux/sched/signal.h:9, from include/linux/rcuwait.h:6, from include/linux/percpu-rwsem.h:7, from kernel/locking/percpu-rwsem.c:6: include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore' 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; Removing the include of linux/mm.h from the uaccess header solves the problem and various build tests of nommu configurations still work. Fixes: 80fbaf1c3f29 ("rcuwait: Add @state argument to rcuwait_wait_event()") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lkml.kernel.org/r/87fte1qzh0.fsf@nanos.tec.linutronix.de
2020-03-28cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()Thomas Gleixner
A recent change to freeze_secondary_cpus() which added an early abort if a wakeup is pending missed the fact that the function is also invoked for shutdown, reboot and kexec via disable_nonboot_cpus(). In case of disable_nonboot_cpus() the wakeup event needs to be ignored as the purpose is to terminate the currently running kernel. Add a 'suspend' argument which is only set when the freeze is in context of a suspend operation. If not set then an eventually pending wakeup event is ignored. Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending") Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pavankumar Kondeti <pkondeti@codeaurora.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
2020-03-28Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"Thomas Gleixner
This reverts commit 4f41fe386a94639cd9a1831298d4f85db5662f1e. The change breaks systems on which the DT node of a device is used by multiple drivers. The proposed workaround to clear OF_POPULATED is just a band aid and this needs to be cleaned up at the root of the problem. Revert this for now. Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> Reported-by: Jon Hunter <jonathanh@nvidia.com> Requested-by: Rob Herring <robh+dt@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Saravana Kannan <saravanak@google.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200324175955.GA16972@arm.com
2020-03-28pcmcia: soc_common.h: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2020-03-28pcmcia: cs_internal.h: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2020-03-28MAINTAINERS: erofs: update my email addressGao Xiang
This email address will not be available in a few days. Update my own email address to xiang@kernel.org, which should be available all the time. Link: https://lore.kernel.org/r/20200328040036.117974-1-gaoxiang25@huawei.com Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2020-03-28i2c: pca-platform: Use platform_irq_get_optionalChris Packham
The interrupt is not required so use platform_irq_get_optional() to avoid error messages like i2c-pca-platform 22080000.i2c: IRQ index 0 not found Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-28i2c: st: fix missing struct parameter descriptionAlain Volmat
Fix a missing struct parameter description to allow warning free W=1 compilation. Signed-off-by: Alain Volmat <avolmat@me.com> Reviewed-by: Patrice Chotard <patrice.chotard@st.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-27x86: get rid of user_atomic_cmpxchg_inatomic()Al Viro
Only one user left; the thing had been made polymorphic back in 2013 for the sake of MPX. No point keeping it now that MPX is gone. Convert futex_atomic_cmpxchg_inatomic() to user_access_{begin,end}() while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27generic arch_futex_atomic_op_inuser() doesn't need access_ok()Al Viro
uses get_user() and put_user() for memory accesses Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27x86: don't reload after cmpxchg in unsafe_atomic_op2() loopAl Viro
lock cmpxchg leaves the current value in eax; no need to reload it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27x86: convert arch_futex_atomic_op_inuser() to ↵Al Viro
user_access_begin/user_access_end() Lift stac/clac pairs from __futex_atomic_op{1,2} into arch_futex_atomic_op_inuser(), fold them with access_ok() in there. The switch in arch_futex_atomic_op_inuser() is what has required the previous (objtool) commit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27objtool: whitelist __sanitizer_cov_trace_switch()Al Viro
it's not really different from e.g. __sanitizer_cov_trace_cmp4(); as it is, the switches that generate an array of labels get rejected by objtool, while slightly different set of cases that gets compiled into a series of comparisons is accepted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27[parisc, s390, sparc64] no need for access_ok() in futex handlingAl Viro
access_ok() is always true on those Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27sh: no need of access_ok() in arch_futex_atomic_op_inuser()Al Viro
everything it uses is doing access_ok() already Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27futex: arch_futex_atomic_op_inuser() calling conventions changeAl Viro
Move access_ok() in and pagefault_enable()/pagefault_disable() out. Mechanical conversion only - some instances don't really need a separate access_ok() at all (e.g. the ones only using get_user()/put_user(), or architectures where access_ok() is always true); we'll deal with that in followups. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27Merge branch 'cgroup-helpers'Alexei Starovoitov
Daniel Borkmann says: ==================== This adds various straight-forward helper improvements and additions to BPF cgroup based connect(), sendmsg(), recvmsg() and bind-related hooks which would allow to implement more fine-grained policies and improve current load balancer limitations we're seeing. For details please see individual patches. I've tested them on Kubernetes & Cilium and also added selftests for the small verifier extension. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-27bpf: Add selftest cases for ctx_or_null argument typeDaniel Borkmann
Add various tests to make sure the verifier keeps catching them: # ./test_verifier [...] #230/p pass ctx or null check, 1: ctx OK #231/p pass ctx or null check, 2: null OK #232/p pass ctx or null check, 3: 1 OK #233/p pass ctx or null check, 4: ctx - const OK #234/p pass ctx or null check, 5: null (connect) OK #235/p pass ctx or null check, 6: null (bind) OK #236/p pass ctx or null check, 7: ctx (bind) OK #237/p pass ctx or null check, 8: null (bind) OK [...] Summary: 1595 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/c74758d07b1b678036465ef7f068a49e9efd3548.1585323121.git.daniel@iogearbox.net
2020-03-27bpf: Enable retrival of pid/tgid/comm from bpf cgroup hooksDaniel Borkmann
We already have the bpf_get_current_uid_gid() helper enabled, and given we now have perf event RB output available for connect(), sendmsg(), recvmsg() and bind-related hooks, add a trivial change to enable bpf_get_current_pid_tgid() and bpf_get_current_comm() as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/18744744ed93c06343be8b41edcfd858706f39d7.1585323121.git.daniel@iogearbox.net
2020-03-27bpf: Enable bpf cgroup hooks to retrieve cgroup v2 and ancestor idDaniel Borkmann
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(), recvmsg() and bind-related hooks in order to retrieve the cgroup v2 context which can then be used as part of the key for BPF map lookups, for example. Given these hooks operate in process context 'current' is always valid and pointing to the app that is performing mentioned syscalls if it's subject to a v2 cgroup. Also with same motivation of commit 7723628101aa ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper") enable retrieval of ancestor from current so the cgroup id can be used for policy lookups which can then forbid connect() / bind(), for example. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
2020-03-27bpf: Allow to retrieve cgroup v1 classid from v2 hooksDaniel Borkmann
Today, Kubernetes is still operating on cgroups v1, however, it is possible to retrieve the task's classid based on 'current' out of connect(), sendmsg(), recvmsg() and bind-related hooks for orchestrators which attach to the root cgroup v2 hook in a mixed env like in case of Cilium, for example, in order to then correlate certain pod traffic and use it as part of the key for BPF map lookups. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/555e1c69db7376c0947007b4951c260e1074efc3.1585323121.git.daniel@iogearbox.net
2020-03-27bpf: Add netns cookie and enable it for bpf cgroup hooksDaniel Borkmann
In Cilium we're mainly using BPF cgroup hooks today in order to implement kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*), ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic between Cilium managed nodes. While this works in its current shape and avoids packet-level NAT for inter Cilium managed node traffic, there is one major limitation we're facing today, that is, lack of netns awareness. In Kubernetes, the concept of Pods (which hold one or multiple containers) has been built around network namespaces, so while we can use the global scope of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing NodePort ports on loopback addresses), we also have the need to differentiate between initial network namespaces and non-initial one. For example, ExternalIP services mandate that non-local service IPs are not to be translated from the host (initial) network namespace as one example. Right now, we have an ugly work-around in place where non-local service IPs for ExternalIP services are not xlated from connect() and friends BPF hooks but instead via less efficient packet-level NAT on the veth tc ingress hook for Pod traffic. On top of determining whether we're in initial or non-initial network namespace we also have a need for a socket-cookie like mechanism for network namespaces scope. Socket cookies have the nice property that they can be combined as part of the key structure e.g. for BPF LRU maps without having to worry that the cookie could be recycled. We are planning to use this for our sessionAffinity implementation for services. Therefore, add a new bpf_get_netns_cookie() helper which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would provide the cookie for the initial network namespace while passing the context instead of NULL would provide the cookie from the application's network namespace. We're using a hole, so no size increase; the assignment happens only once. Therefore this allows for a comparison on initial namespace as well as regular cookie usage as we have today with socket cookies. We could later on enable this helper for other program types as well as we would see need. (*) Both externalTrafficPolicy={Local|Cluster} types [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net