summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-02-05drm/atomic: Add drm_atomic_state->duplicatedLyude Paul
Since commit 39b50c603878 ("drm/atomic_helper: Stop modesets on unregistered connectors harder") We've been failing atomic checks if they try to enable new displays on unregistered connectors. This is fine except for the one situation that breaks atomic assumptions: suspend/resume. If a connector is unregistered before we attempt to restore the atomic state, something we end up failing the atomic check that happens when trying to restore the state during resume. Normally this would be OK: we try our best to make sure that the atomic state pre-suspend can be restored post-suspend, but failures at that point usually don't cause problems. That is of course, until we introduced the new atomic MST VCPI helpers: [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] [CRTC:65:pipe B] active changed [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Updating routing for [CONNECTOR:123:DP-5] [drm:drm_atomic_helper_check_modeset [drm_kms_helper]] Disabling [CONNECTOR:123:DP-5] [drm:drm_atomic_get_private_obj_state [drm]] Added new private object 0000000025844636 state 000000009fd2899a to 000000003a13d7b8 WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper] Modules linked in: fuse vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev iTCO_wdt i915(O) wmi_bmof intel_rapl btusb btrtl x86_pkg_temp_thermal btbcm btintel coretemp i2c_algo_bit drm_kms_helper(O) crc32_pclmul snd_hda_intel syscopyarea sysfillrect snd_hda_codec sysimgblt snd_hda_core bluetooth fb_sys_fops snd_pcm pcspkr drm(O) psmouse snd_timer mei_me ecdh_generic i2c_i801 mei i2c_core ucsi_acpi typec_ucsi typec wmi thinkpad_acpi ledtrig_audio snd soundcore tpm_tis rfkill tpm_tis_core video tpm acpi_pad pcc_cpufreq uas usb_storage crc32c_intel nvme serio_raw xhci_pci nvme_core xhci_hcd CPU: 6 PID: 1070 Comm: gnome-shell Tainted: G W O 5.0.0-rc2Lyude-Test+ #1 Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018 RIP: 0010:drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper] Code: 00 4c 39 6d f0 74 49 48 8d 7b 10 48 89 f9 48 c1 e9 03 42 80 3c 21 00 0f 85 d2 00 00 00 48 8b 6b 10 48 8d 5d f0 49 39 ee 75 c5 <0f> 0b 48 c7 c7 c0 78 b3 a0 48 89 c2 4c 89 ee e8 03 6c aa ff b8 ea RSP: 0018:ffff88841235f268 EFLAGS: 00010246 RAX: ffff88841bf12ab0 RBX: ffff88841bf12aa8 RCX: 1ffff110837e2557 RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffed108246bde0 RBP: ffff88841bf12ab8 R08: ffffed1083db3c93 R09: ffffed1083db3c92 R10: ffffed1083db3c92 R11: ffff88841ed9e497 R12: ffff888419555d80 R13: ffff8883bc499100 R14: ffff88841bf12ab8 R15: 0000000000000000 FS: 00007f16fbd4cd00(0000) GS:ffff88841ed80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1687c9f000 CR3: 00000003ba3cc003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: drm_atomic_helper_check_modeset+0xf21/0x2f50 [drm_kms_helper] ? drm_atomic_helper_commit_modeset_enables+0xa90/0xa90 [drm_kms_helper] ? __printk_safe_exit+0x10/0x10 ? save_stack+0x8c/0xb0 ? vprintk_func+0x96/0x1bf ? __printk_safe_exit+0x10/0x10 intel_atomic_check+0x234/0x4750 [i915] ? printk+0x9f/0xc5 ? kmsg_dump_rewind_nolock+0xd9/0xd9 ? _raw_spin_lock_irqsave+0xa4/0x140 ? drm_atomic_check_only+0xb1/0x28b0 [drm] ? drm_dbg+0x186/0x1b0 [drm] ? drm_dev_dbg+0x200/0x200 [drm] ? intel_link_compute_m_n+0xb0/0xb0 [i915] ? drm_mode_put_tile_group+0x20/0x20 [drm] ? skl_plane_format_mod_supported+0x17f/0x1b0 [i915] ? drm_plane_check_pixel_format+0x14a/0x310 [drm] drm_atomic_check_only+0x13c4/0x28b0 [drm] ? drm_state_info+0x220/0x220 [drm] ? drm_atomic_helper_disable_plane+0x1d0/0x1d0 [drm_kms_helper] ? pick_single_encoder_for_connector+0xe0/0xe0 [drm_kms_helper] ? kasan_unpoison_shadow+0x35/0x40 drm_atomic_commit+0x3b/0x100 [drm] drm_atomic_helper_set_config+0xd5/0x100 [drm_kms_helper] drm_mode_setcrtc+0x636/0x1660 [drm] ? vprintk_func+0x96/0x1bf ? drm_dev_dbg+0x200/0x200 [drm] ? drm_mode_getcrtc+0x790/0x790 [drm] ? printk+0x9f/0xc5 ? mutex_unlock+0x1d/0x40 ? drm_mode_addfb2+0x2e9/0x3a0 [drm] ? rcu_sync_dtor+0x2e0/0x2e0 ? drm_dbg+0x186/0x1b0 [drm] ? set_page_dirty+0x271/0x4d0 drm_ioctl_kernel+0x203/0x290 [drm] ? drm_mode_getcrtc+0x790/0x790 [drm] ? drm_setversion+0x7f0/0x7f0 [drm] ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x34/0x70 drm_ioctl+0x445/0x950 [drm] ? drm_mode_getcrtc+0x790/0x790 [drm] ? drm_getunique+0x220/0x220 [drm] ? expand_files.part.10+0x920/0x920 do_vfs_ioctl+0x1a1/0x13d0 ? ioctl_preallocate+0x2b0/0x2b0 ? __fget_light+0x2d6/0x390 ? schedule+0xd7/0x2e0 ? fget_raw+0x10/0x10 ? apic_timer_interrupt+0xa/0x20 ? apic_timer_interrupt+0xa/0x20 ? rcu_cleanup_dead_rnp+0x2c0/0x2c0 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x136/0x440 ? syscall_return_slowpath+0x2d0/0x2d0 ? do_page_fault+0x89/0x330 ? __do_page_fault+0x9c0/0x9c0 ? prepare_exit_to_usermode+0x188/0x200 ? perf_trace_sys_enter+0x1090/0x1090 ? __x64_sys_sigaltstack+0x280/0x280 ? __put_user_4+0x1c/0x30 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f16ff89a09b Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff001232b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fff001232f0 RCX: 00007f16ff89a09b RDX: 00007fff001232f0 RSI: 00000000c06864a2 RDI: 000000000000000b RBP: 00007fff001232f0 R08: 0000000000000000 R09: 000055a79d484460 R10: 000055a79d44e770 R11: 0000000000000246 R12: 00000000c06864a2 R13: 000000000000000b R14: 0000000000000000 R15: 000055a79d44e770 WARNING: CPU: 6 PID: 1070 at drivers/gpu/drm/drm_dp_mst_topology.c:3153 drm_dp_atomic_release_vcpi_slots+0xb9/0x200 [drm_kms_helper] ---[ end trace d536c05c13c83be2 ]--- [drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for [MST PORT:00000000f9e2b143] found in mst state 000000009fd2899a This appears to be happening because we destroy the VCPI allocations when disabling all connected displays while suspending, and those VCPI allocations don't get restored on resume due to failing to restore the atomic state. So, fix this by introducing the suspending option to drm_atomic_helper_duplicate_state() and use that to indicate in the atomic state that it's being used for suspending or resuming the system, and thus needs to be fixed up by the driver. We can then use the new state->duplicated hook to tell update_connector_routing() in drm_atomic_check_modeset() to allow for modesets on unregistered connectors, which allows us to restore atomic states that contain MST topologies that were removed after the state was duplicated and thus: mostly fixing suspend and resume. This just leaves some issues that were introduced with nouveau, that will be addressed next. Changes since v3: * Remove ->duplicated hunks that I left in the VCPI helpers by accident. These don't need to be here, that was the supposed to be the purpose of the last revision Changes since v2: * Remove the changes in this patch to the VCPI helpers, they aren't needed anymore Changes since v1: * Rename suspend_or_resume to duplicated Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: eceae1472467 ("drm/dp_mst: Start tracking per-port VCPI allocations") Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190202002023.29665-4-lyude@redhat.com
2019-02-05ALSA: compress: Fix stop handling on compressed capture streamsCharles Keepax
It is normal user behaviour to start, stop, then start a stream again without closing it. Currently this works for compressed playback streams but not capture ones. The states on a compressed capture stream go directly from OPEN to PREPARED, unlike a playback stream which moves to SETUP and waits for a write of data before moving to PREPARED. Currently however, when a stop is sent the state is set to SETUP for both types of streams. This leaves a capture stream in the situation where a new start can't be sent as that requires the state to be PREPARED and a new set_params can't be sent as that requires the state to be OPEN. The only option being to close the stream, and then reopen. Correct this issues by allowing snd_compr_drain_notify to set the state depending on the stream direction, as we already do in set_params. Fixes: 49bb6402f1aa ("ALSA: compress_core: Add support for capture streams") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-02-05virtio: drop internal struct from UAPIMichael S. Tsirkin
There's no reason to expose struct vring_packed in UAPI - if we do we won't be able to change or drop it, and it's not part of any interface. Let's move it to virtio_ring.c Cc: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-05vfio-mdev: Switch to use new generic UUID APIAndy Shevchenko
There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-05drm: add definitions for DP Audio/Video compliance testsChandan Uddaraju
This change adds definitions needed for DP audio compliance testing. It also adds missing definition for DP video compliance. Changes in V2: -- Delete cover letter for this patch. -- Move the description from cover letter into patch commit message. -- Remove DPU from subject prefix Reviewed-by: Sean Paul <sean@poorly.run> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # for merging through -msm. Signed-off-by: Chandan Uddaraju <chandanu@codeaurora.org> Signed-off-by: Sean Paul <seanpaul@chromium.org>
2019-02-05mtd: rawnand: remove ->legacy.erase and single_erase()Masahiro Yamada
Now that the last user of this hook, denali.c, stopped using it, we can remove the erase hook from nand_legacy. I squashed single_erase() because only the difference between single_erase() and nand_erase_op() is the number of bit shifts. The status/ret conversion in nand_erase_nand() is unneeded since commit eb94555e9e97 ("mtd: nand: use usual return values for the ->erase() hook"). Cleaned it up now. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Boris Brezillon <bbrezillon@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-02-05mtd: rawnand: Simplify the lockingBoris Brezillon
nand_get_device() was complex for apparently no good reason. Let's replace this locking scheme with 2 mutexes: one attached to the controller and another one attached to the chip. Every time the core calls nand_get_device(), it will first lock the chip and if the chip is not suspended, will then lock the controller. nand_release_device() will release both lock in the reverse order. nand_get_device() can sleep, just like the previous implementation, which means you should never call that from an atomic context. We also get rid of - the chip->state field, since all it was used for was flagging the chip as suspended. We replace it by a field called chip->suspended and directly set it from nand_suspend/resume() - the controller->wq and controller->active fields which are no longer needed since the new controller->lock (now a mutex) guarantees that all operations are serialized at the controller level - panic_nand_get_device() which would anyway be a no-op. Talking about panic write, I keep thinking the rawnand implementation is unsafe because there's not negotiation with the controller to know when it's actually done with it's previous operation. I don't intend to fix that here, but that's probably something we should look at, or maybe we should consider dropping the ->_panic_write() implementation Last important change to mention: we now return -EBUSY when someone tries to access a device that as been suspended, and propagate this error to the upper layer. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-02-05irqdesc: Add domain handler for NMIsJulien Thierry
NMI handling code should be executed between calls to nmi_enter and nmi_exit. Add a separate domain handler to properly setup NMI context when handling an interrupt requested as NMI. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05genirq: Provide NMI handlersJulien Thierry
Provide flow handlers that are NMI safe for interrupts and percpu_devid interrupts. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05genirq: Provide NMI management for percpu_devid interruptsJulien Thierry
Add support for percpu_devid interrupts treated as NMIs. Percpu_devid NMIs need to be setup/torn down on each CPU they target. The same restrictions as for global NMIs still apply for percpu_devid NMIs. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05genirq: Provide basic NMI management for interrupt linesJulien Thierry
Add functionality to allocate interrupt lines that will deliver IRQs as Non-Maskable Interrupts. These allocations are only successful if the irqchip provides the necessary support and allows NMI delivery for the interrupt line. Interrupt lines allocated for NMI delivery must be enabled/disabled through enable_nmi/disable_nmi_nosync to keep their state consistent. To treat a PERCPU IRQ as NMI, the interrupt must not be shared nor threaded, the irqchip directly managing the IRQ must be the root irqchip and the irqchip cannot be behind a slow bus. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-05firmware: xilinx: Add zynqmp_pm_get_chipid() APINava kishore Manne
This patch adds a new API to provide access to the hardware related data like soc revision, IDCODE... etc. Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2019-02-05drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)Tvrtko Ursulin
We want to allow userspace to reconfigure the subslice configuration on a per context basis. This is required for the functional requirement of shutting down non-VME enabled sub-slices on Gen11 parts. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use or whilst idle, the adjustment is for "free"; otherwise if the context is active we queue a request to do so (using the kernel context), following all other activity by that context, which is also marked as barrier for all following submission against the same context. Since the overhead of device re-configuration during context switching can be significant, especially in multi-context workloads, we limit this new uAPI to only support the Gen11 VME use case. In this use case either the device is fully enabled, and exactly one slice and half of the subslices are enabled. Example usage: struct drm_i915_gem_context_param_sseu sseu = { }; struct drm_i915_gem_context_param arg = { .param = I915_CONTEXT_PARAM_SSEU, .ctx_id = gem_context_create(fd), .size = sizeof(sseu), .value = to_user_pointer(&sseu) }; /* Query device defaults. */ gem_context_get_param(fd, &arg); /* Set VME configuration on a 1x6x8 part. */ sseu.slice_mask = 0x1; sseu.subslice_mask = 0xe0; gem_context_set_param(fd, &arg); v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) v5: Validate sseu configuration against the device's capabilities (Lionel) v6: Change context powergating settings through MI_SDM on kernel context (Chris) v7: Synchronize the requests following a powergating setting change using a global dependency (Chris) Iterate timelines through dev_priv.gt.active_rings (Tvrtko) Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) v8: s/union intel_sseu/struct intel_sseu/ (Lionel) s/dev_priv/i915/ (Tvrtko) Change uapi class/instance fields to u16 (Tvrtko) Bump mask fields to 64bits (Lionel) Don't return EPERM when dynamic sseu is disabled (Tvrtko) v9: Import context image into kernel context's ppgtt only when reconfiguring powergated slice/subslices (Chris) Use aliasing ppgtt when needed (Michel) Tvrtko Ursulin: v10: * Update for upstream changes. * Request submit needs a RPM reference. * Reject on !FULL_PPGTT for simplicity. * Pull out get/set param to helpers for readability and less indent. * Use i915_request_await_dma_fence in add_global_barrier to skip waits on the same timeline and avoid GEM_BUG_ON. * No need to explicitly assign a NULL pointer to engine in legacy mode. * No need to move gen8_make_rpcs up. * Factored out global barrier as prep patch. * Allow to only CAP_SYS_ADMIN if !Gen11. v11: * Remove engine vfunc in favour of local helper. (Chris Wilson) * Stop retiring requests before updates since it is not needed (Chris Wilson) * Implement direct CPU update path for idle contexts. (Chris Wilson) * Left side dependency needs only be on the same context timeline. (Chris Wilson) * It is sufficient to order the timeline. (Chris Wilson) * Reject !RCS configuration attempts with -ENODEV for now. v12: * Rebase for make_rpcs. v13: * Centralize SSEU normalization to make_rpcs. * Type width checking (uAPI <-> implementation). * Gen11 restrictions uAPI checks. * Gen11 subslice count differences handling. Chris Wilson: * args->size handling fixes. * Update context image from GGTT. * Postpone context image update to pinning. * Use i915_gem_active_raw instead of last_request_on_engine. v14: * Add activity tracker on intel_context to fix the lifetime issues and simplify the code. (Chris Wilson) v15: * Fix context pin leak if no space in ring by simplifying the context pinning sequence. v16: * Rebase for context get/set param locking changes. * Just -ENODEV on !Gen11. (Joonas) v17: * Fix one Gen11 subslice enablement rule. * Handle error from i915_sw_fence_await_sw_fence_gfp. (Chris Wilson) v18: * Update commit message. (Joonas) * Restrict uAPI to VME use case. (Joonas) v19: * Rebase. v20: * Rebase for ce->active_tracker. v21: * Rebase for IS_GEN changes. v22: * Reserve uAPI for flags straight away. (Chris Wilson) v23: * Rebase for RUNTIME_INFO. v24: * Added some headline docs for the uapi usage. (Joonas/Chris) v25: * Renamed class/instance to engine_class/engine_instance to avoid clash with C++ keyword. (Tony Ye) v26: * Rebased for runtime pm api changes. v27: * Rebased for intel_context_init. * Wrap commit msg to 75. v28: (Chris Wilson) * Use i915_gem_ggtt. * Use i915_request_await_dma_fence to show a better example. v29: * i915_timeline_set_barrier can now fail. (Chris Wilson) v30: * Capture some acks. v31: * Drop the WARN_ON from use controllable paths. (Chris Wilson) * Use overflows_type for all checks. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107634 Issue: https://github.com/intel/media-driver/issues/267 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Zhipeng Gong <zhipeng.gong@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tony Ye <tony.ye@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Timo Aaltonen <timo.aaltonen@canonical.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Stéphane Marchesin <marcheu@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-4-tvrtko.ursulin@linux.intel.com
2019-02-05xfrm: destroy xfrm_state synchronously on net exit pathCong Wang
xfrm_state_put() moves struct xfrm_state to the GC list and schedules the GC work to clean it up. On net exit call path, xfrm_state_flush() is called to clean up and xfrm_flush_gc() is called to wait for the GC work to complete before exit. However, this doesn't work because one of the ->destructor(), ipcomp_destroy(), schedules the same GC work again inside the GC work. It is hard to wait for such a nested async callback. This is also why syzbot still reports the following warning: WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351 ... ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153 cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551 process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153 worker_thread+0x143/0x14a0 kernel/workqueue.c:2296 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 In fact, it is perfectly fine to bypass GC and destroy xfrm_state synchronously on net exit call path, because it is in process context and doesn't need a work struct to do any blocking work. This patch introduces xfrm_state_put_sync() which simply bypasses GC, and lets its callers to decide whether to use this synchronous version. On net exit path, xfrm_state_fini() and xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is blocking, it can use xfrm_state_put_sync() directly too. Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to reflect this change. Fixes: b48c05ab5d32 ("xfrm: Fix warning in xfrm6_tunnel_net_exit.") Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-02-04net: phy: fixed-phy: Drop GPIO from fixed_phy_add()Linus Walleij
All users of the fixed_phy_add() pass -1 as GPIO number to the fixed phy driver, and all users of fixed_phy_register() pass -1 as GPIO number as well, except for the device tree MDIO bus. Any new users should create a proper device and pass the GPIO as a descriptor associated with the device so delete the GPIO argument from the calls and drop the code looking requesting a GPIO in fixed_phy_add(). In fixed phy_register(), investigate the "fixed-link" node and pick the GPIO descriptor from "link-gpios" if this property exists. Move the corresponding code out of of_mdio.c as the fixed phy code anyways requires OF to be in use. Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04rds: add type of service(tos) infrastructureSantosh Shilimkar
RDS Service type (TOS) is user-defined and needs to be configured via RDS IOCTL interface. It must be set before initiating any traffic and once set the TOS can not be changed. All out-going traffic from the socket will be associated with its TOS. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes] Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04netfilter: ipv6: avoid indirect calls for IPV6=y caseFlorian Westphal
indirect calls are only needed if ipv6 is a module. Add helpers to abstract the v6ops indirections and use them instead. fragment, reroute and route_input are kept as indirect calls. The first two are not not used in hot path and route_input is only used by bridge netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nat: remove module dependency on ipv6 coreFlorian Westphal
nf_nat_ipv6 calls two ipv6 core functions, so add those to v6ops to avoid the module dependency. This is a prerequisite for merging ipv4 and ipv6 nat implementations. Add wrappers to avoid the indirection if ipv6 is builtin. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nf_tables: unbind set in rule from commit pathPablo Neira Ayuso
Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nft_tunnel: Add NFTA_TUNNEL_MODE optionswenxu
nft "tunnel" expr match both the tun_info of RX and TX. This patch provide the NFTA_TUNNEL_MODE to individually match the tun_info of RX or TX. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04ASoC: topology: unload physical dai link in removeBard liao
soc_tplg_link_config() will find the physical dai link and call soc_tplg_dai_link_load() to load the BE dai link. Currently remove_link() is only used to remove the FE dai link which is created by the topology. The BE dai link cannot however be unloaded in snd_soc_tplg_component _remove(), which is problematic if anything needs to be released or reinitialized. This patch aligns the definitions of dynamic types with the existing UAPI and adds a new remove_backend_link() routine to unload the the BE dai link when snd_soc_tplg_component_remove() is invoked. Signed-off-by: Bard liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-04iwlwifi: mvm: add HE TB PPDU SIG-A BW to radiotapJohannes Berg
Expose the trigger-based PPDU SIG-A bandwidth to radiotap in the newly defined bits thereof. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-02-04drm: Trivial comment grammar cleanupsMatt Roper
Most of these are just cases where code comments used contractions (it's, who's) where they actually mean to use a possessive pronoun (its, whose) or vice-versa. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190202012326.20096-1-matthew.d.roper@intel.com
2019-02-04sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()Andrea Parri
move_queued_task() synchronizes with task_rq_lock() as follows: move_queued_task() task_rq_lock() [S] ->on_rq = MIGRATING [L] rq = task_rq() WMB (__set_task_cpu()) ACQUIRE (rq->lock); [S] ->cpu = new_cpu [L] ->on_rq where "[L] rq = task_rq()" is ordered before "ACQUIRE (rq->lock)" by an address dependency and, in turn, "ACQUIRE (rq->lock)" is ordered before "[L] ->on_rq" by the ACQUIRE itself. Use READ_ONCE() to load ->cpu in task_rq() (c.f., task_cpu()) to honor this address dependency. Also, mark the accesses to ->cpu and ->on_rq with READ_ONCE()/WRITE_ONCE() to comply with the LKMM. Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190121155240.27173-1-andrea.parri@amarulasolutions.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04sched/fair: Update scale invariance of PELTVincent Guittot
The current implementation of load tracking invariance scales the contribution with current frequency and uarch performance (only for utilization) of the CPU. One main result of this formula is that the figures are capped by current capacity of CPU. Another one is that the load_avg is not invariant because not scaled with uarch. The util_avg of a periodic task that runs r time slots every p time slots varies in the range : U * (1-y^r)/(1-y^p) * y^i < Utilization < U * (1-y^r)/(1-y^p) with U is the max util_avg value = SCHED_CAPACITY_SCALE At a lower capacity, the range becomes: U * C * (1-y^r')/(1-y^p) * y^i' < Utilization < U * C * (1-y^r')/(1-y^p) with C reflecting the compute capacity ratio between current capacity and max capacity. so C tries to compensate changes in (1-y^r') but it can't be accurate. Instead of scaling the contribution value of PELT algo, we should scale the running time. The PELT signal aims to track the amount of computation of tasks and/or rq so it seems more correct to scale the running time to reflect the effective amount of computation done since the last update. In order to be fully invariant, we need to apply the same amount of running time and idle time whatever the current capacity. Because running at lower capacity implies that the task will run longer, we have to ensure that the same amount of idle time will be applied when system becomes idle and no idle time has been "stolen". But reaching the maximum utilization value (SCHED_CAPACITY_SCALE) means that the task is seen as an always-running task whatever the capacity of the CPU (even at max compute capacity). In this case, we can discard this "stolen" idle times which becomes meaningless. In order to achieve this time scaling, a new clock_pelt is created per rq. The increase of this clock scales with current capacity when something is running on rq and synchronizes with clock_task when rq is idle. With this mechanism, we ensure the same running and idle time whatever the current capacity. This also enables to simplify the pelt algorithm by removing all references of uarch and frequency and applying the same contribution to utilization and loads. Furthermore, the scaling is done only once per update of clock (update_rq_clock_task()) instead of during each update of sched_entities and cfs/rt/dl_rq of the rq like the current implementation. This is interesting when cgroup are involved as shown in the results below: On a hikey (octo Arm64 platform). Performance cpufreq governor and only shallowest c-state to remove variance generated by those power features so we only track the impact of pelt algo. each test runs 16 times: ./perf bench sched pipe (higher is better) kernel tip/sched/core + patch ops/seconds ops/seconds diff cgroup root 59652(+/- 0.18%) 59876(+/- 0.24%) +0.38% level1 55608(+/- 0.27%) 55923(+/- 0.24%) +0.57% level2 52115(+/- 0.29%) 52564(+/- 0.22%) +0.86% hackbench -l 1000 (lower is better) kernel tip/sched/core + patch duration(sec) duration(sec) diff cgroup root 4.453(+/- 2.37%) 4.383(+/- 2.88%) -1.57% level1 4.859(+/- 8.50%) 4.830(+/- 7.07%) -0.60% level2 5.063(+/- 9.83%) 4.928(+/- 9.66%) -2.66% Then, the responsiveness of PELT is improved when CPU is not running at max capacity with this new algorithm. I have put below some examples of duration to reach some typical load values according to the capacity of the CPU with current implementation and with this patch. These values has been computed based on the geometric series and the half period value: Util (%) max capacity half capacity(mainline) half capacity(w/ patch) 972 (95%) 138ms not reachable 276ms 486 (47.5%) 30ms 138ms 60ms 256 (25%) 13ms 32ms 26ms On my hikey (octo Arm64 platform) with schedutil governor, the time to reach max OPP when starting from a null utilization, decreases from 223ms with current scale invariance down to 121ms with the new algorithm. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: patrick.bellasi@arm.com Cc: pjt@google.com Cc: pkondeti@codeaurora.org Cc: quentin.perret@arm.com Cc: rjw@rjwysocki.net Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Link: https://lkml.kernel.org/r/1548257214-13745-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04sched/wake_q: Reduce reference counting for special usersDavidlohr Bueso
Some users, specifically futexes and rwsems, required fixes that allowed the callers to be safe when wakeups occur before they are expected by wake_up_q(). Such scenarios also play games and rely on reference counting, and until now were pivoting on wake_q doing it. With the wake_q_add() call being moved down, this can no longer be the case. As such we end up with a a double task refcounting overhead; and these callers care enough about this (being rather core-ish). This patch introduces a wake_q_add_safe() call that serves for callers that have already done refcounting and therefore the task is 'safe' from wake_q point of view (int that it requires reference throughout the entire queue/>wakeup cycle). In the one case it has internal reference counting, in the other case it consumes the reference counting. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Xie Yongji <xieyongji@baidu.com> Cc: Yongji Xie <elohimes@gmail.com> Cc: andrea.parri@amarulasolutions.com Cc: lilin24@baidu.com Cc: liuqi16@baidu.com Cc: nixun@baidu.com Cc: yuanlinsi01@baidu.com Cc: zhangyu31@baidu.com Link: https://lkml.kernel.org/r/20181218195352.7orq3upiwfdbrdne@linux-r8p5 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04sched/core: Convert task_struct.stack_refcount to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable task_struct.stack_refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the task_struct.stack_refcount it might make a difference in following places: - try_get_task_stack(): increment in refcount_inc_not_zero() only guarantees control dependency on success vs. fully ordered atomic counterpart - put_task_stack(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-6-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04sched/core: Convert task_struct.usage to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable task_struct.usage is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the task_struct.usage it might make a difference in following places: - put_task_struct(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-5-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04sched/core: Convert signal_struct.sigcnt to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable signal_struct.sigcnt is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the signal_struct.sigcnt it might make a difference in following places: - put_signal_struct(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-3-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04sched/core: Convert sighand_struct.count to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable sighand_struct.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the sighand_struct.count it might make a difference in following places: - __cleanup_sighand: decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-2-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04perf: Convert perf_event_context.refcount to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable perf_event_context.refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. Please check Documentation/core-api/refcount-vs-atomic.rst for more information. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the perf_event_context.refcount it might make a difference in following places: - get_ctx(), perf_event_ctx_lock_nested(), perf_lock_task_context() and __perf_event_ctx_lock_double(): increment in refcount_inc_not_zero() only guarantees control dependency on success vs. fully ordered atomic counterpart - put_ctx(): decrement in refcount_dec_and_test() provides RELEASE ordering and ACQUIRE ordering + control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: namhyung@kernel.org Link: https://lkml.kernel.org/r/1548678448-24458-2-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04efi: Use 32-bit alignment for efi_guid_tArd Biesheuvel
The UEFI spec and EDK2 reference implementation both define EFI_GUID as struct { u32 a; u16; b; u16 c; u8 d[8]; }; and so the implied alignment is 32 bits not 8 bits like our guid_t. In some cases (i.e., on 32-bit ARM), this means that firmware services invoked by the kernel may assume that efi_guid_t* arguments are 32-bit aligned, and use memory accessors that do not tolerate misalignment. So let's set the minimum alignment to 32 bits. Note that the UEFI spec as well as some comments in the EDK2 code base suggest that EFI_GUID should be 64-bit aligned, but this appears to be a mistake, given that no code seems to exist that actually enforces that or relies on it. Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Alexander Graf <agraf@suse.de> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190202094119.13230-5-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04Merge tag 'iio-for-5.1a' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: First set of new device support, features and cleanup for IIO in the 5.1 cycle A number of interesting new devices supported plus a good set of staging cleanup including one graduation and one drop. New device support * ad56886 - Add support for AD5674R/AD5679R with some minor driver changes to support more channels. * ad7768 - New driver and dt bindings for this 24 bit ADC. * max44009 - New driver and dt bindings for this ambient light sensor. * mpu6050 - Support the ICM 20602 IMU. Minor tweaks due to slightly different register map. * NPCM adc - New driver and dt bindings for this BMC ADC. * Sensiron SGP30 - Modifiers for ethanol and H2. - New driver and dt bindings. - Follow patch added self cleaning support. * Sensiron SPS30 - New channel type for mass concentration. - New driver and bindings. - Minor tidy up patch followed (drop fmt specifier as unused) * st_pressure - lps22hh support. ID plus information structures and dt bindings. * ti-ads124s08 - Add binding doc and driver. Staging graduations * ad7606 driver and bindings. Staging drops * ad7152 CDC driver dropped. This part is near EoL and no one is known to be using it. If anyone surfaces obviously we can bring the driver back. If not, good to drop it to avoid wasting anyone's time cleaning it up. New features * bme680 - DT support and bindings doc. * isl29018 - Add regulator for VCC. * mag3110 - Add regulators for supplies. * meson-saradc - Support the temperature sensors of more SoCs. * mma8452 - Add regulators for power suplies and binding docs to reflect them. * st-accel - Support the undocumented but it seems fairly common _ONT ACPI method to specify orientation of the sensor. Cleanup, minor fixes and fixes for staging driver that have been broken a long time * ad5933 - Drop platform data alternative to specifying the reference voltage using a regulator. - Use the clock framework to contorl the reference clock. - Add a DT binding doc to cover the defacto binding. * ad7280a - Split up some big functions to improve readability. * ad7606 - Allow for timeout if interrupt never occurs. - Use devm functions to simplify probe and remove. - Use the find_closest macro to avoid need for precise values from userspace. - Add missing vendor prefixes for various DT properties. Note the driver is in staging still and there are no known devicetrees. - Add explict OF device ID table. - Simplify the Kconfig choices - Change to a threaded IRQ. - SPDX and simple stype fixes. * ad7816 - Drop unnecessary variable init. * ad9523 - Check a return value that was ignored. * ad9833 - Drop platform data. It was just setting most values to the hardware defaults. - Use the clock framework to provide the input clock. * adt7316 (lots of staging cleanup) - Fix some wrong register / bit definitions - Invert the logic of the check for an ldac pin so it actually makes sense. - Read the right register to get internal vref settings - Allow adt751x chips to use the internal vref for all DAC channels rather than a subset. - Remove dac vref bypass control from parts that don't have one. - Make the store DAC update mode function consistent with the show one. - Fix some spellings and other minor tidy up. - Avoid passing irq numbers around by putting all the irq logic in one place. - Fix an issue with the resolution of DAC control. - Fix support of the high resolution DAC mode (for temp proportional output) where supported. - Fix DAC read and write calculations. * st_lsm6dsx - Drop an unused variable (set but not read) * xilinx-xadc - Check an unhandled return value. * tag 'iio-for-5.1a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (67 commits) iio: chemical: sps30: remove printk format specifier staging: iio: frequency: ad9833: Load clock using clock framework staging: iio: frequency: ad9833: Get frequency value statically dt-bindings: iio: light: Add max44009 iio: light: add driver for MAX44009 dt-bindings: iio: adc: Add docs for AD7768-1 iio: adc: Add AD7768-1 ADC basic support staging: iio: cdc: ad7152: remove driver completely iio: imu: mpu6050: Add support for the ICM 20602 IMU dt-bindings: iio: imu: add icm20602 bindings to mpu6050 dt-bindings: iio: pressure: add LPS22HH bindings iio: st_accel: use ACPI orientation data iio: adc: add NPCM ADC driver dt-binding: iio: add NPCM ADC documentation iio: chemical: sps30: allow changing self cleaning period dt-bindings: iio: chemical: Add bindings for bme680 iio: chemical: bme680: Add device-tree support iio:st_pressure:initial lps22hh sensor support iio: accell: mma8452: add vdd/vddio regulator operation support dt-bindings: iio: accel: mma8452: add power supplies property ...
2019-02-04Merge tag 'drm-intel-next-2019-02-02' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next - Make background color and LUT more robust (Matt) - Icelake display fixes (Ville, Imre) - Workarounds fixes and reorg (Tvrtko, Talha) - Enable fastboot by default on VLV and CHV (Hans) - Add another PCI ID for Coffee Lake (Rodrigo) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190202082911.GA6615@intel.com
2019-02-04Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-next ttm: - Replace ref/unref naming with get/put amdgpu: - Revert DC clang fix, causes a segfault with some compiler versions - SR-IOV fix - PCIE fix for vega20 - Misc DC fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190201062345.7304-1-alexander.deucher@amd.com
2019-02-04Merge tag 'drm-misc-next-2019-02-01' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.1: UAPI Changes: Cross-subsystem Changes: Core Changes: - Split out some part of drm_crtc_helper.h into drm_probe_helper.h - DRIVER_* flags improvements - New tasks on the TODO-list - Improvements to the documentation Driver Changes: - Continual of drmP.h removal in multiple drivers - Removal of FBINFO_(FLAG_)DEFAULT in multiple drivers - sun4i: Addition of the A23 support, multiple fixes for the tiled formats - atmel-hlcdc: Fix of clipping and rotation properties - qxl: various BO-related improvements, prime and generic fbdev emulation support - dw-hdmi: Support for HDMI2.0 2160p modes and YUV420 output - New Sitronix ST7701 panel driver - New Kingdisplay KD097D04 panel driver - New LeMaker BL035-RGB-002 panel driver - New PDA 91-00156-A0 panel driver Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190201144749.t3abxvguhstu6bcl@flea
2019-02-03netdevice.h: Add __cold to netdev_<level> logging functionsJoe Perches
Add __cold to the netdev_<level> logging functions similar to the use of __cold in the generic printk function. Using __cold moves all the netdev_<level> logging functions out-of-line possibly improving code locality and runtime performance. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALLRichard Guy Briggs
Remove audit_context from struct task_struct and struct audit_buffer when CONFIG_AUDIT is enabled but CONFIG_AUDITSYSCALL is not. Also, audit_log_name() (and supporting inode and fcaps functions) should have been put back in auditsc.c when soft and hard link logging was normalized since it is only used by syscall auditing. See github issue https://github.com/linux-audit/audit-kernel/issues/105 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-02-03net: Fix ip_mc_{dec,inc}_group allocation contextFlorian Fainelli
After 4effd28c1245 ("bridge: join all-snoopers multicast address"), I started seeing the following sleep in atomic warnings: [ 26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh [ 26.777855] INFO: lockdep is turned off. [ 26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20 [ 26.787943] Hardware name: BCM97278SV (DT) [ 26.792118] Call trace: [ 26.794645] dump_backtrace+0x0/0x170 [ 26.798391] show_stack+0x24/0x30 [ 26.801787] dump_stack+0xa4/0xe4 [ 26.805182] ___might_sleep+0x208/0x218 [ 26.809102] __might_sleep+0x78/0x88 [ 26.812762] kmem_cache_alloc_trace+0x64/0x28c [ 26.817301] igmp_group_dropped+0x150/0x230 [ 26.821573] ip_mc_dec_group+0x1b0/0x1f8 [ 26.825585] br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190 [ 26.831704] br_multicast_toggle+0x78/0xcc [ 26.835887] store_bridge_parm+0xc4/0xfc [ 26.839894] multicast_snooping_store+0x3c/0x4c [ 26.844517] dev_attr_store+0x44/0x5c [ 26.848262] sysfs_kf_write+0x50/0x68 [ 26.852006] kernfs_fop_write+0x14c/0x1b4 [ 26.856102] __vfs_write+0x60/0x190 [ 26.859668] vfs_write+0xc8/0x168 [ 26.863059] ksys_write+0x70/0xc8 [ 26.866449] __arm64_sys_write+0x24/0x30 [ 26.870458] el0_svc_common+0xa0/0x11c [ 26.874291] el0_svc_handler+0x38/0x70 [ 26.878120] el0_svc+0x8/0xc while toggling the bridge's multicast_snooping attribute dynamically. Pass a gfp_t down to igmpv3_add_delrec(), introduce __igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t argument. Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to allow caller to specify gfp_t. IPv6 part of the patch appears fine. Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03net: devlink: report cell size of shared buffersJakub Kicinski
Shared buffer allocation is usually done in cell increments. Drivers will either round up the allocation or refuse the configuration if it's not an exact multiple of cell size. Drivers know exactly the cell size of shared buffer, so help out users by providing this information in dumps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEWDeepa Dinamani
Add new socket timeout options that are y2038 safe. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: davem@davemloft.net Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixesDeepa Dinamani
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval as the time format. struct timeval is not y2038 safe. The subsequent patches in the series add support for new socket timeout options with _NEW suffix that will use y2038 safe data structures. Although the existing struct timeval layout is sufficiently wide to represent timeouts, because of the way libc will interpret time_t based on user defined flag, these new flags provide a way of having a structure that is the same for all architectures consistently. Rename the existing options with _OLD suffix forms so that the right option is enabled for userspace applications according to the architecture and time_t definition of libc. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMPING_NEWDeepa Dinamani
Add SO_TIMESTAMPING_NEW variant of socket timestamp options. This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all architectures. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: chris@zankel.net Cc: fenghua.yu@intel.com Cc: rth@twiddle.net Cc: tglx@linutronix.de Cc: ubraun@linux.ibm.com Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMP[NS]_NEWDeepa Dinamani
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures. Note that the format of scm_timestamping.ts[0] is not changed in this patch. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add struct __kernel_sock_timevalDeepa Dinamani
The new type is meant to be used as a y2038 safe structure to be used as part of cmsg data. Presently the SO_TIMESTAMP socket option uses struct timeval for timestamps. This is not y2038 safe. Subsequent patches in the series add new y2038 safe socket option to be used in the place of SO_TIMESTAMP_OLD. struct __kernel_sock_timeval will be used as the timestamp format at that time. struct __kernel_sock_timeval also maintains the same layout across 32 bit and 64 bit ABIs. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Use old_timeval types for socket timestampsDeepa Dinamani
As part of y2038 solution, all internal uses of struct timeval are replaced by struct __kernel_old_timeval and struct compat_timeval by struct old_timeval32. Make socket timestamps use these new types. This is mainly to be able to verify that the kernel build is y2038 safe when such non y2038 safe types are not supported anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: isdn@linux-pingi.de Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03arch: sparc: Override struct __kernel_old_timevalDeepa Dinamani
struct __kernel_old_timeval is supposed to have the same layout as struct timeval. But, it was inadvarently missed that __kernel_suseconds has a different definition for sparc64. Provide an asm-specific override that fixes it. Reported-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLDDeepa Dinamani
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the way they are currently defined, are not y2038 safe. Subsequent patches in the series add new y2038 safe versions of these options which provide 64 bit timestamps on all architectures uniformly. Hence, rename existing options with OLD tag suffixes. Also note that kernel will not use the untagged SO_TIMESTAMP* and SCM_TIMESTAMP* options internally anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: deller@gmx.de Cc: dhowells@redhat.com Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-afs@lists.infradead.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few updates for x86: - Fix an unintended sign extension issue in the fault handling code - Rename the new resource control config switch so it's less confusing - Avoid setting up EFI info in kexec when the EFI runtime is disabled. - Fix the microcode version check in the AMD microcode loader so it only loads higher version numbers and never downgrades - Set EFER.LME in the 32bit trampoline before returning to long mode to handle older AMD/KVM behaviour properly. - Add Darren and Andy as x86/platform reviewers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Avoid confusion over the new X86_RESCTRL config x86/kexec: Don't setup EFI info if EFI runtime is not enabled x86/microcode/amd: Don't falsely trick the late loading mechanism MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers x86/fault: Fix sign-extend unintended sign extension x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode x86/cpu: Add Atom Tremont (Jacobsville)