Age | Commit message (Collapse) | Author |
|
testusb' application which uses 'usbtest' driver reports 'unknown speed'
from the function 'find_testdev'. The variable 'entry->speed' was not
updated from the application. The IOCTL mentioned in the FIXME comment can
only report whether the connection is low speed or not. Speed is read using
the IOCTL USBDEVFS_GET_SPEED which reports the proper speed grade. The
call is implemented in the function 'handle_testdev' where the file
descriptor was availble locally. Sample output is given below where 'high
speed' is printed as the connected speed.
sudo ./testusb -a
high speed /dev/bus/usb/001/011 0
/dev/bus/usb/001/011 test 0, 0.000015 secs
/dev/bus/usb/001/011 test 1, 0.194208 secs
/dev/bus/usb/001/011 test 2, 0.077289 secs
/dev/bus/usb/001/011 test 3, 0.170604 secs
/dev/bus/usb/001/011 test 4, 0.108335 secs
/dev/bus/usb/001/011 test 5, 2.788076 secs
/dev/bus/usb/001/011 test 6, 2.594610 secs
/dev/bus/usb/001/011 test 7, 2.905459 secs
/dev/bus/usb/001/011 test 8, 2.795193 secs
/dev/bus/usb/001/011 test 9, 8.372651 secs
/dev/bus/usb/001/011 test 10, 6.919731 secs
/dev/bus/usb/001/011 test 11, 16.372687 secs
/dev/bus/usb/001/011 test 12, 16.375233 secs
/dev/bus/usb/001/011 test 13, 2.977457 secs
/dev/bus/usb/001/011 test 14 --> 22 (Invalid argument)
/dev/bus/usb/001/011 test 17, 0.148826 secs
/dev/bus/usb/001/011 test 18, 0.068718 secs
/dev/bus/usb/001/011 test 19, 0.125992 secs
/dev/bus/usb/001/011 test 20, 0.127477 secs
/dev/bus/usb/001/011 test 21 --> 22 (Invalid argument)
/dev/bus/usb/001/011 test 24, 4.133763 secs
/dev/bus/usb/001/011 test 27, 2.140066 secs
/dev/bus/usb/001/011 test 28, 2.120713 secs
/dev/bus/usb/001/011 test 29, 0.507762 secs
Signed-off-by: Faizel K B <faizel.kb@dicortech.com>
Link: https://lore.kernel.org/r/20210902114444.15106-1-faizel.kb@dicortech.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are two cases for machine check recovery:
1) The machine check was triggered by ring3 (application) code.
This is the simpler case. The machine check handler simply queues
work to be executed on return to user. That code unmaps the page
from all users and arranges to send a SIGBUS to the task that
triggered the poison.
2) The machine check was triggered in kernel code that is covered by
an exception table entry. In this case the machine check handler
still queues a work entry to unmap the page, etc. but this will
not be called right away because the #MC handler returns to the
fix up code address in the exception table entry.
Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.
Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry for ever.
There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.
1) Some code (e.g. futex) first tries a get_user() with page faults
disabled. If this fails, the code retries with page faults enabled
expecting that this will resolve the page fault.
2) Copy from user code retries a copy in byte-at-time mode to check
whether any additional bytes can be copied.
On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.
Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).
Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever enforce a limit of 10.
[ bp: Massage commit message, drop noinstr, fix typo, extend panic
messages. ]
Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
|
|
As previously noted in commit 66e4f4a9cc38 ("rtc: cmos: Use
spin_lock_irqsave() in cmos_interrupt()"):
<4>[ 254.192378] WARNING: inconsistent lock state
<4>[ 254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ #1 Not tainted
<4>[ 254.192396] --------------------------------
<4>[ 254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4>[ 254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4>[ 254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100
<4>[ 254.192481] {IN-HARDIRQ-W} state was registered at:
<4>[ 254.192488] lock_acquire+0xd1/0x3d0
<4>[ 254.192504] _raw_spin_lock+0x2a/0x40
<4>[ 254.192519] cmos_interrupt+0x18/0x100
<4>[ 254.192536] rtc_handler+0x1f/0xc0
<4>[ 254.192553] acpi_ev_fixed_event_detect+0x109/0x13c
<4>[ 254.192574] acpi_ev_sci_xrupt_handler+0xb/0x28
<4>[ 254.192596] acpi_irq+0x13/0x30
<4>[ 254.192620] __handle_irq_event_percpu+0x43/0x2c0
<4>[ 254.192641] handle_irq_event_percpu+0x2b/0x70
<4>[ 254.192661] handle_irq_event+0x2f/0x50
<4>[ 254.192680] handle_fasteoi_irq+0x9e/0x150
<4>[ 254.192693] __common_interrupt+0x76/0x140
<4>[ 254.192715] common_interrupt+0x96/0xc0
<4>[ 254.192732] asm_common_interrupt+0x1e/0x40
<4>[ 254.192750] _raw_spin_unlock_irqrestore+0x38/0x60
<4>[ 254.192767] resume_irqs+0xba/0xf0
<4>[ 254.192786] dpm_resume_noirq+0x245/0x3d0
<4>[ 254.192811] suspend_devices_and_enter+0x230/0xaa0
<4>[ 254.192835] pm_suspend.cold.8+0x301/0x34a
<4>[ 254.192859] state_store+0x7b/0xe0
<4>[ 254.192879] kernfs_fop_write_iter+0x11d/0x1c0
<4>[ 254.192899] new_sync_write+0x11d/0x1b0
<4>[ 254.192916] vfs_write+0x265/0x390
<4>[ 254.192933] ksys_write+0x5a/0xd0
<4>[ 254.192949] do_syscall_64+0x33/0x80
<4>[ 254.192965] entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[ 254.192986] irq event stamp: 43775
<4>[ 254.192994] hardirqs last enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
<4>[ 254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0
<4>[ 254.193049] softirqs last enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
<4>[ 254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0
<4>[ 254.193101]
other info that might help us debug this:
<4>[ 254.193107] Possible unsafe locking scenario:
<4>[ 254.193112] CPU0
<4>[ 254.193117] ----
<4>[ 254.193121] lock(rtc_lock);
<4>[ 254.193137] <Interrupt>
<4>[ 254.193142] lock(rtc_lock);
<4>[ 254.193156]
*** DEADLOCK ***
<4>[ 254.193161] 6 locks held by rtcwake/5309:
<4>[ 254.193174] #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0
<4>[ 254.193232] #1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0
<4>[ 254.193282] #2: ffff888100cef3c0 (kn->active#285
<7>[ 254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled
<4>[ 254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0
<4>[ 254.193333] #3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a
<4>[ 254.193387] #4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70
<4>[ 254.193433] #5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0
<4>[ 254.193485]
stack backtrace:
<4>[ 254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ #1
<4>[ 254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
<4>[ 254.193524] Call Trace:
<4>[ 254.193536] dump_stack+0x7f/0xad
<4>[ 254.193567] mark_lock.part.47+0x8ca/0xce0
<4>[ 254.193604] __lock_acquire+0x39b/0x2590
<4>[ 254.193626] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
<4>[ 254.193660] lock_acquire+0xd1/0x3d0
<4>[ 254.193677] ? cmos_interrupt+0x18/0x100
<4>[ 254.193716] _raw_spin_lock+0x2a/0x40
<4>[ 254.193735] ? cmos_interrupt+0x18/0x100
<4>[ 254.193758] cmos_interrupt+0x18/0x100
<4>[ 254.193785] cmos_resume+0x2ac/0x2d0
<4>[ 254.193813] ? acpi_pm_set_device_wakeup+0x1f/0x110
<4>[ 254.193842] ? pnp_bus_suspend+0x10/0x10
<4>[ 254.193864] pnp_bus_resume+0x5e/0x90
<4>[ 254.193885] dpm_run_callback+0x5f/0x240
<4>[ 254.193914] device_resume+0xb2/0x1e0
<4>[ 254.193942] ? pm_dev_err+0x25/0x25
<4>[ 254.193974] dpm_resume+0xea/0x3f0
<4>[ 254.194005] dpm_resume_end+0x8/0x10
<4>[ 254.194030] suspend_devices_and_enter+0x29b/0xaa0
<4>[ 254.194066] pm_suspend.cold.8+0x301/0x34a
<4>[ 254.194094] state_store+0x7b/0xe0
<4>[ 254.194124] kernfs_fop_write_iter+0x11d/0x1c0
<4>[ 254.194151] new_sync_write+0x11d/0x1b0
<4>[ 254.194183] vfs_write+0x265/0x390
<4>[ 254.194207] ksys_write+0x5a/0xd0
<4>[ 254.194232] do_syscall_64+0x33/0x80
<4>[ 254.194251] entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[ 254.194274] RIP: 0033:0x7f07d79691e7
<4>[ 254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[ 254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[ 254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7
<4>[ 254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b
<4>[ 254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004
<4>[ 254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004
which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough
to trigger the alarm during the suspend.
Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
References: 66e4f4a9cc38 ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"):
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Xiaofei Tan <tanxiaofei@huawei.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210305122140.28774-1-chris@chris-wilson.co.uk
|
|
Driver's tx_empty callback should signal when the transmit shift register
is empty. So when the last character has been sent.
STAT_TX_FIFO_EMP bit signals only that HW transmit FIFO is empty, which
happens when the last byte is loaded into transmit shift register.
STAT_TX_EMP bit signals when the both HW transmit FIFO and transmit shift
register are empty.
So replace STAT_TX_FIFO_EMP check by STAT_TX_EMP in mvebu_uart_tx_empty()
callback function.
Fixes: 30530791a7a0 ("serial: mvebu-uart: initial support for Armada-3700 serial port")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20210911132017.25505-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit b67e830d38fa ("serial: 8250: 8250_omap: Fix possible interrupt
storm on K3 SoCs") introduced fixup including a register read to
RX_LVL, however, we should be using word offset than byte offset
since our registers are on 4 byte boundary (port.regshift = 2) for
8250_omap.
Fixes: b67e830d38fa ("serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs")
Cc: stable <stable@vger.kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210903050550.29050-1-nm@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Seeing these errors when GT is likely in suspend state-
"RPM wakelock ref not held during HW access"
Ensure GT is awake before trying to access HW registers. Avoid
reading the register if that is not the case.
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: 41e5c17ebfc2 ("drm/i915/guc/slpc: Sysfs hooks for SLPC")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210907232704.12982-1-vinay.belgaumkar@intel.com
(cherry picked from commit f25e3908b9cd4a3fe819e9bdcdde58f20bacb34c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
gem context refcounting is another exercise in least locking design it
seems, where most things get destroyed upon context closure (which can
race with anything really). Only the actual memory allocation and the
locks survive while holding a reference.
This tripped up Jason when reimplementing the single timeline feature
in
commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
Author: Jason Ekstrand <jason@jlekstrand.net>
Date: Thu Jul 8 10:48:12 2021 -0500
drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
We could fix the bug by holding ctx->mutex in execbuf and clear the
pointer (again while holding the mutex) context_close, but it's
cleaner to just make the context object actually invariant over its
_entire_ lifetime. This way any other ioctl that's potentially racing,
but holding a full reference, can still rely on ctx->syncobj being
an immutable pointer. Which without this change, is not the case.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch
(cherry picked from commit c238980efd3b35af70fc926066cf7440f50a97a9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Using the I915_MMAP_TYPE_FIXED mmap type requires the TTM backend, so
for that mmap type, use __i915_gem_object_create_user() instead of
i915_gem_object_create_internal(), as we really want to tests objects
mmap-able by user-space.
This also means that the out-of-space error happens at object creation
and returns -ENXIO rather than -ENOSPC, so fix the code up to expect
that on out-of-offset-space errors.
Finally only use I915_MMAP_TYPE_FIXED for LMEM and SMEM for now if
testing on LMEM-capable devices. For stolen LMEM, we still take the
same path as for integrated, as that haven't been moved over to TTM yet,
and user-space should not be able to create out of stolen LMEM anyway.
v2:
- Check the presence of the obj->ops->mmap_offset callback rather than
hardcoding the supported mmap regions in can_mmap() (Maarten Lankhorst)
Fixes: 7961c5b60f23 ("drm/i915: Add TTM offset argument to mmap.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210831122931.157536-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 450cede7f3804ca7f8b3da210ebefa61c0958f22)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The function is only used from within GEM_BUG_ON(), which is causing
warnings with Wunneeded-internal-declaration in some builds. Since the
function is a simple wrapper around a CT function, we can just call the
CT function directly instead.
Fixes: 1fb12c587152 ("drm/i915/guc: skip disabling CTBs before sanitizing the GuC")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210823163137.19770-1-daniele.ceraolospurio@intel.com
(cherry picked from commit 5db1856781e45c9610f7652a19cc656b984235e7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Users reported that after commit 2bbd6dba84d4 ("drm/i915: Try to use
fast+narrow link on eDP again and fall back to the old max strategy on
failure"), the screen starts to have wobbly effect.
Commit a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for
everything") doesn't help either, that means the affected eDP 1.2 panels
only work with max params.
So use max params for panels < eDP 1.4 as Windows does to solve the
issue.
v3:
- Do the eDP rev check in intel_edp_init_dpcd()
v2:
- Check eDP 1.4 instead of DPCD 1.1 to apply max params
Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3714
Fixes: 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and fall back to the old max strategy on failure")
Fixes: a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for everything")
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210820075301.693099-1-kai.heng.feng@canonical.com
(cherry picked from commit d7f213c131adf0bec8b731553eb82990cdac265d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
After DPRX link training, intel_dp_link_train_phy() did not
return the training result properly. If link training failed,
i915 driver would not run into link train fallback function.
And no hotplug uevent would be received by user space application.
Fixes: b30edfd8d0b4 ("drm/i915: Switch to LTTPR non-transparent mode link training")
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Cooper Chiou <cooper.chiou@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Signed-off-by: Lee Shawn C <shawn.c.lee@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210706152541.25021-1-shawn.c.lee@intel.com
(cherry picked from commit dab1b47e57e053b2a02c22ead8e7449f79961335)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
If there is no legacy RTC device, don't try to use it for storing trace
data across suspend/resume.
Cc: <stable@vger.kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20210903084937.19392-2-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Today the Xen ballooning is done via delayed work in a workqueue. This
might result in workqueue hangups being reported in case of large
amounts of memory are being ballooned in one go (here 16GB):
BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 64s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
in-flight: 229:balloon_process
pending: cache_reap
workqueue events_freezable_power_: flags=0x84
pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: disk_events_workfn
workqueue mm_percpu_wq: flags=0x8
pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_update
pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=64s workers=3 idle: 2222 43
This can easily be avoided by using a dedicated kernel thread for doing
the ballooning work.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210827123206.15429-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
User space can hold a tty open indefinitely and tty drivers must not
release the underlying structures until the last user is gone.
Switch to using the tty-port reference counter to manage the life time
of the greybus tty state to avoid use after free after a disconnect.
Fixes: a18e15175708 ("greybus: more uart work")
Cc: stable@vger.kernel.org # 4.9
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20210906124538.22358-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This fixes warnings with -Wimplicit-function-declaration, e.g.
drivers/hwtracing/coresight/coresight-syscfg.c:455:15: error:
implicit declaration of function 'kzalloc' [-Werror,
-Wimplicit-function-declaration]
csdev_item = kzalloc(sizeof(struct cscfg_registered_csdev),
GFP_KERNEL);
Link: https://lore.kernel.org/r/20210830172820.2840433-1-jiancai@google.com
Fixes: 85e2414c518a ("coresight: syscfg: Initial coresight system configuration")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jian Cai <jiancai@google.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20210913164613.1675791-2-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When I added nvmem_cell_read_variable_le_u32() and
nvmem_cell_read_variable_le_u64() I forgot to add the "static inline"
stub functions for when CONFIG_NVMEM wasn't defined. Add them
now. This was causing problems with randconfig builds that compiled
`drivers/soc/qcom/cpr.c`.
Fixes: 6feba6a62c57 ("PM: AVS: qcom-cpr: Use nvmem_cell_read_variable_le_u32()")
Fixes: a28e824fb827 ("nvmem: core: Add functions to make number reading easy")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20210913160551.12907-1-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
During BC_FREE_BUFFER processing, the BINDER_TYPE_FDA object
cleanup may close 1 or more fds. The close operations are
completed using the task work mechanism -- which means the thread
needs to return to userspace or the file object may never be
dereferenced -- which can lead to hung processes.
Force the binder thread back to userspace if an fd is closed during
BC_FREE_BUFFER handling.
Fixes: 80cd795630d6 ("binder: fix use-after-free due to ksys_close() during fdget()")
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Martijn Coenen <maco@android.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210830195146.587206-1-tkjos@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently cgroup freezer is used to freeze the application threads, and
BINDER_FREEZE is used to freeze the corresponding binder interface.
There's already a mechanism in ioctl(BINDER_FREEZE) to wait for any
existing transactions to drain out before actually freezing the binder
interface.
But freezing an app requires 2 steps, freezing the binder interface with
ioctl(BINDER_FREEZE) and then freezing the application main threads with
cgroupfs. This is not an atomic operation. The following race issue
might happen.
1) Binder interface is frozen by ioctl(BINDER_FREEZE);
2) Main thread A initiates a new sync binder transaction to process B;
3) Main thread A is frozen by "echo 1 > cgroup.freeze";
4) The response from process B reaches the frozen thread, which will
unexpectedly fail.
This patch provides a mechanism to check if there's any new pending
transaction happening between ioctl(BINDER_FREEZE) and freezing the
main thread. If there's any, the main thread freezing operation can
be rolled back to finish the pending transaction.
Furthermore, the response might reach the binder driver before the
rollback actually happens. That will still cause failed transaction.
As the other process doesn't wait for another response of the response,
the response transaction failure can be fixed by treating the response
transaction like an oneway/async one, allowing it to reach the frozen
thread. And it will be consumed when the thread gets unfrozen later.
NOTE: This patch reuses the existing definition of struct
binder_frozen_status_info but expands the bit assignments of __u32
member sync_recv.
To ensure backward compatibility, bit 0 of sync_recv still indicates
there's an outstanding sync binder transaction. This patch adds new
information to bit 1 of sync_recv, indicating the binder transaction
happens exactly when there's a race.
If an existing userspace app runs on a new kernel, a sync binder call
will set bit 0 of sync_recv so ioctl(BINDER_GET_FROZEN_INFO) still
return the expected value (true). The app just doesn't check bit 1
intentionally so it doesn't have the ability to tell if there's a race.
This behavior is aligned with what happens on an old kernel which
doesn't set bit 1 at all.
A new userspace app can 1) check bit 0 to know if there's a sync binder
transaction happened when being frozen - same as before; and 2) check
bit 1 to know if that sync binder transaction happened exactly when
there's a race - a new information for rollback decision.
the same time, confirmed the pending transactions succeeded.
Fixes: 432ff1e91694 ("binder: BINDER_FREEZE ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Li Li <dualli@google.com>
Test: stress test with apps being frozen and initiating binder calls at
Link: https://lore.kernel.org/r/20210910164210.2282716-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
device_initialize() is used to take a refcount on the device. However,
put_device() is not called during device teardown. This leads to a
leak of private data of the driver core, dev_name(), etc. This is
reported by kmemleak at boot time if we compile kernel with
DEBUG_TEST_DRIVER_REMOVE.
Fix memory leaks during unregistration and implement a release
function.
Link: https://lore.kernel.org/r/20210911105306.1511-1-yuzenghui@huawei.com
Fixes: ead09dd3aed5 ("scsi: bsg: Simplify device registration")
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
sd_spinup_disk() is a little bit noisy after commit 848ade90ba9c ("scsi:
sd: Do not exit sd_spinup_disk() quietly"):
scsi 0:0:0:0: Direct-Access Multiple Card Reader 1.00 PQ: 0 ANSI: 0
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:0:0: [sda] Media removed, stopped polling
sd 0:0:0:0: [sda] Media removed, stopped polling
sd 0:0:0:0: [sda] Attached SCSI removable disk
sd 0:0:0:0: [sda] Media removed, stopped polling
There's not really a benefit in printing the same message multiple
times. Therefore print it only if media_present was previously set.
Link: https://lore.kernel.org/r/a2d0a249-6035-9697-626a-e14ec50ef6ee@gmail.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Intel LKF can experience link errors. Make fixes to increase link
stability, especially when switching to high speed modes.
Link: https://lore.kernel.org/r/20210831145317.26306-1-adrian.hunter@intel.com
Fixes: b2c57925df1f ("scsi: ufs: ufs-pci: Add support for Intel LKF")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There are a couple of statements where the indentation is not correct,
clean these up. Remove a redundant break statement.
Link: https://lore.kernel.org/r/20210902224215.57286-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There are a few statements where the indentation is not correct, clean
these up.
Link: https://lore.kernel.org/r/20210902223643.56979-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a spelling mistake in a literal string. Fix it.
Link: https://lore.kernel.org/r/20210826115714.11844-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There's little point in keeping this one separately maintained these days,
so just remove the entry and it'll fall under the SCSI subsystem where it
belongs.
Link: https://lore.kernel.org/r/c5e12bd1-10de-634c-d6b3-dac79ed01af5@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
WARNING !A || A && B is equivalent to !A || B
This issue was detected with the help of Coccinelle.
Link: https://lore.kernel.org/r/20210820030805.12383-1-jing.yangyang@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: jing yangyang <jing.yangyang@zte.com.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Drop retrieve_from_waiting_list() to avoid this warning:
drivers/scsi/ncr53c8xx.c:8000:26: warning: ‘retrieve_from_waiting_list’
defined but not used [-Wunused-function]
Link: https://lore.kernel.org/r/YTfS/LH5vCN6afDW@ls3530
Fixes: 1c22e327545c ("scsi: ncr53c8xx: Remove unused code")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Smatch checker reported the following error:
drivers/base/power/sysfs.c:833 dpm_sysfs_remove()
warn: sleeping in atomic context
With a calling sequence of:
efct_lio_npiv_drop_nport() <- disables preempt
-> fc_vport_terminate()
-> device_del()
-> dpm_sysfs_remove()
Issue is efct_lio_npiv_drop_nport() is making the fc_vport_terminate() call
while holding a lock w/ ipl raised.
It is unnecessary to hold the lock over this call, shift where the lock is
taken.
Link: https://lore.kernel.org/r/20210907165225.10821-1-jsmart2021@gmail.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Co-developed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit 356ba2a8bc8d ("scsi: target: tcmu: Make pgr_support and alua_support
attributes writable") introduced support for changeable alua_support and
pgr_support target attributes. These can only be changed if the backstore
is user-backed, otherwise the kernel returns -EINVAL.
This triggers a warning in the targetcli/rtslib code when performing a
target restore that includes non-userbacked backstores:
# targetctl restore
Storage Object block/storage1: Cannot set attribute alua_support:
[Errno 22] Invalid argument, skipped
Storage Object block/storage1: Cannot set attribute pgr_support:
[Errno 22] Invalid argument, skipped
Fix this warning by returning an error code only if we are really going to
flip the PGR/ALUA bit in the transport_flags field, otherwise we will do
nothing and return success.
Return ENOSYS instead of EINVAL if the pgr/alua attributes can not be
changed, this way it will be possible for userspace to understand if the
operation failed because an invalid value has been passed to strtobool() or
because the attributes are fixed.
Fixes: 356ba2a8bc8d ("scsi: target: tcmu: Make pgr_support and alua_support attributes writable")
Link: https://lore.kernel.org/r/20210906151809.52811-1-mlombard@redhat.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Reporting zones on a SCSI device sometimes fail with the following error:
[76248.516390] ata16.00: invalid transfer count 131328
[76248.523618] sd 15:0:0:0: [sda] REPORT ZONES start lba 536870912 failed
The error (from drivers/ata/libata-scsi.c:ata_scsi_zbc_in_xlat()) indicates
that buffer size is not aligned to SECTOR_SIZE.
This happens when the __vmalloc() failed. Consider we are reporting 4096
zones, then we will have "bufsize = roundup((4096 + 1) * 64,
SECTOR_SIZE)" = (513 * 512) = 262656. Then, __vmalloc() failure halves
the bufsize to 131328, which is no longer aligned to SECTOR_SIZE.
Use rounddown() to ensure the size is always aligned to SECTOR_SIZE and fix
the comment as well.
Link: https://lore.kernel.org/r/20210906140642.2267569-1-naohiro.aota@wdc.com
Fixes: 23a50861adda ("scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()")
Cc: stable@vger.kernel.org # 5.5+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
After a device is initialized via device_initialize() it should be freed
via put_device(). sd_probe() currently gets this wrong, fix it up.
Link: https://lore.kernel.org/r/20210906090112.531442-1-ming.lei@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Call cpu_relax() while waiting for the current blk-mq polling instance to
complete.
Link: https://lore.kernel.org/r/20210901152542.27866-1-sreekanth.reddy@broadcom.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
ISCSI_NET_PARAM_IFACE_ENABLE belongs to enum iscsi_net_param instead of
iscsi_iface_param so move it to ISCSI_NET_PARAM. Otherwise, when we call
into the driver, we might not match and return that we don't want attr
visible in sysfs. Found in code review.
Link: https://lore.kernel.org/r/20210901085336.2264295-1-libaokun1@huawei.com
Fixes: e746f3451ec7 ("scsi: iscsi: Fix iface sysfs attr detection")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The following parameters are not used in the function. Remove them.
*func(): ufshpb_set_hpb_read_to_upiu
-> struct ufshpb_lu *hpb
-> u32 lpn
Link: https://lore.kernel.org/r/20210901025617.31174-1-cw9316.lee@samsung.com
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix the following coccicheck REVIEW:
./drivers/scsi/lpfc/lpfc_scsi.c:1498:9-12 REVIEW Unneeded variable
Link: https://lore.kernel.org/r/20210831114058.17817-1-lv.ruyi@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cm>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Chi Minghao <chi.minghao@zte.com.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The Kernel test robot flagged the following warning:
".../lpfc_init.c:7788:35: error: 'struct lpfc_sli4_hba' has no member
named 'c_stat'"
Reviewing this issue highlighted that one of the recent patches caused the
driver to no longer compile cleanly if CONFIG_DEBUG_FS is not set.
Correct the different areas that are failing to compile.
Link: https://lore.kernel.org/r/20210908050927.37275-1-jsmart2021@gmail.com
Fixes: 02243836ad6f ("scsi: lpfc: Add support for the CM framework")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Build-tested-by: Nathan Chancellor <nathan@kernel.org>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The kernel test robot reported the following sparse warning:
".../lpfc_els.c:3984:25: sparse: sparse: cast from restricted __be16"
For the error being flagged, using be32_to_cpu() on a be16 data type, it
was simple enough. But a review of other elements and warnings were also
evaluated.
This patch corrected several items in the original patch:
- Using be32_to_cpu() on a be16 data type
- cpu_to_le32() used on a std uint32_t (CPU) data type.
Note: This is a byte array, but stored in LE layout by hardware at
32-bit boundaries. So it possibly needed conversion.
- Using cpu_to_le32() on a std uint16_t and assigned to a char typeA
- Using le32_to_cpu() on a le16 type
- Missing cpu_to_le16() on an assignment
Link: https://lore.kernel.org/r/20210830231243.6227-1-jsmart2021@gmail.com
Fixes: 9064aeb2df8e ("scsi: lpfc: Add EDC ELS support")
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The kernel test robot flagged an warning for ".../efc_device.c:932:6:
warning: cast to smaller integer type 'enum efc_nport_topology' from 'void
*'"
For the topology events, the "arg" field is generically defined as a void *
and is used to pass different arguments. Most of the arguments are pointers
to data structures. But for the EFC_EVT_NPORT_TOPOLOGY_NOTIFY event, the
argument is an enum value, and the code is typecasting the void * to an
enum generating the warning.
Fix by converting the EFC_EVT_NPORT_TOPOLOGY_NOTIFY event to pass a pointer
to the enum, thus it's a straight-forward pointer dereference in the event
handler.
Link: https://lore.kernel.org/r/20210830231050.5951-1-jsmart2021@gmail.com
Fixes: 202bfdffae27 ("scsi: elx: libefc: FC node ELS and state handling")
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: Ram Vegesna <ram.vegesna@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Clang + -Wimplicit-fallthrough warns:
drivers/scsi/st.c:3831:2: warning: unannotated fall-through between
switch labels [-Wimplicit-fallthrough]
default:
^
drivers/scsi/st.c:3831:2: note: insert 'break;' to avoid fall-through
default:
^
break;
1 warning generated.
Clang's -Wimplicit-fallthrough is a little bit more pedantic than GCC's,
requiring every case block to end in break, return, or fallthrough, rather
than allowing implicit fallthroughs to cases that just contain break or
return. Add a break so that there is no more warning, as has been done all
over the tree already.
Link: https://lore.kernel.org/r/20210817235531.172995-1-nathan@kernel.org
Fixes: 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()")
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We need to re-check sqd->thread after we've dropped the lock. Pin
the sqd before doing the lockdep lock dance, and check if the thread
is alive after that. It's either NULL or alive, as the SQPOLL thread
cannot exit without holding the same sqd->lock.
Reported-and-tested-by: syzbot+337de45f13a4fd54d708@syzkaller.appspotmail.com
Fixes: fa84693b3c89 ("io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Minimal selftest which implements a small BPF policy program to the
connect(2) hook which rejects TCP connection requests to port 60123
with EPERM. This is being attached to a non-root cgroup v2 path. The
test asserts that this works under cgroup v2-only and under a mixed
cgroup v1/v2 environment where net_classid is set in the former case.
Before fix:
# ./test_progs -t cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
run_test:PASS:join_classid 0 nsec
(network_helpers.c:219: errno: None) Unexpected success to connect to server
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 0)
#27 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
After fix:
# ./test_progs -t cgroup_v1v2
#27 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-3-daniel@iogearbox.net
|
|
Minimal set of helpers for net_cls classid cgroupv1 management in order
to set an id, join from a process, initiate setup and teardown. cgroupv2
helpers are left as-is, but reused where possible.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-2-daniel@iogearbox.net
|
|
Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used.
Back in the days, commit bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
embedded per-socket cgroup information into sock->sk_cgrp_data and in order
to save 8 bytes in struct sock made both mutually exclusive, that is, when
cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2
falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp).
The assumption made was "there is no reason to mix the two and this is in line
with how legacy and v2 compatibility is handled" as stated in bd1060a1d671.
However, with Kubernetes more widely supporting cgroups v2 as well nowadays,
this assumption no longer holds, and the possibility of the v1/v2 mixed mode
with the v2 root fallback being hit becomes a real security issue.
Many of the cgroup v2 BPF programs are also used for policy enforcement, just
to pick _one_ example, that is, to programmatically deny socket related system
calls like connect(2) or bind(2). A v2 root fallback would implicitly cause
a policy bypass for the affected Pods.
In production environments, we have recently seen this case due to various
circumstances: i) a different 3rd party agent and/or ii) a container runtime
such as [0] in the user's environment configuring legacy cgroup v1 net_cls
tags, which triggered implicitly mentioned root fallback. Another case is
Kubernetes projects like kind [1] which create Kubernetes nodes in a container
and also add cgroup namespaces to the mix, meaning programs which are attached
to the cgroup v2 root of the cgroup namespace get attached to a non-root
cgroup v2 path from init namespace point of view. And the latter's root is
out of reach for agents on a kind Kubernetes node to configure. Meaning, any
entity on the node setting cgroup v1 net_cls tag will trigger the bypass
despite cgroup v2 BPF programs attached to the namespace root.
Generally, this mutual exclusiveness does not hold anymore in today's user
environments and makes cgroup v2 usage from BPF side fragile and unreliable.
This fix adds proper struct cgroup pointer for the cgroup v2 case to struct
sock_cgroup_data in order to address these issues; this implicitly also fixes
the tradeoffs being made back then with regards to races and refcount leaks
as stated in bd1060a1d671, and removes the fallback, so that cgroup v2 BPF
programs always operate as expected.
[0] https://github.com/nestybox/sysbox/
[1] https://kind.sigs.k8s.io/
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-1-daniel@iogearbox.net
|
|
Correct kernel-doc comments pointed out by the
automated kernel test robot.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") add the
oversize check. When the allocation is larger than what kmalloc() supports,
the following warning triggered:
WARNING: CPU: 0 PID: 8408 at mm/util.c:597 kvmalloc_node+0x108/0x110 mm/util.c:597
Modules linked in:
CPU: 0 PID: 8408 Comm: syz-executor221 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:kvmalloc_node+0x108/0x110 mm/util.c:597
Call Trace:
kvmalloc include/linux/mm.h:806 [inline]
kvmalloc_array include/linux/mm.h:824 [inline]
kvcalloc include/linux/mm.h:829 [inline]
check_btf_line kernel/bpf/verifier.c:9925 [inline]
check_btf_info kernel/bpf/verifier.c:10049 [inline]
bpf_check+0xd634/0x150d0 kernel/bpf/verifier.c:13759
bpf_prog_load kernel/bpf/syscall.c:2301 [inline]
__sys_bpf+0x11181/0x126e0 kernel/bpf/syscall.c:4587
__do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
__se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
__x64_sys_bpf+0x78/0x90 kernel/bpf/syscall.c:4689
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported-by: syzbot+f3e749d4c662818ae439@syzkaller.appspotmail.com
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210911005557.45518-1-cuibixuan@huawei.com
|
|
When building objtool with HOSTCC=clang, there are several errors along
the lines of
orc_dump.c:201:28: error: unknown attribute 'error' ignored [-Werror,-Wunknown-attributes]
This occurs after commit 4e59869aa655 ("compiler-gcc.h: drop checks for
older GCC versions"), which removed the GCC_VERSION gating. The removed
version check just so happened to prevent __compiletime_error() from
being defined with clang because it pretends to be GCC 4.2.1 for
compatibility but the error attribute was not added to clang until
14.0.0.
Commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h
mutually exclusive") and commit a3f8a30f3f00 ("Compiler Attributes: use
feature checks instead of version checks") refactored the handling of
attributes in the main kernel to avoid situations like this but that
refactoring has never been done for the tools directory.
Refactoring is a rather large undertaking and this has never been an
issue before so instead, just guard the definition of
__compiletime_error() with __has_attribute() so that there are no more
errors.
Fixes: 4e59869aa655 ("compiler-gcc.h: drop checks for older GCC versions")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
checkpatch complains about source files with filenames (e.g. in
these cases just below the SPDX header in comments at the top of
various files in fs/cifs). It also is helpful to change this now
so will be less confusing when the parent directory is renamed
e.g. from fs/cifs to fs/smb_client (or fs/smbfs)
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Merge patch series from Nick Desaulniers to update the minimum gcc
version to 5.1.
This is some of the left-overs from the merge window that I didn't want
to deal with yesterday, so it comes in after -rc1 but was sent before.
Gcc-4.9 support has been an annoyance for some time, and with -Werror I
had the choice of applying a fairly big patch from Kees Cook to remove a
fair number of initializer warnings (still leaving some), or this patch
series from Nick that just removes the source of the problem.
The initializer cleanups might still be worth it regardless, but
honestly, I preferred just tackling the problem with gcc-4.9 head-on.
We've been more aggressiuve about no longer having to care about
compilers that were released a long time ago, and I think it's been a
good thing.
I added a couple of patches on top to sort out a few left-overs now that
we no longer support gcc-4.x.
As noted by Arnd, as a result of this minimum compiler version upgrade
we can probably change our use of '--std=gnu89' to '--std=gnu11', and
finally start using local loop declarations etc. But this series does
_not_ yet do that.
Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/
Link: https://lore.kernel.org/lkml/CAK7LNASs6dvU6D3jL2GG3jW58fXfaj6VNOe55NJnTB8UPuk2pA@mail.gmail.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/1438
* emailed patches from Nick Desaulniers <ndesaulniers@google.com>:
Drop some straggling mentions of gcc-4.9 as being stale
compiler_attributes.h: drop __has_attribute() support for gcc4
vmlinux.lds.h: remove old check for GCC 4.9
compiler-gcc.h: drop checks for older GCC versions
Makefile: drop GCC < 5 -fno-var-tracking-assignments workaround
arm64: remove GCC version check for ARCH_SUPPORTS_INT128
powerpc: remove GCC version check for UPD_CONSTR
riscv: remove Kconfig check for GCC version for ARCH_RV64I
Kconfig.debug: drop GCC 5+ version check for DWARF5
mm/ksm: remove old GCC 4.9+ check
compiler.h: drop fallback overflow checkers
Documentation: raise minimum supported version of GCC to 5.1
|
|
Fix up the admin-guide README file to the new gcc-5.1 requirement, and
remove a stale comment about gcc support for the __assume_aligned__
attribute.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If HWP has been already been enabled by BIOS, it may be
necessary to override some kernel command line parameters.
Once it has been enabled it requires a reset to be disabled.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|