Age | Commit message (Collapse) | Author |
|
The SCU message's data field used for receiving response data from SCU
should be 32-bit width, as SCU will send back 32-bit width data.
This solves kernel panic when CONFIG_CC_HAVE_STACKPROTECTOR_SYSREG is
enabled.
[ 1.950768] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
[ 1.980607] Workqueue: events imx_sc_check_for_events
[ 1.985657] Call trace:
[ 1.988104] dump_backtrace+0x0/0x140
[ 1.991768] show_stack+0x14/0x20
[ 1.995090] dump_stack+0xb4/0xf8
[ 1.998407] panic+0x158/0x324
[ 2.001463] print_tainted+0x0/0xa8
[ 2.004950] imx_sc_check_for_events+0x18c/0x190
[ 2.009569] process_one_work+0x198/0x320
[ 2.013579] worker_thread+0x48/0x420
[ 2.017252] kthread+0xf0/0x120
[ 2.020394] ret_from_fork+0x10/0x18
[ 2.023977] SMP: stopping secondary CPUs
[ 2.027901] Kernel Offset: disabled
[ 2.031391] CPU features: 0x0002,2100600c
[ 2.035401] Memory Limit: none
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Fixes: 688f1dfb69b4 ("Input: keyboard - imx_sc: Add i.MX system controller key support")
Link: https://lore.kernel.org/r/1573730499-2224-1-git-send-email-Anson.Huang@nxp.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Metadata runs are supposed to be aligned on 4k boundary (so that they work
efficiently with disks with 4k sectors). However, there was a programming
bug that makes them aligned on 128k boundary instead. The unused space is
wasted.
Fix this bug by providing a proper 4k alignment. In order to keep
existing volumes working, we introduce a new flag SB_FLAG_FIXED_PADDING
- when the flag is clear, we calculate the padding the old way. In order
to make sure that the old version cannot mount the volume created by the
new version, we increase superblock version to 4.
Also in order to not break with old integritysetup, we fix alignment
only if the parameter "fix_padding" is present when formatting the
device.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The driver forgets to destroy workqueue in remove() similarly to what is
done when probe() fails. Add a call to destroy_workqueue() to fix it.
Since unregistration will wait for the work to finish, we do not need to
cancel/flush the work instance in remove().
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191114023405.31477-1-hslester96@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Add PDMA support to (arch/riscv/boot/dts/sifive/fu540-c000.dtsi)
Signed-off-by: Green Wan <green.wan@sifive.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
No timer must be left running when the device goes away.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-and-tested-by: syzbot+b6c55daa701fc389e286@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1573726121.17351.3.camel@suse.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Both unregister_ftrace_direct() and modify_ftrace_direct() needs to
normalize the ip passed in to match the rec->ip, as it is acceptable to have
the ip on the ftrace call site but not the start. There are also common
validity checks with the record found by the ip, these should be done for
both unregister_ftrace_direct() and modify_ftrace_direct().
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As an instruction pointer passed into register_ftrace_direct() may just
exist on the ftrace call site, but may not be the start of the call site
itself, register_ftrace_direct() still needs to update test if a direct call
exists on the normalized site, as only one direct call is allowed at any one
time.
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The direct->count wasn't being updated properly, where it only was updated
when the first entry was added, but should be updated every time.
Fixes: 013bf0da04748 ("ftrace: Add ftrace_find_direct_func()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The x86_capability array in cpuinfo_x86 is of type u32 and thus is
naturally aligned to 4 bytes. But, set_bit() and clear_bit() require the
array to be aligned to size of unsigned long (i.e. 8 bytes on 64-bit
systems).
The array pointer is handed into atomic bit operations. If the access is
not aligned to unsigned long then the atomic bit operations can end up
crossing a cache line boundary, which causes the CPU to do a full bus lock
as it can't lock both cache lines at once. The bus lock operation is heavy
weight and can cause severe performance degradation.
The upcoming #AC split lock detection mechanism will issue warnings for
this kind of access.
Force the alignment of the array to unsigned long. This avoids the massive
code changes which would be required when converting the array data type to
unsigned long.
[ tglx: Rewrote changelog so it contains information WHY this is required ]
Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-4-tony.luck@intel.com
|
|
cpu_caps_cleared[] and cpu_caps_set[] are arrays of type u32 and therefore
naturally aligned to 4 bytes, which is also unsigned long aligned on
32-bit, but not on 64-bit.
The array pointer is handed into atomic bit operations. If the access not
aligned to unsigned long then the atomic bit operations can end up crossing
a cache line boundary, which causes the CPU to do a full bus lock as it
can't lock both cache lines at once. The bus lock operation is heavy weight
and can cause severe performance degradation.
The upcoming #AC split lock detection mechanism will issue warnings for
this kind of access.
Force the alignment of these arrays to unsigned long. This avoids the
massive code changes which would be required when converting the array data
type to unsigned long.
[ tglx: Rewrote changelog ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-2-tony.luck@intel.com
|
|
Pinned negative dentries can, generally, be made positive
by another thread. Conditions that prevent that are
* ->d_lock on dentry in question
* parent directory held at least shared
* nobody else could have observed the address of dentry
Most of the places working with those fall into one of those
categories; however, d_lookup() and friends need to be used
with some care. Fortunately, there's not a lot of call sites,
and with few exceptions all of those fall under one of the
cases above.
Exceptions are all in fs/namei.c - in lookup_fast(), lookup_dcache()
and mountpoint_last(). Another one is lookup_slow() - there
dcache lookup is done with parent held shared, but the result
is used after we'd drop the lock. The same happens in do_last() -
the lookup (in lookup_one()) is done with parent locked, but
result is used after unlocking.
lookup_fast(), do_last() and mountpoint_last() flat-out reject
negatives.
Most of lookup_dcache() calls are made with parent locked at least
shared; the only exception is lookup_one_len_unlocked(). It might
return pinned negative, needs serious care from callers. Fortunately,
almost nobody calls it directly anymore; all but two callers have
converted to lookup_positive_unlocked(), which rejects negatives.
lookup_slow() is called by the same lookup_one_len_unlocked() (see
above), mountpoint_last() and walk_component(). In those two negatives
are rejected.
In other words, there is a small set of places where we need to
check carefully if a pinned potentially negative dentry is, in
fact, positive. After that check we want to be sure that both
->d_inode and type bits in ->d_flags are stable and observed.
The set consists of follow_managed() (where the rejection happens
for lookup_fast(), walk_component() and do_last()), last_mountpoint()
and lookup_positive_unlocked().
Solution:
1) transition from negative to positive (in __d_set_inode_and_type())
stores ->d_inode, then uses smp_store_release() to set ->d_flags type bits.
2) aforementioned 3 places in fs/namei.c fetch ->d_flags with
smp_load_acquire() and bugger off if it type bits say "negative".
That way anyone downstream of those checks has dentry know positive pinned,
with ->d_inode and type bits of ->d_flags stable and observed.
I considered splitting off d_lookup_positive(), so that the checks could
be done right there, under ->d_lock. However, that leads to massive
duplication of rather subtle code in fs/namei.c and fs/dcache.c. It's
worse than it might seem, thanks to autofs ->d_manage() getting involved ;-/
No matter what, autofs_d_manage()/autofs_d_automount() must live with
the possibility of pinned negative dentry passed their way, becoming
positive under them - that's the intended behaviour when lookup comes
in the middle of automount in progress, so we can't keep them out of
the area that has to deal with those, more's the pity...
Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We are overoptimistic about taking the fast path there; seeing
the same value in ->d_parent after having grabbed a reference
to that parent does *not* mean that it has remained our parent
all along.
That wouldn't be a big deal (in the end it is our parent and
we have grabbed the reference we are about to return), but...
the situation with barriers is messed up.
We might have hit the following sequence:
d is a dentry of /tmp/a/b
CPU1: CPU2:
parent = d->d_parent (i.e. dentry of /tmp/a)
rename /tmp/a/b to /tmp/b
rmdir /tmp/a, making its dentry negative
grab reference to parent,
end up with cached parent->d_inode (NULL)
mkdir /tmp/a, rename /tmp/b to /tmp/a/b
recheck d->d_parent, which is back to original
decide that everything's fine and return the reference we'd got.
The trouble is, caller (on CPU1) will observe dget_parent()
returning an apparently negative dentry. It actually is positive,
but CPU1 has stale ->d_inode cached.
Use d->d_seq to see if it has been moved instead of rechecking ->d_parent.
NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch;
we just go into the slow path in such case. We don't wait for ->d_seq
to become even either - again, if we are racing with renames, we
can bloody well go to slow path anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Most of the callers of lookup_one_len_unlocked() treat negatives are
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - it's ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
There are 4 callers; two proceed to check if result is positive and
fail with ENOENT if it isn't; one (in handle_lookup_down()) is
guaranteed to yield positive and one (in lookup_fast()) is _preceded_
by positivity check.
However, follow_managed() on a negative dentry is a (fairly cheap)
no-op on anything other than autofs. And negative autofs dentries
are never hashed, so lookup_fast() is not going to run into one
of those. Moreover, successful follow_managed() on a _positive_
dentry never yields a negative one (and we significantly rely upon
that in callers of lookup_fast()).
In other words, we can easily transpose the positivity check and
the call of follow_managed() in lookup_fast(). And that allows
to fold the positivity check *into* follow_managed(), simplifying
life for the code downstream of its calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull ceph fixes from Ilya Dryomov:
"Two fixes for the buffered reads and O_DIRECT writes serialization
patch that went into -rc1 and a fixup for a bogus warning on older gcc
versions"
* tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client:
rbd: silence bogus uninitialized warning in rbd_object_map_update_finish()
ceph: increment/decrement dio counter on async requests
ceph: take the inode lock before acquiring cap refs
|
|
When a lookup is done, the afs filesystem will perform a bulk status-fetch
operation on the requested vnode (file) plus the next 49 other vnodes from
the directory list (in AFS, directory contents are downloaded as blobs and
parsed locally). When the results are received, it will speculatively
populate the inode cache from the extra data.
However, if the lookup races with another lookup on the same directory, but
for a different file - one that's in the 49 extra fetches, then if the bulk
status-fetch operation finishes first, it will try and update the inode
from the other lookup.
If this other inode is still in the throes of being created, however, this
will cause an assertion failure in afs_apply_status():
BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));
on or about fs/afs/inode.c:175 because it expects data to be there already
that it can compare to.
Fix this by skipping the update if the inode is being created as the
creator will presumably set up the inode with the same information.
Fixes: 39db9815da48 ("afs: Fix application of the results of a inline bulk status fetch")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Robust futexes utilize the robust_list mechanism to allow the kernel to
release futexes which are held when a task exits. The exit can be voluntary
or caused by a signal or fault. This prevents that waiters block forever.
The futex operations in user space store a pointer to the futex they are
either locking or unlocking in the op_pending member of the per task robust
list.
After a lock operation has succeeded the futex is queued in the robust list
linked list and the op_pending pointer is cleared.
After an unlock operation has succeeded the futex is removed from the
robust list linked list and the op_pending pointer is cleared.
The robust list exit code checks for the pending operation and any futex
which is queued in the linked list. It carefully checks whether the futex
value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
tries to wake up a potential waiter.
This is race free for the lock operation but unlock has two race scenarios
where waiters might not be woken up. These issues can be observed with
regular robust pthread mutexes. PI aware pthread mutexes are not affected.
(1) Unlocking task is killed after unlocking the futex value in user space
before being able to wake a waiter.
pthread_mutex_unlock()
|
V
atomic_exchange_rel (&mutex->__data.__lock, 0)
<------------------------killed
lll_futex_wake () |
|
|(__lock = 0)
|(enter kernel)
|
V
do_exit()
exit_mm()
mm_release()
exit_robust_list()
handle_futex_death()
|
|(__lock = 0)
|(uval = 0)
|
V
if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
return 0;
The sanity check which ensures that the user space futex is owned by
the exiting task prevents the wakeup of waiters which in consequence
block infinitely.
(2) Waiting task is killed after a wakeup and before it can acquire the
futex in user space.
OWNER WAITER
futex_wait()
pthread_mutex_unlock() |
| |
|(__lock = 0) |
| |
V |
futex_wake() ------------> wakeup()
|
|(return to userspace)
|(__lock = 0)
|
V
oldval = mutex->__data.__lock
<-----------------killed
atomic_compare_and_exchange_val_acq (&mutex->__data.__lock, |
id | assume_other_futex_waiters, 0) |
|
|
(enter kernel)|
|
V
do_exit()
|
|
V
handle_futex_death()
|
|(__lock = 0)
|(uval = 0)
|
V
if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
return 0;
The sanity check which ensures that the user space futex is owned
by the exiting task prevents the wakeup of waiters, which seems to
be correct as the exiting task does not own the futex value, but
the consequence is that other waiters wont be woken up and block
infinitely.
In both scenarios the following conditions are true:
- task->robust_list->list_op_pending != NULL
- user space futex value == 0
- Regular futex (not PI)
If these conditions are met then it is reasonably safe to wake up a
potential waiter in order to prevent the above problems.
As this might be a false positive it can cause spurious wakeups, but the
waiter side has to handle other types of unrelated wakeups, e.g. signals
gracefully anyway. So such a spurious wakeup will not affect the
correctness of these operations.
This workaround must not touch the user space futex value and cannot set
the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
OWNER_DIED in this case would result in inconsistent state and subsequently
in malfunction of the owner died handling in user space.
The rest of the user space state is still consistent as no other task can
observe the list_op_pending entry in the exiting tasks robust list.
The eventually woken up waiter will observe the uncontended lock value and
take it over.
[ tglx: Massaged changelog and comment. Made the return explicit and not
depend on the subsequent check and added constants to hand into
handle_futex_death() instead of plain numbers. Fixed a few coding
style issues. ]
Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Signed-off-by: Yang Tao <yang.tao172@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn
Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de
|
|
This patch closes a timing window in which two processes compete
and overlap in the execution of do_xmote for the same glock:
Process A Process B
------------------------------------ -----------------------------
1. Grabs gl_lockref and calls do_xmote
2. Grabs gl_lockref but is blocked
3. Sets GLF_INVALIDATE_IN_PROGRESS
4. Unlocks gl_lockref
5. Calls do_xmote
6. Call glops->go_sync
7. test_and_clear_bit GLF_DIRTY
8. Call gfs2_log_flush Call glops->go_sync
9. (slow IO, so it blocks a long time) test_and_clear_bit GLF_DIRTY
It's not dirty (step 7) returns
10. Tests GLF_INVALIDATE_IN_PROGRESS
11. Calls go_inval (rgrp_go_inval)
12. gfs2_rgrp_relse does brelse
13. truncate_inode_pages_range
14. Calls lm_lock UN
In step 14 we've just told dlm to give the glock to another node
when, in fact, process A has not finished the IO and synced all
buffer_heads to disk and make sure their revokes are done.
This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
to use test_and_set_bit, and if the bit is already set, process B just
ignores it and trusts that process A will do the do_xmote in the proper
order.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"One trivial fix for -rc8/final that ensures that the script used to
detect RELR relocation support in the toolchain works correctly when
$CC contains quotes. Although it fails safely (by failing to detect
the support when it exists), it would be nice to have this fixed in
5.4 given that it was only introduced in the last merge window.
Summary:
- Handle CC variables containing quotes in tools-support-relr.sh
script"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
scripts/tools-support-relr.sh: un-quote variables
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A fix and simplification for SGI IP27 exception handlers, and a small
MAINTAINERS update for Broadcom MIPS systems"
* tag 'mips_fixes_5.4_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MAINTAINERS: Remove Kevin as maintainer of BMIPS generic platforms
MIPS: SGI-IP27: fix exception handler replication
|
|
Pull more KVM fixes from Paolo Bonzini:
- fixes for CONFIG_KVM_COMPAT=n
- two updates to the IFU erratum
- selftests build fix
- brown paper bag fix
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Add a comment describing the /dev/kvm no_compat handling
KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast()
KVM: Forbid /dev/kvm being opened by a compat task when CONFIG_KVM_COMPAT=n
KVM: X86: Reset the three MSR list number variables to 0 in kvm_init_msr_list()
selftests: kvm: fix build with glibc >= 2.30
kvm: x86: disable shattered huge page recovery for PREEMPT_RT.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"Don't overwrite quirk flags in sdhci-of-at91 host driver"
* tag 'mmc-v5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-at91: fix quirk2 overwrite
|
|
Before this patch, an io error, such as -EIO writing to the journal
would cause function gfs2_freeze to go into an infinite loop,
continuously retrying the freeze operation. But nothing ever clears
the -EIO except unmount after withdraw, which is impossible if the
freeze operation never ends (fails). Instead you get:
[ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6499.773058] gfs2: fsid=dm-32.0: retrying...
[ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6500.797015] gfs2: fsid=dm-32.0: retrying...
This patch adds a check for -EIO in gfs2_freeze, and if seen, it
dequeues the freeze glock, aborts the loop and returns the error.
Also, there's no need to pass the freeze holder to function
gfs2_lock_fs_check_clean since it's only called in one place and
it's a well-known superblock pointer, so this simplifies that.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few small last-minute fixes for USB-audio and HD-audio as well as
for PCM core:
- A race fix for PCM core between stopping and closing a stream
- USB-audio regressions in the recent descriptor validation code and
relevant changes
- A read of uninitialized value in USB-audio spotted by fuzzer
- A fix for USB-audio race at stopping a stream
- Intel HD-audio platform fixes"
* tag 'sound-5.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix incorrect size check for processing/extension units
ALSA: usb-audio: Fix incorrect NULL check in create_yamaha_midi_quirk()
ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
ALSA: usb-audio: not submit urb for stopped endpoint
ALSA: hda: hdmi - fix pin setup on Tigerlake
ALSA: hda: Add Cometlake-S PCI ID
ALSA: usb-audio: Fix missing error check at mixer resolution test
|
|
Pull drm fixes from Dave Airlie:
"Here is this weeks non-intel hw vuln fixes pull. Three drivers, all
small fixes.
i915:
- MOCS table fixes for EHL and TGL
- Update Display's rawclock on resume
- GVT's dmabuf reference drop fix
amdgpu:
- Fix a potential crash in firmware parsing
sun4i:
- One fix to the dotclock dividers range for sun4i"
* tag 'drm-fixes-2019-11-15' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix null pointer deref in firmware header printing
drm/i915/tgl: MOCS table update
Revert "drm/i915/ehl: Update MOCS table for EHL"
drm/sun4i: tcon: Set min division of TCON0_DCLK to 1.
drm/i915: update rawclk also on resume
drm/i915/gvt: fix dropping obj reference twice
|
|
Pull misc vfs fixes from Al Viro:
"Assorted fixes all over the place; some of that is -stable fodder,
some regressions from the last window"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either
ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable
ecryptfs: fix unlink and rmdir in face of underlying fs modifications
audit_get_nd(): don't unlock parent too early
exportfs_decode_fh(): negative pinned may become positive without the parent locked
cgroup: don't put ERR_PTR() into fc->root
autofs: fix a leak in autofs_expire_indirect()
aio: Fix io_pgetevents() struct __compat_aio_sigset layout
fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()
|
|
There is a spelling mistake in a trace_printk message. As well as in
the selftests that search for this string.
Link: http://lkml.kernel.org/r/20191115085938.38947-1-colin.king@canonical.com
Link: http://lkml.kernel.org/r/20191115090356.39572-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Increase the maximum allowed count of synthetic event fields from 16 to 32
in order to allow for larger-than-usual events.
Link: http://lkml.kernel.org/r/20191115091730.9192-1-dedekind1@gmail.com
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
|
|
Increase the threshold at which the reader sends a wake event to the
writers in the queue such that the queue must be half empty before the wake
is issued rather than the wake being issued when just a single slot
available.
This reduces the number of context switches in the tests significantly,
without altering the amount of work achieved. With my pipe-bench program,
there's a 20% reduction versus an unpatched kernel.
Suggested-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Make pipe_write() check to see if the ring has become full between it
taking the pipe mutex, checking the ring status and then taking the
spinlock.
This can happen if a notification is written into the pipe as that happens
without the pipe mutex.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Remove a redundant wakeup from pipe_write().
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Rearrange the sequence in pipe_write() so that the allocation of the new
buffer, the allocation of a ring slot and the attachment to the ring is
done under the pipe wait spinlock and then the lock is dropped and the
buffer can be filled.
The data copy needs to be done with the spinlock unheld and irqs enabled,
so the lock needs to be dropped first. However, the reader can't progress
as we're holding pipe->mutex.
We also need to drop the lock as that would impact others looking at the
pipe waitqueue, such as poll(), the consumer and a future kernel message
writer.
We just abandon the preallocated slot if we get a copy error. Future
writes may continue it and a future read will eventually recycle it.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Only do a wakeup in pipe_read() if we made space in a completely full
buffer. The producer shouldn't be waiting on pipe->wait otherwise.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Advance the pipe ring tail pointer inside of wait spinlock in pipe_read()
so that the pipe can be written into with kernel notifications from
contexts where pipe->mutex cannot be taken.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Split pipe->ring_size into two numbers:
(1) pipe->ring_size - indicates the hard size of the pipe ring.
(2) pipe->max_usage - indicates the maximum number of pipe ring slots that
userspace orchestrated events can fill.
This allows for a pipe that is both writable by the general kernel
notification facility and by userspace, allowing plenty of ring space for
notifications to be added whilst preventing userspace from being able to
pin too much unswappable kernel space.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Commit 52cf93e63ee6 ("HID: i2c-hid: Don't reset device upon system
resume") fixes many touchpads and touchscreens, however ALPS touchpads
start to trigger IRQ storm after system resume.
Since it's total silence from ALPS, let's bring the old behavior back
to ALPS touchpads.
Fixes: 52cf93e63ee6 ("HID: i2c-hid: Don't reset device upon system resume")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
On some ThinkPad L390 some raydium 3118 touchscreen devices
doesn't response any data after reset, but some does.
Add this ID to no irq quirk,
then don't wait for any response alike on these touchscreens.
All kinds of raydium 3118 devices work fine.
BugLink: https://bugs.launchpad.net/bugs/1849721
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
The helper jbd2_handle_buffer_credits() doesn't correctly handle reserved
handles which can lead to crashes. Fix it getting of journal pointer to
work for reserved handles as well.
Fixes: a9a8344ee171 ("ext4, jbd2: Provide accessor function for handle credits")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191115102210.29445-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
At the moment, the compilation of the old time32 system calls depends
purely on the architecture. As systems with new libc based on 64-bit
time_t are getting deployed, even architectures that previously supported
these (notably x86-32 and arm32 but also many others) no longer depend on
them, and removing them from a kernel image results in a smaller kernel
binary, the same way we can leave out many other optional system calls.
More importantly, on an embedded system that needs to keep working
beyond year 2038, any user space program calling these system calls
is likely a bug, so removing them from the kernel image does provide
an extra debugging help for finding broken applications.
I've gone back and forth on hiding this option unless CONFIG_EXPERT
is set. This version leaves it visible based on the logic that
eventually it will be turned off indefinitely.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
There is no 64-bit version of getitimer/setitimer since that is not
actually needed. However, the implementation is built around the
deprecated 'struct timeval' type.
Change the code to use timespec64 internally to reduce the dependencies
on timeval and associated helper functions.
Minor adjustments in the code are needed to make the native and compat
version work the same way, and to keep the range check working after
the conversion.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Preparing for a change to the itimer internals, stop using the
do_setitimer() symbol and instead use a new higher-level interface.
The do_getitimer()/do_setitimer functions can now be made static,
allowing the compiler to potentially produce better object code.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The itimer handling for the old alpha osf_setitimer/osf_getitimer
system calls is identical to the compat version of getitimer/setitimer,
so just use those directly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The structure is only used in one place, moving it there simplifies the
interface and helps with later changes to this code.
Rename it to match the other time32 structures in the process.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The compat_get_timeval() and timeval_valid() interfaces are deprecated
and getting removed along with the definition of struct timeval itself.
Change the two implementations of the settimeofday() system call to
open-code these helpers and completely avoid references to timeval.
The timeval_valid() call is not needed any more here, only a check to
avoid overflowing tv_nsec during the multiplication, as there is another
range check in do_sys_settimeofday64().
Tested-by: syzbot+dccce9b26ba09ca49966@syzkaller.appspotmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
timerfd_show() uses a 'struct itimerspec' internally, but that is
deprecated because of the time_t overflow and a conflict with the glibc
type of the same name that is now incompatible in user space.
Use a pair of timespec64 variables instead as a simple replacement.
As this removes the last use of itimerspec from the kernel, allowing the
removal of the definition from the uapi headers along with timespec and
timeval later.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
We store elapsed time for a crashed process in struct elf_prstatus using
'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
incompatible with the kernel's idea of timeval since the structure layout
no longer matches on 32-bit architectures.
This changes the definition of the elf_prstatus structure to use
__kernel_old_timeval instead, which is hardcoded to the currently used
binary layout. There is no risk of overflow in y2038 though, because
the time values are all relative times, and can store up to 68 years
of process elapsed time.
There is a risk of applications breaking at build time when they
use the new kernel headers and expect the type to be exactly 'timeval'
rather than a structure that has the same fields as before. Those
applications have to be modified to deal with 64-bit time_t anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
This gets us one step closer to removing 'struct timeval' from the
kernel. We still keep __kernel_old_timeval for interfaces that we cannot
fix otherwise, and ns_to_compat_timeval() is provably safe for interfaces
that are legitimate users of __kernel_old_timeval on native kernels,
so this is an obvious change.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The 'timespec' type definition and helpers like ktime_to_timespec()
or timespec64_to_timespec() should no longer be used in the kernel so
we can remove them and avoid introducing y2038 issues in new code.
Change the socket code that needs to pass a timespec to user space for
backward compatibility to use __kernel_old_timespec instead. This type
has the same layout but with a clearer defined name.
Slightly reformat tcp_recv_timestamp() for consistency after the removal
of timespec64_to_timespec().
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
In order to remove the 'struct timespec' definition and the
timespec64_to_timespec() helper function, change over the in-kernel
definition of 'struct scm_timestamping' to use the __kernel_old_timespec
replacement and open-code the assignment.
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|