Age | Commit message (Collapse) | Author |
|
DM writecache does not handle asynchronous pmem. Reject it when
supplied as cache.
Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc.fsf@linux.ibm.com/
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
bio_uninit is the proper API to clean up a BIO that has been allocated
on stack or inside a structure that doesn't come from the BIO allocator.
Switch dm to use that instead of bio_disassociate_blkg, which really is
an implementation detail. Note that the bio_uninit calls are also moved
to the two callers of __send_empty_flush, so that they better pair with
the bio_init calls used to initialize them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
CONFIG_REGMAP is not selected when no other serial bus is supported.
It's largely academic since CONFIG_I2C is usually selected e.g. by
DRM, but still this can break randconfig so let's be explicit.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20200707202628.113142-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
During DLL initialization, the DLL_CONFIG register value would be
updated with the value supplied from the device-tree.
Override this register only if a valid value is supplied.
Fixes: 03591160ca19 ("mmc: sdhci-msm: Read and use DLL Config property from device tree file")
Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Link: https://lore.kernel.org/r/1594213888-2780-1-git-send-email-vbadigan@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Move the initialization of the vendor_part_id to be before calling
ib_register_device(), this is needed because the query_device() callback
is called from the context of ib_register_device() before initializing the
vendor_part_id, so the reported value is wrong.
Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20200707130931.444724-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
A typo caused the interrupt handler to branch immediately to the
common "unknown interrupt" handler and skip the special case test for
denormal cause.
This does not affect KVM softpatch handling (e.g., for POWER9 TM
assist) because the KVM test was moved to common code by commit
9600f261acaa ("powerpc/64s/exception: Move KVM test to common code")
just before this bug was introduced.
Fixes: 3f7fbd97d07d ("powerpc/64s/exception: Clean up SRR specifiers")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
[mpe: Split selftest into a separate patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200708074942.1713396-1-npiggin@gmail.com
|
|
While integrating rseq into glibc and replacing glibc's sched_getcpu
implementation with rseq, glibc's tests discovered an issue with
incorrect __rseq_abi.cpu_id field value right after the first time
a newly created process issues sched_setaffinity.
For the records, it triggers after building glibc and running tests, and
then issuing:
for x in {1..2000} ; do posix/tst-affinity-static & done
and shows up as:
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
This is caused by the scheduler invoking __set_task_cpu() directly from
sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which
is done by set_task_cpu().
Add the missing rseq_migrate() to both functions. The only other direct
use of __set_task_cpu() is done by init_idle(), which does not involve a
user-space task.
Based on my testing with the glibc test-case, just adding rseq_migrate()
to wake_up_new_task() is sufficient to fix the observed issue. Also add
it to sched_fork() to keep things consistent.
The reason why this never triggered so far with the rseq/basic_test
selftest is unclear.
The current use of sched_getcpu(3) does not typically require it to be
always accurate. However, use of the __rseq_abi.cpu_id field within rseq
critical sections requires it to be accurate. If it is not accurate, it
can cause corruption in the per-cpu data targeted by rseq critical
sections in user-space.
Reported-By: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-By: Florian Weimer <fweimer@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com
|
|
The recent commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
moved these lines in ttwu():
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
up before:
smp_cond_load_acquire(&p->on_cpu, !VAL);
into the 'p->on_rq == 0' block, with the thinking that once we hit
schedule() the current task cannot change it's ->state anymore. And
while this is true, it is both incorrect and flawed.
It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0'
to avoid weak hardware from re-ordering things for us. This can fairly
easily be achieved by relying on the control-dependency already in
place.
The second problem, which makes the flaw in the original argument, is
that while schedule() will not change prev->state, it will read it a
number of times (arguably too many times since it's marked volatile).
The previous condition 'p->on_cpu == 0' was sufficient because that
indicates schedule() has completed, and will no longer read
prev->state. So now the trick is to make this same true for the (much)
earlier 'prev->on_rq == 0' case.
Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0'
assignment needs to he a RELEASE, but adding additional ordering to
schedule() is an unwelcome proposition at the best of times, doubly so
for mere accounting.
Luckily we can push the prev->state load up before rq->lock, with the
only caveat that we then have to re-read the state after. However, we
know that if it changed, we no longer have to worry about the blocking
path. This gives us the required ordering, if we block, we did the
prev->state load before an (effective) smp_mb() and the p->on_rq store
needs not change.
With this we end up with the effective ordering:
LOAD p->state LOAD-ACQUIRE p->on_rq == 0
MB
STORE p->on_rq, 0 STORE p->state, TASK_WAKING
which ensures the TASK_WAKING store happens after the prev->state
load, and all is well again.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net
|
|
The HiSilicon hibmc driver triggers a splat at boot time as below
[ 14.137806] ------------[ cut here ]------------
[ 14.142405] hibmc-drm 0000:0a:00.0: Device has not been registered.
[ 14.148661] WARNING: CPU: 0 PID: 496 at drivers/gpu/drm/drm_fb_helper.c:2233 drm_fbdev_generic_setup+0x15c/0x1b8
[ 14.158787] [...]
[ 14.278307] Call trace:
[ 14.280742] drm_fbdev_generic_setup+0x15c/0x1b8
[ 14.285337] hibmc_pci_probe+0x354/0x418
[ 14.289242] local_pci_probe+0x44/0x98
[ 14.292974] work_for_cpu_fn+0x20/0x30
[ 14.296708] process_one_work+0x1c4/0x4e0
[ 14.300698] worker_thread+0x2c8/0x528
[ 14.304431] kthread+0x138/0x140
[ 14.307646] ret_from_fork+0x10/0x18
[ 14.311205] ---[ end trace a2000ec2d838af4d ]---
This turned out to be due to the fbdev device hasn't been registered when
drm_fbdev_generic_setup() is invoked. Let's fix the splat by moving it down
after drm_dev_register() which will follow the "Display driver example"
documented by commit de99f0600a79 ("drm/drv: DOC: Add driver example
code").
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144713.1123-1-yuzenghui@huawei.com
|
|
We should not be logging a warning repeatedly on change notify.
CC: Stable <stable@vger.kernel.org> # v5.6+
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First set of IIO and counter fixes in the 5.8 cycle.
The buffer alignment fixes continue to trickle through as we get
reviews in. The rest are the standard mixed bag of long term issues
just discovered an things we missed in this cycle.
IIO fixes
* core
- Add missing IIO_MOD_H2 and ETHANOL strings. Somehow got missed
when drivers were added using these in attribute names.
* afe4403, afe4404, ak8974, hdc100x, hts221, ms5611
- Fix a recently identified issue with alignment when using
iio_push_to_buffers_with_timestamp which assumes the timestamp
is 8 byte aligned.
* ad7780
- Fix a some premature / excess cleanup in an error path.
* adi-axi-adc
- Fix reference counting on the wrong object.
* ak8974
- Fix unbalance runtime pm.
* mma8452
- Fix missing iio_device_unregister in error path.
* zp2326
- Error handling for pm_runtime_get_sync failing.
counter fixes
* Add lock guards in 104-quad-8 to protect against races - done
in 2 patches to allow easy back porting.
* tag 'iio-fixes-for-5.8a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: adc: ad7780: Fix a resource handling path in 'ad7780_probe()'
iio:pressure:ms5611 Fix buffer element alignment
iio:humidity:hts221 Fix alignment and data leak issues
iio:humidity:hdc100x Fix alignment and data leak issues
iio:magnetometer:ak8974: Fix alignment and data leak issues
iio: adc: adi-axi-adc: Fix object reference counting
iio: pressure: zpa2326: handle pm_runtime_get_sync failure
counter: 104-quad-8: Add lock guards - filter clock prescaler
counter: 104-quad-8: Add lock guards - differential encoder
iio: core: add missing IIO_MOD_H2/ETHANOL string identifiers
iio: magnetometer: ak8974: Fix runtime PM imbalance on error
iio: mma8452: Add missed iio_device_unregister() call in mma8452_probe()
iio:health:afe4404 Fix timestamp alignment and prevent data leak.
iio:health:afe4403 Fix timestamp alignment and prevent data leak.
|
|
Use for_each_set_bit() instead of open-coding it to simplify the code.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Message-Id: <20200708062023.7986-1-vulab@iscas.ac.cn>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Fold it into the two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Consolidate the two in-kernel read helpers to make upcoming changes
easier. The only difference are the missing call to rw_verify_area
in kernel_read, and an access_ok check that doesn't make sense for
kernel buffers to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
__kernel_read has a bunch of additional sanity checks, and this moves
the set_fs out of non-core code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Fold it into the two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Consolidate the two in-kernel write helpers to make upcoming changes
easier. The only difference are the missing call to rw_verify_area
in kernel_write, and an access_ok check that doesn't make sense for
kernel buffers to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add a WARN_ON_ONCE if the file isn't actually open for write. This
matches the check done in vfs_write, but actually warn warns as a
kernel user calling write on a file not opened for writing is a pretty
obvious programming error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is a very special interface that skips sb_writes protection, and not
used by modules anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
While pipes don't really need sb_writers projection, __kernel_write is an
interface better kept private, and the additional rw_verify_area does not
hurt here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
While pipes don't really need sb_writers projection, __kernel_write is an
interface better kept private, and the additional rw_verify_area does not
hurt here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ian Kent <raven@themaw.net>
|
|
__kernel_write doesn't take a sb_writers references, which we need here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
|
|
Add FUJITSU ETERNUS_AHB
Link: https://lore.kernel.org/r/DM6PR06MB5276CCA765336BD312C4282E8C660@DM6PR06MB5276.namprd06.prod.outlook.com
Signed-off-by: Steve Schremmer <steve.schremmer@netapp.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The caller of cifs_posix_lock_set will do retry(like
fcntl_setlk64->do_lock_file_wait) if we will wait for any file_lock.
So the retry in cifs_poxis_lock_set seems duplicated, remove it to
make a cleanup.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.de>
|
|
BRM_status_show() has several error branches, but none of them record the
error in the error return.
Also while at it remove the manual mutex_unlock() of the pci_access_mutex
in case of an ongoing pci error recovery or host removal and jump to the
cleanup label instead.
Note: We can safely jump to out from here as io_unit_pg3 is initialized to
NULL and if it hasn't been allocated, kfree() skips the NULL pointer.
[mkp: compilation warning]
Link: https://lore.kernel.org/r/20200701131454.5255-1-johannes.thumshirn@wdc.com
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If system memory is migrated to device private memory and no GPU MMU
page table entry exists, the GPU will fault and call hmm_range_fault()
to get the PFN for the page. Since the .dev_private_owner pointer in
struct hmm_range is not set, hmm_range_fault returns an error which
results in the GPU program stopping with a fatal fault.
Fix this by setting .dev_private_owner appropriately.
Fixes: 08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()")
Cc: stable@vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
The patch to add zero page migration to GPU memory inadvertently included
part of a future change which broke normal page migration to GPU memory
by copying too much data and corrupting GPU memory.
Fix this by only copying one page instead of a byte count.
Fixes: 9d4296a7d4b3 ("drm/nouveau/nouveau/hmm: fix migrate zero page to GPU")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
Tegra TRM says worst-case reply time is 1216us, and this should fix some
spurious timeouts that have been popping up.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
Prevents "snd_hda_codec_hdmi hdaudioC1D0: HDMI: pin nid 5 not registered"
that occur on some configurations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
On eviction, we acquire the vm->mutex and then wait on the vma->active.
Therefore when binding and pinning the vma, we must follow the same
sequence, lock/pin the vma then mark it active. Otherwise, we mark the
vma as active, then wait for the vm->mutex, and meanwhile the evictor
holding the mutex waits upon us to complete our activity.
Fixes: 8ccfc20a7d56 ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706170138.8993-1-chris@chris-wilson.co.uk
(cherry picked from commit 8567774e87e23a57155e5102f81208729b992ae6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
read permission, not just read attributes permission, is required
on the directory.
See MS-SMB2 (protocol specification) section 3.3.5.19.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.6+
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
The queue reset pattern is used in a couple different places,
only slightly different from each other, and could cause
issues if one gets changed and the other didn't. This puts
them together so that only one version is needed, yet each
can have slighty different effects by passing in a pointer
to a work function to do whatever configuration twiddling is
needed in the middle of the reset.
This specifically addresses issues seen where under loops
of changing ring size or queue count parameters we could
occasionally bump into the netdev watchdog.
v2: added more commit message commentary
Fixes: 4d03e00a2140 ("ionic: Add initial ethtool support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.
Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
switch over __vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.
v2:
- Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)
Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When generating debug dump, driver firstly collects all data in binary
form, and then performs per-feature formatting to human-readable if it
is supported.
For ethtool -d, this is roughly incorrect for two reasons. First of all,
drivers should always provide only original raw dumps to Ethtool without
any changes.
The second, and more critical, is that Ethtool's output buffer size is
strictly determined by ethtool_ops::get_regs_len(), and all data *must*
fit in it. The current version of driver always returns the size of raw
data, but the size of the formatted buffer exceeds it in most cases.
This leads to out-of-bound writes and memory corruption.
Address both issues by adding an option to return original, non-formatted
debug data, and using it for Ethtool case.
v2:
- Expand commit message to make it more clear;
- No functional changes.
Fixes: c965db444629 ("qed: Add support for debug data collection")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tooling fixes from Arnaldo Carvalho de Melo:
- Intel PT fixes for PEBS-via-PT with registers
- Fixes for Intel PT python based GUI
- Avoid duplicated sideband events with Intel PT in system wide tracing
- Remove needless 'dummy' event from TUI menu, used when synthesizing
meta data events for pre-existing processes
- Fix corner case segfault when pressing enter in a screen without
entries in the TUI for report/top
- Fixes for time stamp handling in libtraceevent
- Explicitly set utf-8 encoding in perf flamegraph
- Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy',
silencing perf build warning
* tag 'perf-tools-fixes-2020-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf report TUI: Remove needless 'dummy' event from menu
perf intel-pt: Fix PEBS sample for XMM registers
perf intel-pt: Fix displaying PEBS-via-PT with registers
perf intel-pt: Fix recording PEBS-via-PT with registers
perf report TUI: Fix segmentation fault in perf_evsel__hists_browse()
tools lib traceevent: Add proper KBUFFER_TYPE_TIME_STAMP handling
tools lib traceevent: Add API to read time information from kbuffer
perf scripts python: exported-sql-viewer.py: Fix time chart call tree
perf scripts python: exported-sql-viewer.py: Fix zero id in call tree 'Find' result
perf scripts python: exported-sql-viewer.py: Fix zero id in call graph 'Find' result
perf scripts python: exported-sql-viewer.py: Fix unexpanded 'Find' result
perf record: Fix duplicated sideband events with Intel PT system wide tracing
perf scripts python: export-to-postgresql.py: Fix struct.pack() int argument
tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
perf flamegraph: Explicitly set utf-8 encoding
|
|
Commit e57f61858b7c ("net: bridge: mcast: fix stale nsrcs pointer in
igmp3/mld2 report handling") introduced a bug in the IPv6 header payload
length check which would potentially lead to rejecting a valid MLD2 Report:
The check needs to take into account the 2 bytes for the "Number of
Sources" field in the "Multicast Address Record" before reading it.
And not the size of a pointer to this field.
Fixes: e57f61858b7c ("net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The packets from tunnel devices (eg bareudp) may have only
metadata in the dst pointer of skb. Hence a pointer check of
neigh_lookup is needed in dst_neigh_lookup_skb
Kernel crashes when packets from bareudp device is processed in
the kernel neighbour subsytem.
[ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 133.385240] #PF: supervisor instruction fetch in kernel mode
[ 133.385828] #PF: error_code(0x0010) - not-present page
[ 133.386603] PGD 0 P4D 0
[ 133.386875] Oops: 0010 [#1] SMP PTI
[ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15
[ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 133.391076] RIP: 0010:0x0
[ 133.392401] Code: Bad RIP value.
[ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
[ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
[ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
[ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
[ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
[ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
[ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
[ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
[ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.404933] Call Trace:
[ 133.405169] <IRQ>
[ 133.405367] __neigh_update+0x5a4/0x8f0
[ 133.405734] arp_process+0x294/0x820
[ 133.406076] ? __netif_receive_skb_core+0x866/0xe70
[ 133.406557] arp_rcv+0x129/0x1c0
[ 133.406882] __netif_receive_skb_one_core+0x95/0xb0
[ 133.407340] process_backlog+0xa7/0x150
[ 133.407705] net_rx_action+0x2af/0x420
[ 133.408457] __do_softirq+0xda/0x2a8
[ 133.408813] asm_call_on_stack+0x12/0x20
[ 133.409290] </IRQ>
[ 133.409519] do_softirq_own_stack+0x39/0x50
[ 133.410036] do_softirq+0x50/0x60
[ 133.410401] __local_bh_enable_ip+0x50/0x60
[ 133.410871] ip_finish_output2+0x195/0x530
[ 133.411288] ip_output+0x72/0xf0
[ 133.411673] ? __ip_finish_output+0x1f0/0x1f0
[ 133.412122] ip_send_skb+0x15/0x40
[ 133.412471] raw_sendmsg+0x853/0xab0
[ 133.412855] ? insert_pfn+0xfe/0x270
[ 133.413827] ? vvar_fault+0xec/0x190
[ 133.414772] sock_sendmsg+0x57/0x80
[ 133.415685] __sys_sendto+0xdc/0x160
[ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0
[ 133.417679] ? __audit_syscall_exit+0x1d9/0x280
[ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0
[ 133.419819] __x64_sys_sendto+0x24/0x30
[ 133.420848] do_syscall_64+0x4d/0x90
[ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 133.422833] RIP: 0033:0x7fe013689c03
[ 133.423749] Code: Bad RIP value.
[ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03
[ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003
[ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010
[ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080
[ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod
[ 133.444045] CR2: 0000000000000000
[ 133.445082] ---[ end trace f4aeee1958fd1638 ]---
[ 133.446236] RIP: 0010:0x0
[ 133.447180] Code: Bad RIP value.
[ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
[ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
[ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
[ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
[ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
[ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
[ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
[ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
[ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt
[ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fixes: aaa0c23cb901 ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When tcf_ct_act execute the tcf_lastuse_update should
be update or the used stats never update
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 1.1.1.1
ip_flags frag/firstfrag
skip_hw
not_in_hw
action order 1: ct zone 1 nat pipe
index 1 ref 1 bind 1 installed 103 sec used 103 sec
Action statistics:
Sent 151500 bytes 101 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 4519c04dc64a1a295787aab13b6a50fb
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rwlock.h should not be included directly. Instead linux/splinlock.h
should be included. Including it directly will break the RT build.
Fixes: 549c243e4e010 ("net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RFC 8684 mandates that no-data DATA FIN packets should carry
a DSS with 0 sequence number and data len equal to 1. Currently,
on FIN retransmission we re-use the existing mapping; if the previous
fin transmission was part of a partially acked data packet, we could
end-up writing in the egress packet a non-compliant DSS.
The above will be detected by a "Bad mapping" warning on the receiver
side.
This change addresses the issue explicitly checking for 0 len packet
when adding the DATA_FIN option.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Reported-by: syzbot+42a07faa5923cfaeb9c9@syzkaller.appspotmail.com
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv4 ping sockets don't set fl4.fl4_icmp_{type,code}, which leads to
incomplete IPsec ACQUIRE messages being sent to userspace. Currently,
both raw sockets and IPv6 ping sockets set those fields.
Expected output of "ip xfrm monitor":
acquire proto esp
sel src 10.0.2.15/32 dst 8.8.8.8/32 proto icmp type 8 code 0 dev ens4
policy src 10.0.2.15/32 dst 8.8.8.8/32
<snip>
Currently with ping sockets:
acquire proto esp
sel src 10.0.2.15/32 dst 8.8.8.8/32 proto icmp type 0 code 0 dev ens4
policy src 10.0.2.15/32 dst 8.8.8.8/32
<snip>
The Libreswan test suite found this problem after Fedora changed the
value for the sysctl net.ipv4.ping_group_range.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Paul Wouters <pwouters@redhat.com>
Tested-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the ISR, we poll the event register for the queues in need of
service and then enter polled mode. After this point, the event
register will never be read again until we exit polled mode.
In a scenario where a UDP flow is routed back out through the same
interface, i.e. "router-on-a-stick" we'll typically only see an rx
queue event initially. Once we start to process the incoming flow
we'll be locked polled mode, but we'll never clean the tx rings since
that event is never caught.
Eventually the netdev watchdog will trip, causing all buffers to be
dropped and then the process starts over again.
Rework the NAPI poll to keep trying to consome the entire budget as
long as new events are coming in, making sure to service all rx/tx
queues, in priority order, on each pass.
Fixes: 4d494cdc92b3 ("net: fec: change data structure to support multiqueue")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Fugang Duan <fugang.duan@nxp.com>
Reviewed-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
clang static analysis flags this garbage return
drivers/net/ethernet/marvell/sky2.c:208:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
return v;
^~~~~~~~
static inline u16 gm_phy_read( ...
{
u16 v;
__gm_phy_read(hw, port, reg, &v);
return v;
}
__gm_phy_read can return without setting v.
So handle similar to skge.c's gm_phy_read, initialize v.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"MTD:
- Set a missing master partition panic write flag
Raw NAND:
- Fix build issue in the xway driver
- Fix a wrong return code"
* tag 'mtd/fixes-for-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: xway: Fix build issue
mtd: set master partition panic write flag
nandsim: Fix return code testing of ns_find_operation()
|
|
So far, gfs2 has taken the inode glocks inside the ->readpage and
->readahead address space operations. Since commit d4388340ae0b ("fs:
convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed
the pages to read ahead locked. With that, the current holder of the
inode glock may be trying to lock one of those pages while
gfs2_readahead is trying to take the inode glock, resulting in a
deadlock.
Fix that by moving the lock taking to the higher-level ->read_iter file
and ->fault vm operations. This also gets rid of an ugly lock inversion
workaround in gfs2_readpage.
The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks. If a page is not cached,
filesystem locks must be taken before populating the page cache.
To avoid taking the inode glock when the data is already cached,
gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag
set. If that fails, the inode glock is taken and the operation is
retried with the IOCB_NOIO flag cleared.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead. This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix of a leak in global block reserve accounting
- fix a (hard to hit) race of readahead vs releasepage that could lead
to crash
- convert all remaining uses of comment fall through annotations to the
pseudo keyword
- fix crash when mounting a fuzzed image with -o recovery
* tag 'for-5.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reset tree root pointer after error in init_tree_roots
btrfs: fix reclaim_size counter leak after stealing from global reserve
btrfs: fix fatal extent_buffer readahead vs releasepage race
btrfs: convert comments to fallthrough annotations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- User build systems to pass -mcpu
- Fix potential EFA clobber in syscall handler
- Fix ARCompact 2 levels of interrupts build
- Detect newer HS CPU releases
- misc other fixes
* tag 'arc-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARCv2: support loop buffer (LPB) disabling
ARC: build: remove deprecated toggle for arc700 builds
ARC: build: allow users to specify -mcpu
ARCv2: boot log: detect newer/upconing HS3x/HS4x releases
ARC: elf: use right ELF_ARCH
ARC: [arcompact] fix bitrot with 2 levels of interrupt
ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE
|
|
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.
sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.
The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.
This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|