Age | Commit message | Author |
|
Verify the inode di_forkoff, lifted from xfs_repair's
process_check_inode_forkoff().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The iomap direct I/O code issues a single ->end_io call for the whole
I/O request, and if some of the extents covered needed a COW operation
it will call xfs_reflink_end_cow over the whole range.
When we do AIO writes we drop the iolock after doing the initial setup,
but before the I/O completion. Between dropping the lock and completing
the I/O we can have a racing buffered write create new delalloc COW fork
extents in the region covered by the outstanding direct I/O write, and
thus see delalloc COW fork extents in xfs_reflink_end_cow. As
concurrent writes are fundamentally racy and no guarantees are given we
can simply skip those.
This can be easily reproduced with xfstests generic/208 in always_cow
mode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xchk_inode_flags2() currently treats any di_flags2 values that the
running kernel doesn't recognize as corruption, and calls
xchk_ino_set_corrupt() if they are set. However, it's entirely possible
that these flags were set in some newer kernel and are quite valid,
but ignored in this kernel.
(Validators don't care one bit about unknown di_flags2.)
Call xchk_ino_set_warning instead, because this may or may not actually
indicate a problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
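The distinction this commit draws can be sketched in plain C. This is a minimal userspace illustration, not the scrub code itself: the flag bits, the result enum and every demo_* name are hypothetical and only stand in for di_flags2, xchk_ino_set_warning() and xchk_ino_set_corrupt().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits known to "this kernel"; not the real XFS values. */
#define DEMO_FLAG2_A		(1ULL << 0)
#define DEMO_FLAG2_B		(1ULL << 1)
#define DEMO_FLAG2_KNOWN	(DEMO_FLAG2_A | DEMO_FLAG2_B)

enum demo_scrub_result {
	DEMO_SCRUB_OK,
	DEMO_SCRUB_WARNING,	/* suspicious, but possibly valid */
	DEMO_SCRUB_CORRUPT,	/* definitely wrong; unused by this check */
};

/* Unknown bits may come from a newer kernel, so only warn about them. */
static enum demo_scrub_result demo_check_flags2(uint64_t flags2)
{
	if (flags2 & ~DEMO_FLAG2_KNOWN)
		return DEMO_SCRUB_WARNING;
	return DEMO_SCRUB_OK;
}

static void demo_flags2_usage(void)
{
	assert(demo_check_flags2(DEMO_FLAG2_A | DEMO_FLAG2_B) == DEMO_SCRUB_OK);
	assert(demo_check_flags2(1ULL << 40) == DEMO_SCRUB_WARNING);
}
```

The point of the sketch is only that an unrecognized bit maps to the weaker outcome, never to the corruption outcome.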
|
|
Remove duplicated include xfs_alloc.h
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
This function is only used to punch out delayed allocations on I/O
failure, which means we need to have read the extents earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When xfs_reflink_allocate_cow() allocates a transaction, it drops
the ILOCK to perform the operation. This introduces a race condition
where another thread modifying the file can perform the COW
allocation operation underneath us. This results in the retry loop
finding an allocated block and jumping straight to the conversion
code. It does not, however, cancel the transaction it holds and so
this gets leaked. This results in a lockdep warning:
================================================
WARNING: lock held when returning to user space!
4.18.5 #1 Not tainted
------------------------------------------------
worker/6123 is leaving the kernel with locks still held!
1 lock held by worker/6123:
#0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
And eventually the filesystem deadlocks because it runs out of log
space that is reserved by the leaked transaction and never gets
released.
The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
gotos - it's no surprise that it has a bug where the flow through
several goto jumps then fails to clean up context from a non-obvious
logic path. Clean up the logic flow and make sure every path does
the right thing.
Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
Tested-by: Alexander Y. Fomichev <git.user@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
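A minimal sketch of the leak described above, assuming a toy transaction type. All demo_* names are hypothetical; the counter stands in for the log reservation that the real transaction holds, and the "raced" flag stands in for another thread having done the COW allocation while the ILOCK was dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction; counts live reservations. */
struct demo_trans { int unused; };

static struct demo_trans demo_tp;
static int demo_live_transactions;

static struct demo_trans *demo_trans_alloc(void)
{
	demo_live_transactions++;
	return &demo_tp;
}

static void demo_trans_cancel(struct demo_trans *tp)
{
	(void)tp;
	demo_live_transactions--;
}

static void demo_trans_commit(struct demo_trans *tp)
{
	(void)tp;
	demo_live_transactions--;
}

/* Every exit path either commits or cancels the transaction it allocated. */
static int demo_allocate_cow(bool raced)
{
	struct demo_trans *tp = demo_trans_alloc();

	if (raced) {
		/* Block already allocated by someone else: do not leak tp. */
		demo_trans_cancel(tp);
		return 0;
	}
	/* ... the allocation itself would happen here ... */
	demo_trans_commit(tp);
	return 0;
}

static void demo_cow_usage(void)
{
	assert(demo_allocate_cow(true) == 0);
	assert(demo_allocate_cow(false) == 0);
	assert(demo_live_transactions == 0);	/* nothing leaked */
}
```

Dropping the demo_trans_cancel() call in the raced branch leaves the counter at one, which is the userspace analogue of the reservation that never gets released.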
|
|
We've had a few reports of lockdep tripping over memory reclaim
context vs filesystem freeze "deadlocks". They all have looked
to be false positives on analysis, but it seems that they are
being tripped because we take freeze references before we run
a GFP_KERNEL allocation for the struct xfs_trans.
We can avoid this false positive vector just by re-ordering the
operations in xfs_trans_alloc(). That is, we need to allocate the
structure before we take the freeze reference and enter the GFP_NOFS
allocation context that follows the xfs_trans around. This prevents
lockdep from seeing the GFP_KERNEL allocation inside the transaction
context, and that prevents it from triggering the freeze level vs
alloc context vs reclaim warnings.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The xfs_buf_log_item structure has a reference counter with slightly
tricky semantics. In the common case, a buffer is logged and
committed in a transaction, committed to the on-disk log (added to
the AIL) and then finally written back and removed from the AIL. The
bli refcount covers two potentially overlapping timeframes:
1. the bli is held in an active transaction
2. the bli is pinned by the log
The caveat to this approach is that the reference counter does not
purely dictate the lifetime of the bli. IOW, when a dirty buffer is
physically logged and unpinned, the bli refcount may go to zero as
the log item is inserted into the AIL. Only once the buffer is
written back can the bli finally be freed.
The above semantics means that it is not enough for the various
refcount decrementing contexts to release the bli on decrement to
zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
unpin (->iop_unpin()) must all drop the associated reference and
make additional checks to determine if the current context is
responsible for freeing the item.
For example, if a transaction holds but does not dirty a particular
bli, the commit may drop the refcount to zero. If the bli itself is
clean, it is also not AIL resident and must be freed at this time.
The same is true for xfs_trans_brelse(). If the transaction dirties
a bli and then aborts or an unpin results in an abort due to a log
I/O error, the last reference count holder is expected to explicitly
remove the item from the AIL and release it (since an abort means
filesystem shutdown and metadata writeback will never occur).
This leads to fairly complex checks being replicated in a few
different places. Since ->iop_unlock() and xfs_trans_brelse() are
nearly identical, refactor the logic into a common helper that
implements and documents the semantics in one place. This patch does
not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_trans_brelse() is a bit of a historical mess, similar to
xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of
commented out code, inconsistency with regard to stale items, etc.
Clean up xfs_trans_brelse() to use similar logic and flow as
xfs_buf_item_unlock() with regard to bli reference count handling.
This patch makes no functional changes, but facilitates further
refactoring of the common bli reference count handling code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfstests generic/388,475 occasionally reproduce assertion failures
in xfs_buf_item_unpin() when the final bli reference is dropped on
an invalidated buffer and the buffer is not locked as it is expected
to be. Invalidated buffers should remain locked on transaction
commit until the final unpin, at which point the buffer is removed
from the AIL and the bli is freed since stale buffers are not
written back.
The assert failures are associated with filesystem shutdown,
typically due to log I/O errors injected by the test. The
problematic situation can occur if the shutdown happens to cause a
race between an active transaction that has invalidated a particular
buffer and an I/O error on a log buffer that contains the bli
associated with the same (now stale) buffer.
Both transaction and log contexts acquire a bli reference. If the
transaction has already invalidated the buffer by the time the I/O
error occurs and ends up aborting due to shutdown, the transaction
and log hold the last two references to a stale bli. If the
transaction cancel occurs first, it treats the buffer as non-stale
due to the aborted state: the bli reference is dropped and the
buffer is released/unlocked. The log buffer I/O error handling
eventually calls into xfs_buf_item_unpin(), drops the final
reference to the bli and treats it as stale. The buffer wasn't left
locked by xfs_buf_item_unlock(), however, so the assert fails and
the buffer is double unlocked. The latter problem is mitigated by
the fact that the fs is shutdown and no further damage is possible.
->iop_unlock() of an invalidated buffer should behave consistently
with respect to the bli refcount, regardless of aborted state. If
the refcount remains elevated on commit, we know the bli is awaiting
an unpin (since it can't be in another transaction) and will be
handled appropriately on log buffer completion. If the final bli
reference of an invalidated buffer is dropped in ->iop_unlock(), we
can assume the transaction has aborted because invalidation implies
a dirty transaction. In the non-abort case, the log would have
acquired a bli reference in ->iop_pin() and prevented bli release at
->iop_unlock() time. In the abort case the item must be freed and
buffer unlocked because it wasn't pinned by the log.
Rework xfs_buf_item_unlock() to simplify the currently circuitous
and duplicate logic and leave invalidated buffers locked based on
bli refcount, regardless of aborted state. This ensures that a
pinned, stale buffer is always found locked when eventually
unpinned.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Now that deferred operations are completely managed via
transactions, it's no longer necessary to cancel the dfops in error
paths that already cancel the associated transaction. There are a
few such calls lingering throughout the codebase.
Remove all remaining unnecessary calls to xfs_defer_cancel(). This
leaves xfs_defer_cancel() calls in two places. The first is the call
in the transaction cancel path itself, which facilitates this patch.
The second is made via the xfs_defer_finish() error path to provide
consistent error semantics with transaction commit. For example,
xfs_trans_commit() expects an xfs_defer_finish() failure to clean up
the dfops structure before it returns.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The VFS routine that calls ->get_link blindly copies whatever's returned
into the user's buffer. If we return a NULL pointer, the vfs will
crash on the null pointer. Therefore, return -EFSCORRUPTED instead of
blowing up the kernel.
[dgc: clean up with hch's suggestions]
Reported-by: wen.xu@gatech.edu
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
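The idea of returning an encoded error instead of NULL can be sketched in userspace C. The helpers below imitate the kernel's error-pointer convention but are hypothetical demo_* stand-ins, as are the inode structure and the numeric error value (assumed to match EUCLEAN on Linux, which XFS uses for EFSCORRUPTED).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EFSCORRUPTED	117	/* assumed value, see lead-in */
#define DEMO_MAX_ERRNO		4095

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical inode with an inline symlink target that may be missing. */
struct demo_inode {
	const char *inline_link;
};

/* Never hand NULL to the caller; report corruption explicitly instead. */
static const char *demo_get_link(const struct demo_inode *ip)
{
	if (!ip->inline_link)
		return demo_err_ptr(-DEMO_EFSCORRUPTED);
	return ip->inline_link;
}

static void demo_get_link_usage(void)
{
	struct demo_inode good = { .inline_link = "target" };
	struct demo_inode bad = { .inline_link = NULL };

	assert(!demo_is_err(demo_get_link(&good)));
	assert(demo_is_err(demo_get_link(&bad)));
	assert(demo_ptr_err(demo_get_link(&bad)) == -DEMO_EFSCORRUPTED);
}
```

A caller that checks demo_is_err() before copying sees a clean error rather than dereferencing a null pointer.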
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Dmitry writes:
"Input updates for v4.19-rc5
Just a few driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: uinput - allow for max == min during input_absinfo validation
Input: elantech - enable middle button of touchpad on ThinkPad P72
Input: atakbd - fix Atari CapsLock behaviour
Input: atakbd - fix Atari keymap
Input: egalax_ts - add system wakeup support
Input: gpio-keys - fix a documentation index issue
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Mark writes:
"spi: Fixes for v4.19
Quite a few fixes for the Renesas drivers in here, plus a fix for the
Tegra driver and some documentation fixes for the recently added
spi-mem code. The Tegra fix is relatively large but fairly
straightforward and mechanical, it runs on probe so it's been
reasonably well covered in -next testing."
* tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
spi: spi-mem: Add missing description for data.nbytes field
spi: rspi: Fix interrupted DMA transfers
spi: rspi: Fix invalid SPI use during system suspend
spi: sh-msiof: Fix handling of write value for SISTR register
spi: sh-msiof: Fix invalid SPI use during system suspend
spi: gpio: Fix copy-and-paste error
spi: tegra20-slink: explicitly enable/disable clock
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Mark writes:
"regulator: Fixes for 4.19
A collection of fairly minor bug fixes here, a couple of driver
specific ones plus two core fixes. There's one fix for the new
suspend state code which fixes some confusion with constant values
that are supposed to indicate noop operation and another fixing a
race condition with the creation of sysfs files on new regulators."
* tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fix crash caused by null driver data
regulator: Fix 'do-nothing' value for regulators without suspend state
regulator: da9063: fix DT probing with constraints
regulator: bd71837: Disable voltage monitoring for LDO3/4
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Michael writes:
"powerpc fixes for 4.19 #3
A reasonably big batch of fixes due to me being away for a few weeks.
A fix for the TM emulation support on Power9, which could result in
corrupting the guest r11 when running under KVM.
Two fixes to the TM code which could lead to userspace GPR corruption
if we take an SLB miss at exactly the wrong time.
Our dynamic patching code had a bug that meant we could patch freed
__init text, which could lead to corrupting userspace memory.
csum_ipv6_magic() didn't work on little endian platforms since we
optimised it recently.
A fix for an endian bug when reading a device tree property telling
us how many storage keys the machine has available.
Fix a crash seen on some configurations of PowerVM when migrating the
partition from one machine to another.
A fix for a regression in the setup of our CPU to NUMA node mapping
in KVM guests.
A fix to our selftest Makefiles to make them work since a recent
change to the shared Makefile logic."
* tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix Makefiles for headers_install change
powerpc/numa: Use associativity if VPHN hcall is successful
powerpc/tm: Avoid possible userspace r1 corruption on reclaim
powerpc/tm: Fix userspace r13 corruption
powerpc/pseries: Fix unitialized timer reset on migration
powerpc/pkeys: Fix reading of ibm, processor-storage-keys property
powerpc: fix csum_ipv6_magic() on little endian platforms
powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
powerpc: Avoid code patching freed init sections
KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Linus writes:
"Pin control fixes for v4.19:
- Fixes to x86 hardware:
- AMD interrupt debounce issues
- Faulty Intel cannonlake register offset
- Revert pin translation IRQ locking"
* tag 'pinctrl-v4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
Revert "pinctrl: intel: Do pin translation when lock IRQ"
pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant
pinctrl/amd: poll InterruptEnable bits in amd_gpio_irq_set_type
|
|
Prior to 256a45937093 ("PCI/AER: Squash aerdrv_acpi.c into aerdrv.c"),
drivers/pci/pcie/aer/aerdrv_acpi.c contained code to parse the ACPI HEST
table. That code now lives in drivers/pci/pcie/aer.c.
Remove the "F: drivers/pci/*/*/*acpi*" pattern because it matches nothing.
We could add a "F: drivers/pci/pcie/aer.c" pattern to the ACPI APEI
section, but that file sees a lot of changes, almost none of which are of
interest to the ACPI folks.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It is possible that a failure can occur during the scheduling of a
pinned event. The initial portion of perf_event_read_local() contains
the various error checks an event should pass before it can be
considered valid. Ensure that the potential scheduling failure
of a pinned event is checked for and reported with a credible error.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/6486385d1f30336e9973b24c8c65f5079543d3d3.1537377064.git.reinette.chatre@intel.com
|
|
Eric Dumazet says:
====================
netpoll: second round of fixes.
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC).
This capture, showing one ksoftirqd eating all cycles
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller() :
Most NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.
First patch is a fix in poll_one_napi().
Then following patches take care of ten drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ibmvnic uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
ibmvnic_netpoll_controller() was completely wrong anyway,
as it was scheduling NAPI to service RX queues (instead of TX),
so I doubt netpoll ever worked on this driver.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
sfc-falcon uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-By: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
sfc uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Acked-By: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ena uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Netanel Belgazal <netanel@amazon.com>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Zorik Machulsky <zorik@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
netxen uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Manish Chopra <manish.chopra@cavium.com>
Cc: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
qlcnic uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Harish Patil <harish.patil@cavium.com>
Cc: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
virtio_net uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
hns uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ehea uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
hinic uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Note that hinic_netpoll() was incorrectly scheduling NAPI
on both RX and TX queues.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since we no longer require NAPI drivers to provide
an ndo_poll_controller(), napi_schedule() has not been done
before poll_one_napi() invocation.
So testing NAPI_STATE_SCHED is likely to cause early returns.
While we are at it, remove outdated comment.
Note to future bisections : This change might surface prior
bugs in drivers. See commit 73f21c653f93 ("bnxt_en: Fix TX
timeout during netpoll.") for one occurrence.
Fixes: ac3d9dd034e5 ("netpoll: make ndo_poll_controller() optional")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
More patches than I'd like perhaps, but each seems reasonable:
* two new spectre-v1 mitigations in nl80211
* TX status fix in general, and mesh in particular
* powersave vs. offchannel fix
* regulatory initialization fix
* fix for a queue hang due to a bad return value
* allocate TXQs for active monitor interfaces, fixing my
earlier patch to avoid unnecessary allocations where I
missed that this case needed them
* fix TDLS data frames priority assignment
* fix scan results processing to take into account duplicate
channel numbers (over different operating classes, but we
don't necessarily know the operating class)
* various hwsim fixes for radio destruction and new radio
announcement messages
* remove an extraneous kernel-doc line
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The structures shared between the driver and the management FW (mfw) differ
in size. This leads to issues when the driver tries to access structure
members that are not aligned with the mfw copy, e.g. data_ptr usage in the
case of an mfw_tlv request.
Align the driver structure with the mfw copy and add reserved field(s) to
the driver structure for the members not used by the driver.
Fixes: dd006921d67f ("qed: Add MFW interfaces for TLV request support.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
|
|
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ameen Rahman <Ameen.Rahman@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I have been doing reviews only, and no active development, on bridge
code for several years. Roopa and Nikolay have been doing most of
the new features and have agreed to take over as new co-maintainers.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
|
|
Julian Wiedmann says:
====================
s390/qeth: fixes 2019-09-26
please apply two qeth patches for -net. The first is a trivial cleanup
required for patch #2 by Jean, which fixes a potential endless loop.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Functions qeth_get_ipa_msg and qeth_get_ipa_cmd_name are modifying
the last member of global arrays without any locking that I can see.
If two instances of either function are running at the same time,
it could cause a race ultimately leading to an array overrun (the
contents of the last entry of the array is the only guarantee that
the loop will ever stop).
Performing the lookups without modifying the arrays is admittedly
slower (two comparisons per iteration instead of one) but these
are operations which are rare (should only be needed in error
cases or when debugging, not during successful operation) and it
seems still less costly than introducing a mutex to protect the
arrays in question.
As a side bonus, it allows us to declare both arrays as const data.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
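The lookup strategy described above (a bounded scan with two comparisons per iteration over a table that is never written) can be sketched as follows. The table contents and all demo_* names are hypothetical and are not taken from the qeth driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

struct demo_rc_msg {
	int rc;
	const char *msg;
};

/* const data: the last entry is a fallback and is never modified. */
static const struct demo_rc_msg demo_rc_msgs[] = {
	{ 0x0000, "success" },
	{ 0x0001, "command not supported" },
	{ 0x0002, "invalid parameter" },
	{ 0xffff, "unknown error" },
};

static const char *demo_get_msg(int rc)
{
	const size_t last = DEMO_ARRAY_SIZE(demo_rc_msgs) - 1;
	size_t i = 0;

	/* Two comparisons per iteration: index bound and key match. */
	while (i < last && demo_rc_msgs[i].rc != rc)
		i++;
	return demo_rc_msgs[i].msg;
}

static void demo_get_msg_usage(void)
{
	assert(strcmp(demo_get_msg(0x0001), "command not supported") == 0);
	assert(strcmp(demo_get_msg(0x1234), "unknown error") == 0);
}
```

Because the loop stops at the index bound rather than at a rewritten sentinel, concurrent callers cannot interfere with each other and the array can live in read-only data.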
|
|
Use the common code ARRAY_SIZE macro instead of a private implementation.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dave writes:
"drm fixes for 4.19-rc6
Looks like a pretty normal week for graphics,
core: syncobj fix, panel link regression revert
amd: suspend/resume fixes, EDID emulation fix
mali-dp: NV12 writeback and vblank reset fixes
etnaviv: DMA setup fix"
* tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
drm/malidp: Fix writeback in NV12
drm: mali-dp: Call drm_crtc_vblank_reset on device init
drm/etnaviv: add DMA configuration for etnaviv platform device
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Palmer writes:
"A Single RISC-V Update for 4.19-rc6
The Debian guys have been pushing on our port and found some
unversioned symbols leaking into modules. This PR contains a single
fix for that issue."
* tag 'riscv-for-linus-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: include linux/ftrace.h in asm-prototypes.h
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Bjorn writes:
"PCI fixes:
- Fix ACPI hotplug issue that causes black screen crash at boot (Mika
Westerberg)
- Fix DesignWare "scheduling while atomic" issues (Jisheng Zhang)
- Add PPC contacts to MAINTAINERS for PCI core error handling (Bjorn
Helgaas)
- Sort Mobiveil MAINTAINERS entry (Lorenzo Pieralisi)"
* tag 'pci-v4.19-fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
PCI: dwc: Fix scheduling while atomic issues
MAINTAINERS: Move mobiveil PCI driver entry where it belongs
MAINTAINERS: Update PPC contacts for PCI core error handling
|
|
The debounce value passed to mmc_gpiod_request_cd() function is in
microseconds, but msecs_to_jiffies() requires the value to be in
milliseconds to properly calculate the delay, so adjust the value stored
in cd_debounce_delay_ms context entry.
Fixes: 1d71926bbd59 ("mmc: core: Fix debounce time to use microseconds")
Fixes: bfd694d5e21c ("mmc: core: Add tunable delay before detecting card
after card is inserted")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
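The unit mismatch can be made concrete with a small worked example. This is a sketch under stated assumptions: DEMO_HZ is an arbitrary tick rate, demo_msecs_to_jiffies() is a simplified round-up model rather than the kernel's msecs_to_jiffies(), and all demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_HZ	250UL	/* assumed tick rate for this example */

/* Simplified model: milliseconds to ticks, rounding up. */
static unsigned long demo_msecs_to_jiffies(unsigned long ms)
{
	return (ms * DEMO_HZ + 999UL) / 1000UL;
}

/* The debounce value arrives in microseconds; store it in milliseconds. */
static unsigned long demo_debounce_us_to_ms(unsigned long debounce_us)
{
	return debounce_us / 1000UL;
}

static void demo_debounce_usage(void)
{
	unsigned long debounce_us = 200000UL;	/* 200 ms */

	/* Converted first: 200 ms is 50 ticks at 250 Hz. */
	assert(demo_msecs_to_jiffies(demo_debounce_us_to_ms(debounce_us)) == 50UL);

	/* Unconverted: the same number read as ms is 1000 times too long. */
	assert(demo_msecs_to_jiffies(debounce_us) == 50000UL);
}
```

At the assumed tick rate the unconverted value corresponds to a delay of 200 seconds instead of 200 milliseconds.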
|
|
Pull NVMe fix from Christoph.
* 'nvme-4.19' of git://git.infradead.org/nvme:
nvme: properly propagate errors in nvme_mpath_init
|
|
Commit a46b53672b2c2e3770b38a4abf90d16364d2584b ("xen/blkfront: cleanup
stale persistent grants") introduced a regression as purged persistent
grants were not put into the list of free grants again. Correct that.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The fix didn't work for all cases, so revert it to make way for a
(hopefully) better fix.
This reverts commit f151ba989d149bbdfc90e5405724bbea094f9b17.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
cgroup_storage_update_elem() shouldn't accept any flags
argument values except BPF_ANY and BPF_EXIST, so that backward
compatibility is preserved if a new flag value is added later.
Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
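The flag check described in the bpf commit above could look roughly like the following userspace sketch. The function name check_update_flags() is hypothetical, and the constants are redefined locally with an EX_ prefix to mirror the values of BPF_ANY, BPF_NOEXIST and BPF_EXIST in the UAPI header; this is not the actual patch.

```c
#include <errno.h>

/* Local stand-ins mirroring the UAPI values of the BPF update flags. */
enum {
	EX_BPF_ANY     = 0,	/* create new element or update existing */
	EX_BPF_NOEXIST = 1,	/* create new element only if it did not exist */
	EX_BPF_EXIST   = 2,	/* update existing element only */
};

/*
 * Hypothetical validator: accept only the two flag values that make
 * sense for a map whose elements always exist, and reject everything
 * else, including values that may be defined in the future.
 */
static int check_update_flags(unsigned long long flags)
{
	if (flags != EX_BPF_ANY && flags != EX_BPF_EXIST)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown values up front means that a flag added later cannot have been silently accepted, and ignored, by older kernels.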
|
|
Only check for the network namespace if the socket is available.
Fixes: f564650106a6 ("netfilter: check if the socket netns is correct.")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Unfortunately some versions of gcc emit the following warning:
$ make net/xfrm/xfrm_output.o
linux/compiler.h:252:20: warning: array subscript is above array bounds [-Warray-bounds]
hook_head = rcu_dereference(net->nf.hooks_arp[hook]);
^~~~~~~~~~~~~~~~~~~~~
xfrm_output_resume passes skb_dst(skb)->ops->family as its 'pf' arg so the
compiler can't know that we'll never access hooks_arp[].
(NFPROTO_IPV4 and NFPROTO_IPV6 are the only possible cases.)
Avoid this by adding an explicit WARN_ON_ONCE() check.
This patch has no effect if the family is a compile-time constant as gcc
will remove the switch() construct entirely.
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
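The idea of the explicit bounds check in the commit above can be modelled in userspace as follows. All names here (struct hook_table, lookup_hook(), the EX_ constants) are hypothetical stand-ins, and a plain early return takes the place of the kernel's WARN_ON_ONCE(); this is a sketch of the pattern, not the patch itself.

```c
#include <stddef.h>

/* Stand-ins for the protocol families and per-family hook counts. */
enum { EX_PROTO_IPV4 = 2, EX_PROTO_ARP = 3, EX_PROTO_IPV6 = 10 };
#define EX_INET_NUMHOOKS 5
#define EX_ARP_NUMHOOKS  3

struct hook_table {
	const void *ipv4[EX_INET_NUMHOOKS];
	const void *ipv6[EX_INET_NUMHOOKS];
	const void *arp[EX_ARP_NUMHOOKS];	/* smaller than the others */
};

/*
 * Look up a hook head. The ARP array is shorter, so an explicit range
 * check makes the access provably in bounds even when 'pf' is only
 * known at run time.
 */
static const void *lookup_hook(const struct hook_table *t, int pf,
			       unsigned int hook)
{
	switch (pf) {
	case EX_PROTO_IPV4:
		return hook < EX_INET_NUMHOOKS ? t->ipv4[hook] : NULL;
	case EX_PROTO_IPV6:
		return hook < EX_INET_NUMHOOKS ? t->ipv6[hook] : NULL;
	case EX_PROTO_ARP:
		if (hook >= EX_ARP_NUMHOOKS)	/* kernel: WARN_ON_ONCE() */
			return NULL;
		return t->arp[hook];
	default:
		return NULL;
	}
}
```

When the family is a compile-time constant, the compiler can fold the switch away, which matches the commit's remark that the check then has no effect.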
|
|
nft_set_gc_batch_check() checks whether the gc buffer is full.
If the gc buffer is full, it is released internally by
nft_set_gc_batch_complete().
In the case of rbtree, rb_erase() should be called before calling
nft_set_gc_batch_complete(). Therefore rb_erase() should
be called before calling nft_set_gc_batch_check() too.
test commands:
table ip filter {
set set1 {
type ipv4_addr; flags interval, timeout;
gc-interval 10s;
timeout 1s;
elements = {
1-2,
3-4,
5-6,
...
10000-10001,
}
}
}
%nft -f test.nft
splat looks like:
[ 430.273885] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 430.282158] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 430.283116] CPU: 1 PID: 190 Comm: kworker/1:2 Tainted: G B 4.18.0+ #7
[ 430.283116] Workqueue: events_power_efficient nft_rbtree_gc [nf_tables_set]
[ 430.313559] RIP: 0010:rb_next+0x81/0x130
[ 430.313559] Code: 08 49 bd 00 00 00 00 00 fc ff df 48 bb 00 00 00 00 00 fc ff df 48 85 c0 75 05 eb 58 48 89 d4
[ 430.313559] RSP: 0018:ffff88010cdb7680 EFLAGS: 00010207
[ 430.313559] RAX: 0000000000b84854 RBX: dffffc0000000000 RCX: ffffffff83f01973
[ 430.313559] RDX: 000000000017090c RSI: 0000000000000008 RDI: 0000000000b84864
[ 430.313559] RBP: ffff8801060d4588 R08: fffffbfff09bc349 R09: fffffbfff09bc349
[ 430.313559] R10: 0000000000000001 R11: fffffbfff09bc348 R12: ffff880100f081a8
[ 430.313559] R13: dffffc0000000000 R14: ffff880100ff8688 R15: dffffc0000000000
[ 430.313559] FS: 0000000000000000(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[ 430.313559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 430.313559] CR2: 0000000001551008 CR3: 000000005dc16000 CR4: 00000000001006e0
[ 430.313559] Call Trace:
[ 430.313559] nft_rbtree_gc+0x112/0x5c0 [nf_tables_set]
[ 430.313559] process_one_work+0xc13/0x1ec0
[ 430.313559] ? _raw_spin_unlock_irq+0x29/0x40
[ 430.313559] ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[ 430.313559] ? set_load_weight+0x270/0x270
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __switch_to_asm+0x40/0x70
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __switch_to_asm+0x40/0x70
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __switch_to_asm+0x40/0x70
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __switch_to_asm+0x40/0x70
[ 430.313559] ? __switch_to_asm+0x34/0x70
[ 430.313559] ? __schedule+0x6d3/0x1f50
[ 430.313559] ? find_held_lock+0x39/0x1c0
[ 430.313559] ? __sched_text_start+0x8/0x8
[ 430.313559] ? cyc2ns_read_end+0x10/0x10
[ 430.313559] ? save_trace+0x300/0x300
[ 430.313559] ? sched_clock_local+0xd4/0x140
[ 430.313559] ? find_held_lock+0x39/0x1c0
[ 430.313559] ? worker_thread+0x353/0x1120
[ 430.313559] ? worker_thread+0x353/0x1120
[ 430.313559] ? lock_contended+0xe70/0xe70
[ 430.313559] ? __lock_acquire+0x4500/0x4500
[ 430.535635] ? do_raw_spin_unlock+0xa5/0x330
[ 430.535635] ? do_raw_spin_trylock+0x101/0x1a0
[ 430.535635] ? do_raw_spin_lock+0x1f0/0x1f0
[ 430.535635] ? _raw_spin_lock_irq+0x10/0x70
[ 430.535635] worker_thread+0x15d/0x1120
[ ... ]
Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
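The ordering rule from the rbtree commit above (unlink an element from the container before any call that may release the gc batch) can be sketched with a simplified userspace model. A singly linked list stands in for the rbtree, and every name below is hypothetical; the real nft_set_gc_batch_* helpers behave differently in detail.

```c
#include <stdlib.h>

#define EX_BATCH_MAX 2

struct ex_node {
	int expired;
	struct ex_node *next;
};

struct ex_gc_batch {
	struct ex_node *items[EX_BATCH_MAX];
	int count;
};

/* Free everything queued so far and reset the batch. */
static void ex_batch_complete(struct ex_gc_batch *b)
{
	for (int i = 0; i < b->count; i++)
		free(b->items[i]);
	b->count = 0;
}

/* If the batch is full, release it so there is room for one more item. */
static void ex_batch_check(struct ex_gc_batch *b)
{
	if (b->count == EX_BATCH_MAX)
		ex_batch_complete(b);
}

/* Collect expired nodes; returns how many were collected. */
static int ex_gc_run(struct ex_node **head)
{
	struct ex_gc_batch batch = { .count = 0 };
	struct ex_node **link = head;
	int collected = 0;

	while (*link) {
		struct ex_node *n = *link;

		if (!n->expired) {
			link = &n->next;
			continue;
		}
		*link = n->next;	/* unlink from the container first */
		ex_batch_check(&batch);	/* may free earlier, already unlinked items */
		batch.items[batch.count++] = n;
		collected++;
	}
	ex_batch_complete(&batch);
	return collected;
}
```

Because every queued node has already been unlinked, a flush triggered by ex_batch_check() can never free memory that the container still points to.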
|
|
Fix error distribution by immediately delivering the errors to all the
affected calls rather than deferring them to a worker thread. The problem
with the latter is that retries and other activity can happen in the
meantime, when we want to stop the calls sooner.
To this end:
(1) Stop the error distributor from removing calls from the error_targets
list so that peer->lock isn't needed to synchronise against other adds
and removals.
(2) Require the peer's error_targets list to be accessed with RCU, thereby
avoiding the need to take peer->lock over distribution.
(3) Don't attempt to affect a call's state if it is already marked complete.
Signed-off-by: David Howells <dhowells@redhat.com>
|