We add a sysfs interface to read back the backplane
status of the interface.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to track the status of our queued packets. This way the driving
process knows whether failed packets need to be retransmitted. For this
purpose we queue the transferred/failed packets back into the err_skb
message queue together with some status information.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Grzeschik says:
====================
arcnet: Collection of latest fixes
Here we sum up the recent fixes I collected on the way to use and
stabilise the framework. Part of it is a possible deadlock that we
prevent, as well as a fix for the calculation of the dev_id that can be
set up by a rotary encoder. Besides that, we added a trivial spelling
patch and fixed some wrong and missing assignments, which improves the
code footprint.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We add the pdev data to the PCI device's netdev structure. This way
the interface gets consistent device names in userspace (udev).
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dev_id was miscalculated. Only the two bits 4-5 are relevant for the
MA1 card. PCIARC1 and PCIFB2 use the four bits 4-7 for id selection.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
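
A minimal sketch of the masking described above, assuming the rotary
encoder value is read from an 8-bit register with the id starting at
bit 4. The names toy_dev_id and the TOY_* masks are hypothetical and are
not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical masks: MA1 uses bits 4-5, PCIARC1/PCIFB2 use bits 4-7. */
#define TOY_DEV_ID_MASK_MA1   0x3u
#define TOY_DEV_ID_MASK_FULL  0xfu

/* Shift the id bits down to bit 0, then keep only the relevant ones. */
static unsigned int toy_dev_id(uint8_t rotary_reg, unsigned int mask)
{
	return ((unsigned int)rotary_reg >> 4) & mask;
}
```

With a register value of 0xA5, the four-bit mask yields 0xA while the
two-bit MA1 mask yields 2, which is the difference the fix is about.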
|
|
The assignment is superfluous.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch prevents the following deadlock in the arcnet driver.
[ 41.273910] ======================================================
[ 41.280397] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 41.287433] 4.4.0-00034-gc0ae784 #536 Not tainted
[ 41.292366] ------------------------------------------------------
[ 41.298863] arcecho/233 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
[ 41.305628] (&(&lp->lock)->rlock){+.+...}, at: [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[ 41.315199]
[ 41.315199] and this task is already holding:
[ 41.321324] (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8
[ 41.329593] which would create a new lock dependency:
[ 41.334893] (_xmit_ARCNET#2){+.-...} -> (&(&lp->lock)->rlock){+.+...}
[ 41.341801]
[ 41.341801] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 41.350108] (_xmit_ARCNET#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[ 41.357539] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.362677] [<c063ab8c>] dev_watchdog+0x5c/0x264
[ 41.367723] [<c0094edc>] call_timer_fn+0x6c/0xf4
[ 41.372759] [<c00950b8>] run_timer_softirq+0x154/0x210
[ 41.378340] [<c0036b30>] __do_softirq+0x144/0x298
[ 41.383469] [<c0036fb4>] irq_exit+0xcc/0x130
[ 41.388138] [<c0085c50>] __handle_domain_irq+0x60/0xb4
[ 41.393728] [<c0014578>] __irq_svc+0x58/0x78
[ 41.398402] [<c0010274>] arch_cpu_idle+0x24/0x3c
[ 41.403443] [<c007127c>] cpu_startup_entry+0x1f8/0x25c
[ 41.409029] [<c09adc90>] start_kernel+0x3c0/0x3cc
[ 41.414170]
[ 41.414170] to a SOFTIRQ-irq-unsafe lock:
[ 41.419931] (&(&lp->lock)->rlock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
[ 41.427996] ... [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.433409] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.439646] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.445063] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 41.450661] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 41.455700] [<c0087548>] irq_thread+0x13c/0x1dc
[ 41.460649] [<c0050f10>] kthread+0xe4/0xf8
[ 41.465158] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.470207]
[ 41.470207] other info that might help us debug this:
[ 41.470207]
[ 41.478627] Possible interrupt unsafe locking scenario:
[ 41.478627]
[ 41.485763] CPU0 CPU1
[ 41.490521] ---- ----
[ 41.495279] lock(&(&lp->lock)->rlock);
[ 41.499414] local_irq_disable();
[ 41.505636] lock(_xmit_ARCNET#2);
[ 41.511967] lock(&(&lp->lock)->rlock);
[ 41.518741] <Interrupt>
[ 41.521490] lock(_xmit_ARCNET#2);
[ 41.525356]
[ 41.525356] *** DEADLOCK ***
[ 41.525356]
[ 41.531587] 1 lock held by arcecho/233:
[ 41.535617] #0: (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8
[ 41.544355]
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[ 41.552362] -> (_xmit_ARCNET#2){+.-...} ops: 27 {
[ 41.557357] HARDIRQ-ON-W at:
[ 41.560664] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.567445] [<c063ba28>] dev_deactivate_many+0x114/0x304
[ 41.574866] [<c063bc3c>] dev_deactivate+0x24/0x38
[ 41.581646] [<c0630374>] linkwatch_do_dev+0x40/0x74
[ 41.588613] [<c06305d8>] __linkwatch_run_queue+0xec/0x140
[ 41.596120] [<c0630658>] linkwatch_event+0x2c/0x34
[ 41.602991] [<c004af30>] process_one_work+0x188/0x40c
[ 41.610131] [<c004b200>] worker_thread+0x4c/0x480
[ 41.616912] [<c0050f10>] kthread+0xe4/0xf8
[ 41.623048] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.629735] IN-SOFTIRQ-W at:
[ 41.633039] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.639820] [<c063ab8c>] dev_watchdog+0x5c/0x264
[ 41.646508] [<c0094edc>] call_timer_fn+0x6c/0xf4
[ 41.653190] [<c00950b8>] run_timer_softirq+0x154/0x210
[ 41.660425] [<c0036b30>] __do_softirq+0x144/0x298
[ 41.667201] [<c0036fb4>] irq_exit+0xcc/0x130
[ 41.673518] [<c0085c50>] __handle_domain_irq+0x60/0xb4
[ 41.680754] [<c0014578>] __irq_svc+0x58/0x78
[ 41.687077] [<c0010274>] arch_cpu_idle+0x24/0x3c
[ 41.693769] [<c007127c>] cpu_startup_entry+0x1f8/0x25c
[ 41.701006] [<c09adc90>] start_kernel+0x3c0/0x3cc
[ 41.707791] INITIAL USE at:
[ 41.711003] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.717696] [<c063ba28>] dev_deactivate_many+0x114/0x304
[ 41.725026] [<c063bc3c>] dev_deactivate+0x24/0x38
[ 41.731718] [<c0630374>] linkwatch_do_dev+0x40/0x74
[ 41.738593] [<c06305d8>] __linkwatch_run_queue+0xec/0x140
[ 41.746011] [<c0630658>] linkwatch_event+0x2c/0x34
[ 41.752789] [<c004af30>] process_one_work+0x188/0x40c
[ 41.759847] [<c004b200>] worker_thread+0x4c/0x480
[ 41.766541] [<c0050f10>] kthread+0xe4/0xf8
[ 41.772596] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.779198] }
[ 41.780945] ... key at: [<c124d620>] netdev_xmit_lock_key+0x38/0x1c8
[ 41.788192] ... acquired at:
[ 41.791309] [<c007bed8>] lock_acquire+0x70/0x90
[ 41.796361] [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54
[ 41.802324] [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[ 41.808844] [<c06b9380>] packet_direct_xmit+0x130/0x1c8
[ 41.814622] [<c06bc7e4>] packet_sendmsg+0x3b8/0x680
[ 41.820034] [<c05fe8b0>] sock_sendmsg+0x14/0x24
[ 41.825091] [<c05ffd68>] SyS_sendto+0xb8/0xe0
[ 41.829956] [<c05ffda8>] SyS_send+0x18/0x20
[ 41.834638] [<c000f780>] ret_fast_syscall+0x0/0x1c
[ 41.839954]
[ 41.841514]
the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[ 41.850302] -> (&(&lp->lock)->rlock){+.+...} ops: 5 {
[ 41.855644] HARDIRQ-ON-W at:
[ 41.858945] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.865726] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.873607] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.880666] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 41.887901] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 41.894593] [<c0087548>] irq_thread+0x13c/0x1dc
[ 41.901195] [<c0050f10>] kthread+0xe4/0xf8
[ 41.907338] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.914025] SOFTIRQ-ON-W at:
[ 41.917328] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.924106] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.931981] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.939028] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 41.946264] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 41.952954] [<c0087548>] irq_thread+0x13c/0x1dc
[ 41.959548] [<c0050f10>] kthread+0xe4/0xf8
[ 41.965689] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.972379] INITIAL USE at:
[ 41.975595] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.982283] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.990063] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.997027] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 42.004172] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 42.010766] [<c0087548>] irq_thread+0x13c/0x1dc
[ 42.017267] [<c0050f10>] kthread+0xe4/0xf8
[ 42.023314] [<c000f810>] ret_from_fork+0x14/0x24
[ 42.029903] }
[ 42.031648] ... key at: [<bf0854cc>] __key.42091+0x0/0xfffff0f8 [arcnet]
[ 42.039255] ... acquired at:
[ 42.042372] [<c007bed8>] lock_acquire+0x70/0x90
[ 42.047413] [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54
[ 42.053364] [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[ 42.059872] [<c06b9380>] packet_direct_xmit+0x130/0x1c8
[ 42.065634] [<c06bc7e4>] packet_sendmsg+0x3b8/0x680
[ 42.071030] [<c05fe8b0>] sock_sendmsg+0x14/0x24
[ 42.076069] [<c05ffd68>] SyS_sendto+0xb8/0xe0
[ 42.080926] [<c05ffda8>] SyS_send+0x18/0x20
[ 42.085601] [<c000f780>] ret_fast_syscall+0x0/0x1c
[ 42.090918]
[ 42.092481]
[ 42.092481] stack backtrace:
[ 42.097065] CPU: 0 PID: 233 Comm: arcecho Not tainted 4.4.0-00034-gc0ae784 #536
[ 42.104751] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 42.111183] [<c0017ec8>] (unwind_backtrace) from [<c00139d0>] (show_stack+0x10/0x14)
[ 42.119337] [<c00139d0>] (show_stack) from [<c02a82c4>] (dump_stack+0x8c/0x9c)
[ 42.126937] [<c02a82c4>] (dump_stack) from [<c0078260>] (check_usage+0x4bc/0x63c)
[ 42.134815] [<c0078260>] (check_usage) from [<c0078438>] (check_irq_usage+0x58/0xb0)
[ 42.142964] [<c0078438>] (check_irq_usage) from [<c007aaa0>] (__lock_acquire+0x1524/0x20b0)
[ 42.151740] [<c007aaa0>] (__lock_acquire) from [<c007bed8>] (lock_acquire+0x70/0x90)
[ 42.159886] [<c007bed8>] (lock_acquire) from [<c06f9140>] (_raw_spin_lock_irqsave+0x40/0x54)
[ 42.168768] [<c06f9140>] (_raw_spin_lock_irqsave) from [<bf083bc8>] (arcnet_send_packet+0x60/0x1c0 [arcnet])
[ 42.179115] [<bf083bc8>] (arcnet_send_packet [arcnet]) from [<c06b9380>] (packet_direct_xmit+0x130/0x1c8)
[ 42.189182] [<c06b9380>] (packet_direct_xmit) from [<c06bc7e4>] (packet_sendmsg+0x3b8/0x680)
[ 42.198059] [<c06bc7e4>] (packet_sendmsg) from [<c05fe8b0>] (sock_sendmsg+0x14/0x24)
[ 42.206199] [<c05fe8b0>] (sock_sendmsg) from [<c05ffd68>] (SyS_sendto+0xb8/0xe0)
[ 42.213978] [<c05ffd68>] (SyS_sendto) from [<c05ffda8>] (SyS_send+0x18/0x20)
[ 42.221388] [<c05ffda8>] (SyS_send) from [<c000f780>] (ret_fast_syscall+0x0/0x1c)
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
---
v1 -> v2: removed unneeded zero assignment of flags
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2016-06-28
The following updates and fixes are included in this driver update series:
- Simplify mailbox interface code
- Fix SFP supported and advertising settings
- Fix PTP initialization register usage
- Ensure a timestamp skb is present before using it
- Add a timeout to timestamp register updates
- Handle return code from software reset function
- Some fixes for handling 2.5Gbps rates
- Limit I2C error messages
- Fix non-DMA interrupt handling through tasklet usage
- Add NUMA affinity support for memory allocations
- Add NUMA affinity support for interrupts
- Prepare for more fine-grained cache coherency controls
- Simplify the DMA burst length programming
- Performance improvements
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to change some general performance settings and to provide
some performance settings based on the device that is probed.
This includes:
- Setting the maximum read/write outstanding request limit
- Reducing the AXI interface burst length size
- Selectively setting the Tx and Rx descriptor pre-fetch threshold
- Selectively setting additional cache coherency controls
Tested and verified on all versions of the hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the driver hardcodes the PBLx8 setting. Remove the need for
specifying the PBLx8 setting and automatically calculate based on the
specified PBL value. Since the PBLx8 setting applies to both Tx and Rx
use the same PBL value for both of them.
Also, the driver currently uses a bit field to set the AXI master burst
len setting. Change to the full bit field range and set the burst length
based on the specified value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
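
A rough sketch of how the PBLx8 setting could be derived from a single
PBL value. It assumes that the PBL field can hold values up to 32
directly and that larger values need the multiply-by-8 mode. The names
and the limit are assumptions for illustration, not the driver's code.

```c
#include <stdbool.h>

struct toy_pbl_cfg {
	bool pblx8;        /* multiply-by-8 mode enabled */
	unsigned int pbl;  /* value programmed into the PBL field */
};

/* Assumption: the PBL field can hold values up to 32 directly. */
#define TOY_PBL_FIELD_MAX 32u

/* Derive both settings from the one value the user specified. */
static struct toy_pbl_cfg toy_calc_pbl(unsigned int requested_pbl)
{
	struct toy_pbl_cfg cfg = { .pblx8 = false, .pbl = requested_pbl };

	if (requested_pbl > TOY_PBL_FIELD_MAX) {
		cfg.pblx8 = true;
		cfg.pbl = requested_pbl >> 3;  /* hardware scales by 8 */
	}
	return cfg;
}
```

Under these assumptions a requested value of 128 is programmed as 16
with PBLx8 enabled, while 16 is programmed as-is with PBLx8 disabled.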
|
|
In prep for setting fine grained read and write DMA cache coherency
controls, allow specific values to be used to set the cache coherency
registers.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For IRQ affinity, set the affinity hints for the IRQs to be (initially) on
the processors corresponding to the NUMA node of the device.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to perform memory allocations on the node of the device. The
original allocation of the ring structure and Tx/Rx queues allocated all
of the memory at once and then carved it up for each channel and queue.
To best ensure that we get as much memory from the NUMA node as we can,
break the channel and ring allocations into individual allocations.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some of the device interrupts should function as level interrupts. For
some hardware configurations this requires setting some control bits
so that if the interrupt status has not been cleared the interrupt
should be reissued.
Additionally, when using MSI or MSI-X interrupts, run the interrupt
service routine as a tasklet so that the re-issuance of the interrupt
is handled properly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When I2C communication fails, it tends to always fail. Rather than
continuously issue an error message (once per second in most cases),
change the message to be issued just once.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver has some missing functionality when operating in the mode that
supports 2.5GbE. Fix the driver to fully recognize and support this speed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the function that performs a software reset of the hardware
provides a return code. During driver probe check this return code and
exit with an error if the software reset fails.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just to be on the safe side, should the update of the timestamp registers
not complete, issue a warning rather than looping forever waiting for the
update to complete.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
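
The bounded wait can be sketched in user space as follows. The register
is simulated by a counter, and all names (toy_reg, toy_update_busy,
toy_wait_for_update) are hypothetical. The real driver reads a hardware
register and may delay between polls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a register that stays busy for N polls. */
struct toy_reg {
	unsigned int busy_polls_left;
};

static bool toy_update_busy(struct toy_reg *r)
{
	if (r->busy_polls_left == 0)
		return false;
	r->busy_polls_left--;
	return true;
}

/* Poll at most max_polls times; return 0 on completion, -1 on timeout. */
static int toy_wait_for_update(struct toy_reg *r, unsigned int max_polls)
{
	while (max_polls--) {
		if (!toy_update_busy(r))
			return 0;
	}
	return -1;  /* caller would warn here instead of spinning forever */
}
```

The caller decides what to do on -1; in the spirit of the change above
it would issue a warning and carry on.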
|
|
Spurious Tx timestamp interrupts can cause an oops in the Tx timestamp
processing function if a Tx timestamp skb is NULL. Add a check to ensure
a Tx timestamp skb is present before attempting to use it.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During PTP initialization, the Timestamp Control register should be
cleared and not the Tx Configuration register. While this typo causes
the wrong register to be cleared, the default value of each register and
the fact that the Tx Configuration register is programmed afterwards
mean that it doesn't result in a bug, hence only fixing in net-next.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using SFPs, the supported and advertised settings should be initially
based on the SFP that has been detected. The code currently indicates the
overall support of the device as opposed to what the SFP is capable of.
Update the code to change the supported link modes, auto-negotiation, etc.
to be based on the installed SFP.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify and centralize the mailbox command rate change interface by
having a single function perform the writes to the mailbox registers
to issue the request.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent before
submit it to user") introduced a warning to catch unemitted cached
fiemap extent.
However, this warning doesn't take the following case into consideration:
0                       4K                      8K
|<---- fiemap range --->|
|<----------- On-disk extent ------------------>|
In this case, the whole 0~8K is cached, and since it's larger than the
fiemap range, it breaks out of the fiemap extent emit loop.
This leaves the fiemap extent cached but not emitted; it is then caught
by the final fiemap extent sanity check, causing a kernel warning.
This patch removes the kernel warning and renames the sanity check to
emit_last_fiemap_cache() since it's possible and valid to have a cached
fiemap extent.
Reported-by: David Sterba <dsterba@suse.cz>
Reported-by: Adam Borowski <kilobyte@angband.pl>
Fixes: 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent ...")
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When a new directory 'DIR1' is created in a directory 'DIR0' with the SGID
bit set, DIR1 is expected to have the SGID bit set (and owning group equal
to the owning group of 'DIR0'). However, when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in the SGID bit
on 'DIR1' getting cleared if the user is not a member of the owning group.
Fix the problem by moving posix_acl_update_mode() out of
__btrfs_set_acl() into btrfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.
Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
CC: linux-btrfs@vger.kernel.org
CC: David Sterba <dsterba@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
My static checker complains that ofdpa_neigh_del() can sometimes free
"found". It just makes sense to use it first before deleting it.
Fixes: ecf244f753e0 ("rocker: fix maybe-uninitialized warning")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6. This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number. It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.
This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
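
To illustrate the class of bug, here is a small sketch of a 64-bit
quotient being squeezed into 32 bits. The helper names are hypothetical
and the arithmetic is simplified; it is not the kernel's
calc_reclaim_items_nr().

```c
#include <stdbool.h>
#include <stdint.h>

/* Full-width result: what a u64 return type preserves. */
static uint64_t toy_items_nr_u64(uint64_t to_reclaim, uint64_t bytes_per_item)
{
	return to_reclaim / bytes_per_item;
}

/* Narrow result: only the low 32 bits of the quotient survive. */
static uint32_t toy_items_nr_u32(uint64_t to_reclaim, uint64_t bytes_per_item)
{
	return (uint32_t)(to_reclaim / bytes_per_item);
}

/* A 32-bit value with the top bit set reads as negative in a signed int. */
static bool toy_reads_as_negative(uint32_t v)
{
	return (v & 0x80000000u) != 0;
}
```

A quotient of 0x80000001 fits in a u64 but has its top bit set once
narrowed to 32 bits, so code that stores it in an int can see a negative
count, which is the kind of value a check like WARN_ON(nr < 0) trips on.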
|
|
The commit "btrfs: scrub: inline helper scrub_setup_wr_ctx" inlined a
helper but wrongly sets up the target device. Incidentally there's a
local variable with the same name as a parameter in the previous
function, so this got caught at runtime as a crash in test btrfs/027.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
ranges
[BUG]
For the following case, btrfs can underflow qgroup reserved space
at an error path:
(Page size 4K, function name without "btrfs_" prefix)
         Task A                 |         Task B
----------------------------------------------------------------------
Buffered_write [0, 2K)          |
|- check_data_free_space()      |
|  |- qgroup_reserve_data()     |
|     Range aligned to page     |
|     range [0, 4K)         <<< |
|     4K bytes reserved     <<< |
|- copy pages to page cache     |
                                | Buffered_write [2K, 4K)
                                | |- check_data_free_space()
                                | |  |- qgroup_reserve_data()
                                | |     Range aligned to page
                                | |     range [0, 4K)
                                | |     Already reserved by A <<<
                                | |     0 bytes reserved      <<<
                                | |- delalloc_reserve_metadata()
                                | |  And it *FAILED* (Maybe EQUOTA)
                                | |- free_reserved_data_space()
                                     |- qgroup_free_data()
                                        Range aligned to page range
                                        [0, 4K)
                                        Freeing 4K
(Special thanks to Chandan for the detailed report and analysis)
[CAUSE]
In the above case, Task B frees the reserved data range [0, 4K), which
was actually reserved by Task A.
At writeback time, the page dirtied by Task A goes through the writeback
routine, which frees the 4K of reserved data space again at file extent
insert time, causing the qgroup underflow.
[FIX]
For btrfs_qgroup_free_data(), add a @reserved parameter to only free
data ranges reserved by the previous btrfs_qgroup_reserve_data().
So in the above case, Task B will try to free 0 bytes, so no underflow.
Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
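
The idea of the fix can be modelled in user space with a per-page flag
array. This is only a toy under stated assumptions: the real code tracks
byte ranges in an extent io tree and an extent_changeset, and every
toy_* name below is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NPAGES    8u

/* Toy qgroup: one flag per page plus the total of reserved bytes. */
struct toy_qgroup {
	unsigned char page_reserved[TOY_NPAGES];
	int64_t reserved_bytes;
};

/* Records which pages a single reserve call newly reserved. */
struct toy_changeset {
	unsigned char page[TOY_NPAGES];
	uint64_t bytes;
};

/* Reserve the page-aligned range covering [start, start + len), len > 0. */
static void toy_reserve(struct toy_qgroup *q, uint64_t start, uint64_t len,
			struct toy_changeset *cs)
{
	uint64_t first = start / TOY_PAGE_SIZE;
	uint64_t last = (start + len - 1) / TOY_PAGE_SIZE;

	memset(cs, 0, sizeof(*cs));
	for (uint64_t p = first; p <= last && p < TOY_NPAGES; p++) {
		if (q->page_reserved[p])
			continue;  /* already reserved by an earlier caller */
		q->page_reserved[p] = 1;
		cs->page[p] = 1;
		cs->bytes += TOY_PAGE_SIZE;
	}
	q->reserved_bytes += (int64_t)cs->bytes;
}

/* Error path: free only the pages recorded in this caller's changeset. */
static void toy_free(struct toy_qgroup *q, const struct toy_changeset *cs)
{
	for (uint64_t p = 0; p < TOY_NPAGES; p++) {
		if (cs->page[p])
			q->page_reserved[p] = 0;
	}
	q->reserved_bytes -= (int64_t)cs->bytes;
}
```

Replaying the scenario: Task A reserving [0, 2K) records 4K in its
changeset, Task B reserving [2K, 4K) records nothing, so Task B's error
path frees 0 bytes and Task A's 4K stays accounted.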
|
|
Introduce a new parameter, struct extent_changeset, for
btrfs_qgroup_reserve_data() and its callers.
Such an extent_changeset was already used inside
btrfs_qgroup_reserve_data() to record which ranges it reserved in the
current call, so it can free them in error paths.
The reason we need to export it to callers is that, in the buffered write
error path, without knowing exactly which ranges we reserved in the
current allocation, we can free space which was not reserved by us.
This will lead to qgroup reserved space underflow.
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
and quotas being enabled
[BUG]
Under the following case, we can underflow qgroup reserved space.
            Task A                |            Task B
---------------------------------------------------------------
Quota disabled                    |
Buffered write                    |
|- btrfs_check_data_free_space()  |
|  *NO* qgroup space is reserved  |
|  since quota is *DISABLED*      |
|- All pages are copied to page   |
   cache                          |
                                  | Enable quota
                                  | Quota scan finished
                                  |
                                  | Sync_fs
                                  | |- run_delalloc_range
                                  | |- Write pages
                                  | |- btrfs_finish_ordered_io
                                  |    |- insert_reserved_file_extent
                                  |       |- btrfs_qgroup_release_data()
                                  |          Since no qgroup space is
                                             reserved in Task A, we
                                             underflow qgroup reserved
                                             space
This can be detected by fstest btrfs/104.
[CAUSE]
In insert_reserved_file_extent() we tell qgroup to release the @ram_bytes
size of qgroup reserved_space in all cases.
And btrfs_qgroup_release_data() will check if quotas are enabled.
However in the above case, the buffered write happens before quota is
enabled, so we don't have the reserved space for that range.
[FIX]
In insert_reserved_file_extent(), we tell qgroup to release the actual
number of bytes it released.
In the above case, since we don't have the reserved space, we tell
qgroups to release 0 bytes, so the problem can be fixed.
And thanks to the @reserved parameter introduced by the qgroup rework,
and previous patch to return released bytes, the fix can be as small as
10 lines.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_qgroup_release/free_data() only returns 0 or a negative error
number (ENOMEM is the only possible error).
This is normally good enough, but sometimes we need the exact byte
count it freed/released.
Change it to return the number of bytes actually released/freed instead
of 0 for success.
Also slightly modify the related extent_changeset structure: since in
btrfs one no-hole data extent won't be larger than 128M, "unsigned int"
is large enough for the use case.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
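
The return convention described above (negative errno on failure,
otherwise the byte count) can be sketched like this. The helper and its
simulate_enomem switch are hypothetical and only exist to make the
convention visible. Because a single extent is at most 128M, the count
fits in an int.

```c
#include <errno.h>
#include <stdint.h>

struct toy_qgroup_counter {
	int64_t reserved_bytes;
};

/*
 * tracked_bytes is how much of the range was actually reserved earlier.
 * Returns -ENOMEM on a (simulated) allocation failure, otherwise the
 * number of bytes released, which may legitimately be 0.
 */
static int toy_release_data(struct toy_qgroup_counter *q,
			    unsigned int tracked_bytes, int simulate_enomem)
{
	if (simulate_enomem)
		return -ENOMEM;
	q->reserved_bytes -= tracked_bytes;
	return (int)tracked_bytes;
}
```

A caller treats ret < 0 as an error and otherwise uses ret as the exact
amount to account, so a range that was never reserved contributes 0.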
|
|
Quite a lot of qgroup corruption is caused by calling
btrfs_qgroup_prepare_account_extents() at the wrong time.
Since the safest time is to call it just before
btrfs_qgroup_account_extents(), there is no need to separate these 2
functions.
Merging them will make the code cleaner and less bug prone.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.
The quick exit condition is:
1) The extent belongs to a non-fs tree
   Only fs-tree extents can affect qgroup numbers, and they are the only
   case where an extent can be shared between different trees.
   Although strictly speaking an extent in the data-reloc or tree-reloc
   tree can be shared, the data/tree-reloc root won't appear in the result
   of btrfs_find_all_roots(), so we can ignore such a case.
   So we can check the first root in the old_roots/new_roots ulist.
   - if we find the 1st root is not a fs/subvol root, then we can skip
     the extent
   - if we find the 1st root is a fs/subvol root, then we must continue
     the calculation
OR
2) both 'nr_old_roots' and 'nr_new_roots' are 0
   This means either such an extent got allocated then freed in the
   current transaction or it's a new reloc tree extent, whose
   nr_new_roots is 0.
   Either way it won't affect qgroup accounting and can be skipped
   safely.
Such a quick exit can make the trace output quieter and less confusing:
(example with fs uuid and time stamp removed)
Before:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
------
An extent tree block triggers the btrfs_qgroup_account_extent() trace
point even though no qgroup number is changed, as the extent tree won't
affect qgroup accounting.
After:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
------
Now such unrelated extents won't trigger the btrfs_qgroup_account_extent()
trace point, making the trace less noisy.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
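
The two exit conditions can be written as a small predicate. This is a
sketch, not the kernel function: root lists are plain arrays instead of
ulists, and the object id constants are assumed values that mirror the
btrfs on-disk format (fs tree = 5, first free id = 256, last free id =
-256 as u64).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed object ids; hypothetical names for this sketch. */
#define TOY_FS_TREE_OBJECTID     5ULL
#define TOY_FIRST_FREE_OBJECTID  256ULL
#define TOY_LAST_FREE_OBJECTID   ((uint64_t)-256LL)

/* True for the top-level fs tree and for subvolume/snapshot roots. */
static bool toy_is_fstree(uint64_t rootid)
{
	return rootid == TOY_FS_TREE_OBJECTID ||
	       (rootid >= TOY_FIRST_FREE_OBJECTID &&
		rootid <= TOY_LAST_FREE_OBJECTID);
}

/* Return true when the extent cannot affect qgroup numbers. */
static bool toy_can_skip_accounting(const uint64_t *old_roots, size_t nr_old,
				    const uint64_t *new_roots, size_t nr_new)
{
	uint64_t first;

	/* Condition 2: no roots before and none after. */
	if (nr_old == 0 && nr_new == 0)
		return true;

	/* Condition 1: checking the first root is enough. */
	first = nr_old ? old_roots[0] : new_roots[0];
	return !toy_is_fstree(first);
}
```

For the trace example above, the only new root is 2 (EXTENT_TREE), so
the predicate returns true and the extent is skipped.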
|
|
The total_bytes_pinned counter is completely broken when accounting
delayed refs:
- If two drops for the same extent are merged, we will decrement
total_bytes_pinned twice but only increment it once.
- If an add is merged into a drop or vice versa, we will decrement the
total_bytes_pinned counter but never increment it.
- If multiple references to an extent are dropped, we will account it
multiple times, potentially vastly over-estimating the number of bytes
that will be freed by a commit and doing unnecessary work when we're
close to ENOSPC.
The last issue is relatively minor, but the first two make the
total_bytes_pinned counter leak or underflow very often. These
accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
to keep track of possibly pinned bytes"), but they were papered over by
zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
race of using total_bytes_pinned").
We need to make sure that an extent is accounted as pinned exactly once
if and only if we will drop references to it when the transaction
is committed. Ideally we would only add to total_bytes_pinned when the
*last* reference is dropped, but this information isn't readily
available for data extents. Again, this over-estimation can lead to
extra commits when we're close to ENOSPC, but it's not as bad as before.
The fix implemented here is to increment total_bytes_pinned when the
total refmod count for an extent goes negative and decrement it if the
refmod count goes back to non-negative or after we've run all of the
delayed refs for that extent.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
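
The accounting rule in the last paragraph can be sketched as a counter
that only changes when the total refmod crosses zero. The structure and
function names are hypothetical, and the model ignores locking and
per-space-info counters.

```c
#include <stdint.h>

struct toy_ref_head {
	int64_t total_ref_mod;  /* net of queued adds (+1) and drops (-1) */
	uint64_t num_bytes;     /* size of the extent */
};

/* Queue one add (+1) or drop (-1) and adjust the pinned estimate. */
static void toy_queue_ref(struct toy_ref_head *h, int delta, int64_t *pinned)
{
	int64_t old = h->total_ref_mod;

	h->total_ref_mod += delta;
	if (old >= 0 && h->total_ref_mod < 0)
		*pinned += (int64_t)h->num_bytes;  /* went negative */
	else if (old < 0 && h->total_ref_mod >= 0)
		*pinned -= (int64_t)h->num_bytes;  /* back to non-negative */
}

/* All delayed refs for the extent have run: undo the estimate. */
static void toy_head_done(struct toy_ref_head *h, int64_t *pinned)
{
	if (h->total_ref_mod < 0)
		*pinned -= (int64_t)h->num_bytes;
}
```

In this model two merged drops add the extent size once, and an add
merged into a drop cancels it, so the counter neither leaks nor
underflows in the cases listed above.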
|
|
We need this to decide when to account pinned bytes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, we only increment total_bytes_pinned in
btrfs_free_tree_block() when dropping the last reference on the block.
However, when the delayed ref is run later, we will decrement
total_bytes_pinned regardless of whether it was the last reference or
not. This causes the counter to underflow when the reference we dropped
was not the last reference. Fix it by incrementing the counter
unconditionally, which is what btrfs_free_extent() does. This makes
total_bytes_pinned an overestimate when references to shared extents are
dropped, but in the worst case this will just make us try to commit the
transaction to try to free up space and find we didn't free enough.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The extents marked in pin_down_extent() will be unpinned later in
unpin_extent_range(), which decrements total_bytes_pinned.
pin_down_extent() must increment the counter to avoid underflowing it.
Also adjust btrfs_free_tree_block() to avoid accounting for the same
extent twice.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The value of flags is one of DATA/METADATA/SYSTEM, and they must exist
when add_pinned_bytes is called.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ added changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are a few places where we pass in a negative num_bytes, so make it
signed for clarity. Also move it up in the file since later patches will
need it there.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Bump the maximum API supported by these device families to 33.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The XATTR_ITEM is a type of a directory item so we use the common
validator helper. Unlike other dir items, it can have data. The way the
name len validation is currently implemented does not reflect that. We'd
have to adjust by the data_len when comparing the read and item limits.
However, this will not work for multi-item xattr dir items.
Example from tree dump of generic/337:
item 7 key (257 XATTR_ITEM 751495445) itemoff 15667 itemsize 147
        location key (0 UNKNOWN.0 0) type XATTR
        transid 8 data_len 3 name_len 11
        name: user.foobar
        data 123
        location key (0 UNKNOWN.0 0) type XATTR
        transid 8 data_len 6 name_len 13
        name: user.WvG1c1Td
        data qwerty
        location key (0 UNKNOWN.0 0) type XATTR
        transid 8 data_len 5 name_len 19
        name: user.J3__T_Km3dVsW_
        data hello
At the point of the btrfs_is_name_len_valid call we don't have access to
the data_len value of the 2nd and 3rd sub-item. So a simple
btrfs_dir_data_len(leaf, di) would always return 3, although we'd need to
get 6 and 5 respectively to get the calculations right. (read_end +
name_len + data_len vs item_end)
We'd have to also pass data_len externally, which is not the point of the
name validation. The last check is supposed to test if there's at least
one dir item space after the one we're processing. I don't think this is
particularly useful, validation of the next item would catch that too.
So the check is removed and we don't weaken the validation. Now tests
btrfs/048, btrfs/053, generic/273 and generic/337 pass.
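
To illustrate the multi-item problem, here is a small userspace sketch. It
uses a simplified, hypothetical sub-item header (only data_len and name_len;
the real btrfs_dir_item carries more fields) and the names and data from the
dump above. It shows that data_len has to be read at each sub-item's own
offset; reading it once at the start of the item only ever yields the first
value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical sub-item header (not the on-disk layout). */
struct sub_item_hdr {
	uint16_t data_len;
	uint16_t name_len;
};

/* Append header + name + data at offset off; returns the new end offset. */
static size_t pack_sub_item(uint8_t *buf, size_t off,
			    const char *name, const char *data)
{
	struct sub_item_hdr h;

	h.data_len = (uint16_t)strlen(data);
	h.name_len = (uint16_t)strlen(name);
	memcpy(buf + off, &h, sizeof(h));
	off += sizeof(h);
	memcpy(buf + off, name, h.name_len);
	off += h.name_len;
	memcpy(buf + off, data, h.data_len);
	return off + h.data_len;
}

/*
 * Walk all sub-items inside [0, item_end) and record each data_len.
 * Every sub-item is bounds-checked against item_end using its own
 * name_len and data_len before the walk advances past it.
 */
static size_t walk_data_lens(const uint8_t *buf, size_t item_end,
			     uint16_t *out, size_t max)
{
	size_t off = 0, n = 0;

	assert(buf != NULL && out != NULL);
	while (n < max && off + sizeof(struct sub_item_hdr) <= item_end) {
		struct sub_item_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (off + sizeof(h) + h.name_len + h.data_len > item_end)
			break;
		out[n++] = h.data_len;
		off += sizeof(h) + h.name_len + h.data_len;
	}
	return n;
}
```

Packing the three xattrs from the dump and walking them yields data_len
values 3, 6 and 5, one per sub-item, which is the information a single
up-front btrfs_dir_data_len() read cannot provide.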
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The newest devices need a longer time to reset because of
their more complex hardware. Wait 5ms after device reset.
Consolidate all the places that reset the device in the
PCIe transport to avoid future bugs.
While at it, unify the flow to use set_bit instead of full
write as requested by the hardware designers.
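
A rough userspace sketch of the consolidated flow, assuming a fake device
with a single reset register. The bit position, the struct and the helper
names are hypothetical, and the delay is only simulated; the driver's real
register accessors are not shown here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESET_REG_SW_RESET (1u << 7)	/* hypothetical bit position */
#define RESET_SETTLE_MS    5		/* wait time named in the text above */

/* Fake device: one register plus simulated delay accounting. */
struct fake_dev {
	uint32_t reset_reg;
	unsigned int waited_ms;
};

/* Simulated delay; a real driver would sleep here. */
static void fake_delay_ms(struct fake_dev *dev, unsigned int ms)
{
	dev->waited_ms += ms;
}

/*
 * Single reset helper used by every caller: set only the reset bit
 * (read-modify-write) instead of overwriting the whole register, then
 * wait for the hardware to settle.
 */
static void dev_sw_reset(struct fake_dev *dev)
{
	assert(dev != NULL);
	dev->reset_reg |= RESET_REG_SW_RESET;
	fake_delay_ms(dev, RESET_SETTLE_MS);
}
```

Because only one bit is set, other bits already present in the register
survive the reset request, and keeping the wait inside the helper means no
caller can forget it.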
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
iwl_pcie_apm_init can fail so make sure that the caller
takes the status into account.
Also, ensure that the error that iwl_pcie_apm_init can emit
will appear in the kernel log by default.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When a station that's not associated sends a data frame (e.g. an NDP)
hostapd will respond with a disassoc frame, telling it that it's not
associated. The station might also not be authenticated, in which case
there will not be a station entry for it, and as a result we need to
accept such frames without a station.
Fixes: 3ee0f0e23e4f ("iwlwifi: mvm: fix DQA AP mode station assumption")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
The API has changed - update the code.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When we get a non-STA frame to transmit in client mode, we try to use
the IWL_MVM_DQA_BSS_CLIENT_QUEUE queue (queue #4). However, at this
point, the queue might not be allocated at all, causing warnings. The
scenario in which this happened was a race condition between mac80211
and our queue allocation work:
* mac80211 sends auth
* we stop mac80211 queues to allocate a hw queue
* authentication is aborted
* we allocate HW queue and start mac80211 queues
* mac80211 removes station
* mac80211 hands us the auth frame from the pending queue
At this point, since mac80211 has already removed the station, we try
to transmit the frame through this special non-station case on queue
4 anyway.
In order to really use it properly, we'd have to again go through the
hw queue allocation work, and attach it to a station, etc. In this
case that isn't possible (there's no station anymore), but if this
special case were needed, then we'd have to do it this way.
However, the special case is documented to exist for TDLS, but can't
trigger there because the TDLS setup frames etc. are normal to-DS
frames going to the peer through the AP. Testing also confirms that
this code path isn't triggered in TDLS.
Therefore, remove the code path to avoid using an unused queue. The
erroneous frame described above will still be transmitted on the AUX
queue, but arguably that's a mac80211 problem, which will eventually
be fixed by moving everything there to TXQs.
Fixes: e3118ad74d7e ("iwlwifi: mvm: support tdls in dqa mode")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When we get large sends on non-QoS association, we had a
bug that mangled the SNAP header. Fix that.
Fixes: a6d5e32f247c ("iwlwifi: mvm: send large SKBs to the transport")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When going into suspend, the HW configuration for MSI-X will
likely be lost. As a consequence, after waking up, all IRQ
causes will be mapped to interrupt 0. We then don't notice
the interrupt because in most cases this is an interrupt for
a queue, and getting it doesn't read the other cause
registers.
Fixes: 2e5d4a8f61dc ("iwlwifi: pcie: Add new configuration to enable MSIX")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Getting the TID of a packet before we know it is a QoS data
packet isn't a good idea. Delay the TID retrieval until
we know the packet is a QoS data packet.
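
A simplified sketch of the ordering described above, in plain userspace C.
The frame struct and the NO_TID sentinel are hypothetical; the frame control
and TID mask values follow the usual 802.11 definitions, and byte order
handling is left out for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FCTL_FTYPE       0x000c	/* frame type bits */
#define FTYPE_DATA       0x0008	/* data frame */
#define STYPE_QOS_DATA   0x0080	/* QoS bit in the subtype */
#define QOS_CTL_TID_MASK 0x000f
#define NO_TID           0xff	/* hypothetical "no TID" sentinel */

/* Hypothetical flattened frame; qos_ctl is only valid for QoS data. */
struct frame {
	uint16_t frame_control;
	uint16_t qos_ctl;
};

static int is_qos_data(uint16_t fc)
{
	return (fc & (FCTL_FTYPE | STYPE_QOS_DATA)) ==
	       (FTYPE_DATA | STYPE_QOS_DATA);
}

/* Check the frame type first, and only then touch the QoS control field. */
static uint8_t frame_tid(const struct frame *f)
{
	assert(f != NULL);
	if (!is_qos_data(f->frame_control))
		return NO_TID;
	return (uint8_t)(f->qos_ctl & QOS_CTL_TID_MASK);
}
```

For a non-QoS data frame the bytes where the QoS control field would sit
belong to something else, so the helper returns the sentinel instead of
interpreting them.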
Fixes: bb81bb68f472 ("iwlwifi: mvm: add Tx A-MSDU inside A-MPDU")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|