Age | Commit message (Collapse) | Author |
|
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
19057 392 0 19449 4bf9 drivers/net/ethernet/freescale/gianfar.o
File size After adding 'const':
text data bss dec hex filename
19249 192 0 19441 4bf1 drivers/net/ethernet/freescale/gianfar.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
18709 401 0 19110 4aa6 drivers/net/ethernet/smsc/smc91x.o
File size After adding 'const':
text data bss dec hex filename
18901 201 0 19102 4a9e drivers/net/ethernet/smsc/smc91x.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
15426 1256 0 16682 412a drivers/net/ethernet/ibm/ibmveth.o
File size After adding 'const':
text data bss dec hex filename
15618 1064 0 16682 412a drivers/net/ethernet/ibm/ibmveth.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Leaking kernel addresses on unpriviledged is generally disallowed,
for example, verifier rejects the following:
0: (b7) r0 = 0
1: (18) r2 = 0xffff897e82304400
3: (7b) *(u64 *)(r1 +48) = r2
R2 leaks addr into ctx
Doing pointer arithmetic on them is also forbidden, so that they
don't turn into unknown value and then get leaked out. However,
there's xadd as a special case, where we don't check the src reg
for being a pointer register, e.g. the following will pass:
0: (b7) r0 = 0
1: (7b) *(u64 *)(r1 +48) = r0
2: (18) r2 = 0xffff897e82304400 ; map
4: (db) lock *(u64 *)(r1 +48) += r2
5: (95) exit
We could store the pointer into skb->cb, loose the type context,
and then read it out from there again to leak it eventually out
of a map value. Or more easily in a different variant, too:
0: (bf) r6 = r1
1: (7a) *(u64 *)(r10 -8) = 0
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x0
6: (85) call bpf_map_lookup_elem#1
7: (15) if r0 == 0x0 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
8: (b7) r3 = 0
9: (7b) *(u64 *)(r0 +0) = r3
10: (db) lock *(u64 *)(r0 +0) += r6
11: (b7) r0 = 0
12: (95) exit
from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
11: (b7) r0 = 0
12: (95) exit
Prevent this by checking xadd src reg for pointer types. Also
add a couple of test cases related to this.
Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver currently creates RX/TX queues during device probe, but
assigns IRQ's to them during device open. On reset, however,
IRQ's are assigned when resetting the queues. If there is a reset
while the device is closed and the device is later opened, the driver will
request IRQ's twice, causing the open to fail. This patch assigns
the IRQ's in the ibmvnic_init function after the queues are reset or
initialized, ensuring IRQ's are only requested once.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add to RTNL_FAMILY_IPMR, RTM_GETROUTE the ability
to retrieve one S,G mroute from a specified table.
*,G will return mroute information for just that
particular mroute if it exists. This is because
it is entirely possible to have more S's then
can fit in one skb to return to the requesting
process.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The index is off-by-one when fp->aux->stack_depth
has already been rounded up to 32. In particular,
if stack_depth is 512, the index will be 16.
The fix is to round_up and then takes -1 instead of round_down.
[ 22.318680] ==================================================================
[ 22.319745] BUG: KASAN: global-out-of-bounds in bpf_prog_select_runtime+0x48a/0x670
[ 22.320737] Read of size 8 at addr ffffffff82aadae0 by task sockex3/1946
[ 22.321646]
[ 22.321858] CPU: 1 PID: 1946 Comm: sockex3 Tainted: G W 4.12.0-rc6-01680-g2ee87db3a287 #22
[ 22.323061] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[ 22.324260] Call Trace:
[ 22.324612] dump_stack+0x67/0x99
[ 22.325081] print_address_description+0x1e8/0x290
[ 22.325734] ? bpf_prog_select_runtime+0x48a/0x670
[ 22.326360] kasan_report+0x265/0x350
[ 22.326860] __asan_report_load8_noabort+0x19/0x20
[ 22.327484] bpf_prog_select_runtime+0x48a/0x670
[ 22.328109] bpf_prog_load+0x626/0xd40
[ 22.328637] ? __bpf_prog_charge+0xc0/0xc0
[ 22.329222] ? check_nnp_nosuid.isra.61+0x100/0x100
[ 22.329890] ? __might_fault+0xf6/0x1b0
[ 22.330446] ? lock_acquire+0x360/0x360
[ 22.331013] SyS_bpf+0x67c/0x24d0
[ 22.331491] ? trace_hardirqs_on+0xd/0x10
[ 22.332049] ? __getnstimeofday64+0xaf/0x1c0
[ 22.332635] ? bpf_prog_get+0x20/0x20
[ 22.333135] ? __audit_syscall_entry+0x300/0x600
[ 22.333770] ? syscall_trace_enter+0x540/0xdd0
[ 22.334339] ? exit_to_usermode_loop+0xe0/0xe0
[ 22.334950] ? do_syscall_64+0x48/0x410
[ 22.335446] ? bpf_prog_get+0x20/0x20
[ 22.335954] do_syscall_64+0x181/0x410
[ 22.336454] entry_SYSCALL64_slow_path+0x25/0x25
[ 22.337121] RIP: 0033:0x7f263fe81f19
[ 22.337618] RSP: 002b:00007ffd9a3440c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 22.338619] RAX: ffffffffffffffda RBX: 0000000000aac5fb RCX: 00007f263fe81f19
[ 22.339600] RDX: 0000000000000030 RSI: 00007ffd9a3440d0 RDI: 0000000000000005
[ 22.340470] RBP: 0000000000a9a1e0 R08: 0000000000a9a1e0 R09: 0000009d00000001
[ 22.341430] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000010000
[ 22.342411] R13: 0000000000a9a023 R14: 0000000000000001 R15: 0000000000000003
[ 22.343369]
[ 22.343593] The buggy address belongs to the variable:
[ 22.344241] interpreters+0x80/0x980
[ 22.344708]
[ 22.344908] Memory state around the buggy address:
[ 22.345556] ffffffff82aad980: 00 00 00 04 fa fa fa fa 04 fa fa fa fa fa fa fa
[ 22.346449] ffffffff82aada00: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 00 00
[ 22.347361] >ffffffff82aada80: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
[ 22.348301] ^
[ 22.349142] ffffffff82aadb00: 00 01 fa fa fa fa fa fa 00 00 00 00 00 00 00 00
[ 22.350058] ffffffff82aadb80: 00 00 07 fa fa fa fa fa 00 00 05 fa fa fa fa fa
[ 22.350984] ==================================================================
Fixes: b870aa901f4b ("bpf: use different interpreter depending on required stack size")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial fix to spelling mistake in netdev_err message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Should not init a NULL box. It will cause system crash.
The issue looks like caused by a typo.
This was not noticed because there is no NULL box. Also, for most
boxes, they are enabled by default. The init code is not critical.
Fixes: fff4b87e594a ("perf/x86/intel/uncore: Make package handling more robust")
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
|
|
Michael Grzeschik says:
====================
arcnet: Collection of latest features
Here we sum up the latest features to improve the arcnet framework. One
patch is used to get feedback from the transfer queue about failed xfers
by adding the err_skb message queue. Beside that we improve the
backplane status that can be read by the PCI-based cards and offer that
status via an extra sysfs attribute. In the last patch we add another
card type PCIFB2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We add support for the PCIFB2 card from EAE.
Beside other cards, this card has the backplane mode enabled by default.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We read the backplane mode of each subcard from bits 2 and 3 of the misc
register.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We add the sysfs interface the read back the backplane
status of the interface.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to track the status of our queued packages. This way the driving
process knows if failed packages need to be retransmitted. For this
purpose we queue the transferred/failed packages back into the err_skb
message queue added with some status information.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Grzeschik says:
====================
arcnet: Collection of latest fixes
Here we sum up the recent fixes I collected on the way to use and
stabilise the framework. Part of it is an possible deadlock that we
prevent as well to fix the calculation of the dev_id that can be setup
by an rotary encoder. Beside that we added an trivial spelling patch and
fix some wrong and missing assignments that improves the code footprint.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We add the pdev data to the pci devices netdev structure. This way
the interface get consistent device names in the userspace (udev).
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dev_id was miscalculated. Only the two bits 4-5 are relevant for the
MA1 card. PCIARC1 and PCIFB2 use the four bits 4-7 for id selection.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The assignment is superfluous.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch prevents the arcnet driver from the following deadlock.
[ 41.273910] ======================================================
[ 41.280397] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 41.287433] 4.4.0-00034-gc0ae784 #536 Not tainted
[ 41.292366] ------------------------------------------------------
[ 41.298863] arcecho/233 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
[ 41.305628] (&(&lp->lock)->rlock){+.+...}, at: [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[ 41.315199]
[ 41.315199] and this task is already holding:
[ 41.321324] (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8
[ 41.329593] which would create a new lock dependency:
[ 41.334893] (_xmit_ARCNET#2){+.-...} -> (&(&lp->lock)->rlock){+.+...}
[ 41.341801]
[ 41.341801] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 41.350108] (_xmit_ARCNET#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[ 41.357539] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.362677] [<c063ab8c>] dev_watchdog+0x5c/0x264
[ 41.367723] [<c0094edc>] call_timer_fn+0x6c/0xf4
[ 41.372759] [<c00950b8>] run_timer_softirq+0x154/0x210
[ 41.378340] [<c0036b30>] __do_softirq+0x144/0x298
[ 41.383469] [<c0036fb4>] irq_exit+0xcc/0x130
[ 41.388138] [<c0085c50>] __handle_domain_irq+0x60/0xb4
[ 41.393728] [<c0014578>] __irq_svc+0x58/0x78
[ 41.398402] [<c0010274>] arch_cpu_idle+0x24/0x3c
[ 41.403443] [<c007127c>] cpu_startup_entry+0x1f8/0x25c
[ 41.409029] [<c09adc90>] start_kernel+0x3c0/0x3cc
[ 41.414170]
[ 41.414170] to a SOFTIRQ-irq-unsafe lock:
[ 41.419931] (&(&lp->lock)->rlock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
[ 41.427996] ... [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.433409] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.439646] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.445063] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 41.450661] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 41.455700] [<c0087548>] irq_thread+0x13c/0x1dc
[ 41.460649] [<c0050f10>] kthread+0xe4/0xf8
[ 41.465158] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.470207]
[ 41.470207] other info that might help us debug this:
[ 41.470207]
[ 41.478627] Possible interrupt unsafe locking scenario:
[ 41.478627]
[ 41.485763] CPU0 CPU1
[ 41.490521] ---- ----
[ 41.495279] lock(&(&lp->lock)->rlock);
[ 41.499414] local_irq_disable();
[ 41.505636] lock(_xmit_ARCNET#2);
[ 41.511967] lock(&(&lp->lock)->rlock);
[ 41.518741] <Interrupt>
[ 41.521490] lock(_xmit_ARCNET#2);
[ 41.525356]
[ 41.525356] *** DEADLOCK ***
[ 41.525356]
[ 41.531587] 1 lock held by arcecho/233:
[ 41.535617] #0: (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8
[ 41.544355]
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[ 41.552362] -> (_xmit_ARCNET#2){+.-...} ops: 27 {
[ 41.557357] HARDIRQ-ON-W at:
[ 41.560664] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.567445] [<c063ba28>] dev_deactivate_many+0x114/0x304
[ 41.574866] [<c063bc3c>] dev_deactivate+0x24/0x38
[ 41.581646] [<c0630374>] linkwatch_do_dev+0x40/0x74
[ 41.588613] [<c06305d8>] __linkwatch_run_queue+0xec/0x140
[ 41.596120] [<c0630658>] linkwatch_event+0x2c/0x34
[ 41.602991] [<c004af30>] process_one_work+0x188/0x40c
[ 41.610131] [<c004b200>] worker_thread+0x4c/0x480
[ 41.616912] [<c0050f10>] kthread+0xe4/0xf8
[ 41.623048] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.629735] IN-SOFTIRQ-W at:
[ 41.633039] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.639820] [<c063ab8c>] dev_watchdog+0x5c/0x264
[ 41.646508] [<c0094edc>] call_timer_fn+0x6c/0xf4
[ 41.653190] [<c00950b8>] run_timer_softirq+0x154/0x210
[ 41.660425] [<c0036b30>] __do_softirq+0x144/0x298
[ 41.667201] [<c0036fb4>] irq_exit+0xcc/0x130
[ 41.673518] [<c0085c50>] __handle_domain_irq+0x60/0xb4
[ 41.680754] [<c0014578>] __irq_svc+0x58/0x78
[ 41.687077] [<c0010274>] arch_cpu_idle+0x24/0x3c
[ 41.693769] [<c007127c>] cpu_startup_entry+0x1f8/0x25c
[ 41.701006] [<c09adc90>] start_kernel+0x3c0/0x3cc
[ 41.707791] INITIAL USE at:
[ 41.711003] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.717696] [<c063ba28>] dev_deactivate_many+0x114/0x304
[ 41.725026] [<c063bc3c>] dev_deactivate+0x24/0x38
[ 41.731718] [<c0630374>] linkwatch_do_dev+0x40/0x74
[ 41.738593] [<c06305d8>] __linkwatch_run_queue+0xec/0x140
[ 41.746011] [<c0630658>] linkwatch_event+0x2c/0x34
[ 41.752789] [<c004af30>] process_one_work+0x188/0x40c
[ 41.759847] [<c004b200>] worker_thread+0x4c/0x480
[ 41.766541] [<c0050f10>] kthread+0xe4/0xf8
[ 41.772596] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.779198] }
[ 41.780945] ... key at: [<c124d620>] netdev_xmit_lock_key+0x38/0x1c8
[ 41.788192] ... acquired at:
[ 41.791309] [<c007bed8>] lock_acquire+0x70/0x90
[ 41.796361] [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54
[ 41.802324] [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[ 41.808844] [<c06b9380>] packet_direct_xmit+0x130/0x1c8
[ 41.814622] [<c06bc7e4>] packet_sendmsg+0x3b8/0x680
[ 41.820034] [<c05fe8b0>] sock_sendmsg+0x14/0x24
[ 41.825091] [<c05ffd68>] SyS_sendto+0xb8/0xe0
[ 41.829956] [<c05ffda8>] SyS_send+0x18/0x20
[ 41.834638] [<c000f780>] ret_fast_syscall+0x0/0x1c
[ 41.839954]
[ 41.841514]
the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[ 41.850302] -> (&(&lp->lock)->rlock){+.+...} ops: 5 {
[ 41.855644] HARDIRQ-ON-W at:
[ 41.858945] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.865726] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.873607] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.880666] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 41.887901] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 41.894593] [<c0087548>] irq_thread+0x13c/0x1dc
[ 41.901195] [<c0050f10>] kthread+0xe4/0xf8
[ 41.907338] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.914025] SOFTIRQ-ON-W at:
[ 41.917328] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.924106] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.931981] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.939028] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 41.946264] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 41.952954] [<c0087548>] irq_thread+0x13c/0x1dc
[ 41.959548] [<c0050f10>] kthread+0xe4/0xf8
[ 41.965689] [<c000f810>] ret_from_fork+0x14/0x24
[ 41.972379] INITIAL USE at:
[ 41.975595] [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[ 41.982283] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[ 41.990063] [<c0089120>] handle_nested_irq+0x8c/0xec
[ 41.997027] [<c03c1170>] regmap_irq_thread+0x190/0x314
[ 42.004172] [<c0087244>] irq_thread_fn+0x1c/0x34
[ 42.010766] [<c0087548>] irq_thread+0x13c/0x1dc
[ 42.017267] [<c0050f10>] kthread+0xe4/0xf8
[ 42.023314] [<c000f810>] ret_from_fork+0x14/0x24
[ 42.029903] }
[ 42.031648] ... key at: [<bf0854cc>] __key.42091+0x0/0xfffff0f8 [arcnet]
[ 42.039255] ... acquired at:
[ 42.042372] [<c007bed8>] lock_acquire+0x70/0x90
[ 42.047413] [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54
[ 42.053364] [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[ 42.059872] [<c06b9380>] packet_direct_xmit+0x130/0x1c8
[ 42.065634] [<c06bc7e4>] packet_sendmsg+0x3b8/0x680
[ 42.071030] [<c05fe8b0>] sock_sendmsg+0x14/0x24
[ 42.076069] [<c05ffd68>] SyS_sendto+0xb8/0xe0
[ 42.080926] [<c05ffda8>] SyS_send+0x18/0x20
[ 42.085601] [<c000f780>] ret_fast_syscall+0x0/0x1c
[ 42.090918]
[ 42.092481]
[ 42.092481] stack backtrace:
[ 42.097065] CPU: 0 PID: 233 Comm: arcecho Not tainted 4.4.0-00034-gc0ae784 #536
[ 42.104751] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 42.111183] [<c0017ec8>] (unwind_backtrace) from [<c00139d0>] (show_stack+0x10/0x14)
[ 42.119337] [<c00139d0>] (show_stack) from [<c02a82c4>] (dump_stack+0x8c/0x9c)
[ 42.126937] [<c02a82c4>] (dump_stack) from [<c0078260>] (check_usage+0x4bc/0x63c)
[ 42.134815] [<c0078260>] (check_usage) from [<c0078438>] (check_irq_usage+0x58/0xb0)
[ 42.142964] [<c0078438>] (check_irq_usage) from [<c007aaa0>] (__lock_acquire+0x1524/0x20b0)
[ 42.151740] [<c007aaa0>] (__lock_acquire) from [<c007bed8>] (lock_acquire+0x70/0x90)
[ 42.159886] [<c007bed8>] (lock_acquire) from [<c06f9140>] (_raw_spin_lock_irqsave+0x40/0x54)
[ 42.168768] [<c06f9140>] (_raw_spin_lock_irqsave) from [<bf083bc8>] (arcnet_send_packet+0x60/0x1c0 [arcnet])
[ 42.179115] [<bf083bc8>] (arcnet_send_packet [arcnet]) from [<c06b9380>] (packet_direct_xmit+0x130/0x1c8)
[ 42.189182] [<c06b9380>] (packet_direct_xmit) from [<c06bc7e4>] (packet_sendmsg+0x3b8/0x680)
[ 42.198059] [<c06bc7e4>] (packet_sendmsg) from [<c05fe8b0>] (sock_sendmsg+0x14/0x24)
[ 42.206199] [<c05fe8b0>] (sock_sendmsg) from [<c05ffd68>] (SyS_sendto+0xb8/0xe0)
[ 42.213978] [<c05ffd68>] (SyS_sendto) from [<c05ffda8>] (SyS_send+0x18/0x20)
[ 42.221388] [<c05ffda8>] (SyS_send) from [<c000f780>] (ret_fast_syscall+0x0/0x1c)
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
---
v1 -> v2: removed unneeded zero assignment of flags
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2016-06-28
The following updates and fixes are included in this driver update series:
- Simplify mailbox interface code
- Fix SFP supported and advertising settings
- Fix PTP initialization register usage
- Insure there is timestamp skb present before using it
- Add a timeout to timestamp register updates
- Handle return code from software reset function
- Some fixes for handling 2.5Gbps rates
- Limit I2C error messages
- Fix non-DMA interrupt handling through tasklet usage
- Add NUMA affinity support for memory allocations
- Add NUMA affinity support for interrupts
- Prepare for more fine-grained cache coherency controls
- Simplify setting the DMA burst length programming
- Performance improvements
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to change some general performance settings and to provide
some performance settings based on the device that is probed.
This includes:
- Setting the maximum read/write outstanding request limit
- Reducing the AXI interface burst length size
- Selectively setting the Tx and Rx descriptor pre-fetch threshold
- Selectively setting additional cache coherency controls
Tested and verified on all versions of the hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the driver hardcodes the PBLx8 setting. Remove the need for
specifying the PBLx8 setting and automatically calculate based on the
specified PBL value. Since the PBLx8 setting applies to both Tx and Rx
use the same PBL value for both of them.
Also, the driver currently uses a bit field to set the AXI master burst
len setting. Change to the full bit field range and set the burst length
based on the specified value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In prep for setting fine grained read and write DMA cache coherency
controls, allow specific values to be used to set the cache coherency
registers.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For IRQ affinity, set the affinity hints for the IRQs to be (initially) on
the processors corresponding to the NUMA node of the device.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to perform memory allocations on the node of the device. The
original allocation or the ring structure and Tx/Rx queues allocated all
of the memory at once and then carved it up for each channel and queue.
To best ensure that we get as much memory from the NUMA node as we can,
break the channel and ring allocations into individual allocations.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some of the device interrupts should function as level interrupts. For
some hardware configurations this requires setting some control bits
so that if the interrupt status has not been cleared the interrupt
should be reissued.
Additionally, when using MSI or MSI-X interrupts, run the interrupt
service routine as a tasklet so that the re-issuance of the interrupt
is handled properly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When I2C communication fails, it tends to always fail. Rather than
continuously issue an error message (once per second in most cases),
change the message to be issued just once.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver has some missing functionality when operating in the mode that
supports 2.5GbE. Fix the driver to fully recognize and support this speed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the function that performs a software reset of the hardware
provides a return code. During driver probe check this return code and
exit with an error if the software reset fails.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just to be on the safe side, should the update of the timestamp registers
not complete, issue a warning rather than looping forever waiting for the
update to complete.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spurious Tx timestamp interrupts can cause an oops in the Tx timestamp
processing function if a Tx timestamp skb is NULL. Add a check to insure
a Tx timestamp skb is present before attempting to use it.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During PTP initialization, the Timestamp Control register should be
cleared and not the Tx Configuration register. While this typo causes
the wrong register to be cleared, the default value of each register and
and the fact that the Tx Configuration register is programmed afterwards
doesn't result in a bug, hence only fixing in net-next.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using SFPs, the supported and advertised settings should be initially
based on the SFP that has been detected. The code currently indicates the
overall support of the device as opposed to what the SFP is capable of.
Update the code to change the supported link modes, auto-negotiation, etc.
to be based on the installed SFP.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify and centralize the mailbox command rate change interface by
having a single function perform the writes to the mailbox registers
to issue the request.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent before
submit it to user") introduced a warning to catch unemitted cached
fiemap extent.
However such warning doesn't take the following case into consideration:
0 4K 8K
|<---- fiemap range --->|
|<----------- On-disk extent ------------------>|
In this case, the whole 0~8K is cached, and since it's larger than
fiemap range, it break the fiemap extent emit loop.
This leaves the fiemap extent cached but not emitted, and caught by the
final fiemap extent sanity check, causing kernel warning.
This patch removes the kernel warning and renames the sanity check to
emit_last_fiemap_cache() since it's possible and valid to have cached
fiemap extent.
Reported-by: David Sterba <dsterba@suse.cz>
Reported-by: Adam Borowski <kilobyte@angband.pl>
Fixes: 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent ...")
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.
Fix the problem by moving posix_acl_update_mode() out of
__btrfs_set_acl() into btrfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.
Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
CC: linux-btrfs@vger.kernel.org
CC: David Sterba <dsterba@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
My static checker complains that ofdpa_neigh_del() can sometimes free
"found". It just makes sense to use it first before deleting it.
Fixes: ecf244f753e0 ("rocker: fix maybe-uninitialized warning")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6. This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number. It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.
This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The commit "btrfs: scrub: inline helper scrub_setup_wr_ctx" inlined a
helper but wrongly sets up the target device. Incidentally there's a
local variable with the same name as a parameter in the previous
function, so this got caught during runtime as crash in test btrfs/027.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
ranges
[BUG]
For the following case, btrfs can underflow qgroup reserved space
at an error path:
(Page size 4K, function name without "btrfs_" prefix)
Task A | Task B
----------------------------------------------------------------------
Buffered_write [0, 2K) |
|- check_data_free_space() |
| |- qgroup_reserve_data() |
| Range aligned to page |
| range [0, 4K) <<< |
| 4K bytes reserved <<< |
|- copy pages to page cache |
| Buffered_write [2K, 4K)
| |- check_data_free_space()
| | |- qgroup_reserved_data()
| | Range alinged to page
| | range [0, 4K)
| | Already reserved by A <<<
| | 0 bytes reserved <<<
| |- delalloc_reserve_metadata()
| | And it *FAILED* (Maybe EQUOTA)
| |- free_reserved_data_space()
|- qgroup_free_data()
Range aligned to page range
[0, 4K)
Freeing 4K
(Special thanks to Chandan for the detailed report and analyse)
[CAUSE]
Above Task B is freeing reserved data range [0, 4K) which is actually
reserved by Task A.
And at writeback time, page dirty by Task A will go through writeback
routine, which will free 4K reserved data space at file extent insert
time, causing the qgroup underflow.
[FIX]
For btrfs_qgroup_free_data(), add @reserved parameter to only free
data ranges reserved by previous btrfs_qgroup_reserve_data().
So in above case, Task B will try to free 0 byte, so no underflow.
Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Introduce a new parameter, struct extent_changeset for
btrfs_qgroup_reserved_data() and its callers.
Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
which range it reserved in current reserve, so it can free it in error
paths.
The reason we need to export it to callers is, at buffered write error
path, without knowing what exactly which range we reserved in current
allocation, we can free space which is not reserved by us.
This will lead to qgroup reserved space underflow.
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
and quotas being enabled
[BUG]
Under the following case, we can underflow qgroup reserved space.
Task A | Task B
---------------------------------------------------------------
Quota disabled |
Buffered write |
|- btrfs_check_data_free_space() |
| *NO* qgroup space is reserved |
| since quota is *DISABLED* |
|- All pages are copied to page |
cache |
| Enable quota
| Quota scan finished
|
| Sync_fs
| |- run_delalloc_range
| |- Write pages
| |- btrfs_finish_ordered_io
| |- insert_reserved_file_extent
| |- btrfs_qgroup_release_data()
| Since no qgroup space is
reserved in Task A, we
underflow qgroup reserved
space
This can be detected by fstest btrfs/104.
[CAUSE]
In insert_reserved_file_extent() we tell qgroup to release the @ram_bytes
size of qgroup reserved_space in all cases.
And btrfs_qgroup_release_data() will check if quotas are enabled.
However in the above case, the buffered write happens before quota is
enabled, so we don't have the reserved space for that range.
[FIX]
In insert_reserved_file_extent(), we tell qgroup to release the acctual
byte number it released.
In the above case, since we don't have the reserved space, we tell
qgroups to release 0 byte, so the problem can be fixed.
And thanks to the @reserved parameter introduced by the qgroup rework,
and previous patch to return released bytes, the fix can be as small as
10 lines.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_qgroup_release/free_data() only returns 0 or a negative error
number (ENOMEM is the only possible error).
This is normally good enough, but sometimes we need the exact byte
count it freed/released.
Change it to return actually released/freed bytenr number instead of 0
for success.
And slightly modify related extent_changeset structure, since in btrfs
one no-hole data extent won't be larger than 128M, so "unsigned int"
is large enough for the use case.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Quite a lot of qgroup corruption happens due to wrong time of calling
btrfs_qgroup_prepare_account_extents().
Since the safest time is to call it just before
btrfs_qgroup_account_extents(), there is no need to separate these 2
functions.
Merging them will make code cleaner and less bug prone.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.
The quick exit condition is:
1) The extent belongs to a non-fs tree
Only fs-tree extents can affect qgroup numbers and is the only case
where extent can be shared between different trees.
Although strictly speaking extent in data-reloc or tree-reloc tree
can be shared, data/tree-reloc root won't appear in the result of
btrfs_find_all_roots(), so we can ignore such case.
So we can check the first root in old_roots/new_roots ulist.
- if we find the 1st root is a not a fs/subvol root, then we can skip
the extent
- if we find the 1st root is a fs/subvol root, then we must continue
calculation
OR
2) both 'nr_old_roots' and 'nr_new_roots' are 0
This means either such extent got allocated then freed in current
transaction or it's a new reloc tree extent, whose nr_new_roots is 0.
Either way it won't affect qgroup accounting and can be skipped
safely.
Such quick exit can make trace output more quite and less confusing:
(example with fs uuid and time stamp removed)
Before:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
------
Extent tree block will trigger btrfs_qgroup_account_extent() trace point
while no qgroup number is changed, as extent tree won't affect qgroup
accounting.
After:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
------
Now such unrelated extent won't trigger btrfs_qgroup_account_extent()
trace point, making the trace less noisy.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The total_bytes_pinned counter is completely broken when accounting
delayed refs:
- If two drops for the same extent are merged, we will decrement
total_bytes_pinned twice but only increment it once.
- If an add is merged into a drop or vice versa, we will decrement the
total_bytes_pinned counter but never increment it.
- If multiple references to an extent are dropped, we will account it
multiple times, potentially vastly over-estimating the number of bytes
that will be freed by a commit and doing unnecessary work when we're
close to ENOSPC.
The last issue is relatively minor, but the first two make the
total_bytes_pinned counter leak or underflow very often. These
accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
to keep track of possibly pinned bytes"), but they were papered over by
zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
race of using total_bytes_pinned").
We need to make sure that an extent is accounted as pinned exactly once
if and only if we will drop references to it when when the transaction
is committed. Ideally we would only add to total_bytes_pinned when the
*last* reference is dropped, but this information isn't readily
available for data extents. Again, this over-estimation can lead to
extra commits when we're close to ENOSPC, but it's not as bad as before.
The fix implemented here is to increment total_bytes_pinned when the
total refmod count for an extent goes negative and decrement it if the
refmod count goes back to non-negative or after we've run all of the
delayed refs for that extent.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We need this to decide when to account pinned bytes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, we only increment total_bytes_pinned in
btrfs_free_tree_block() when dropping the last reference on the block.
However, when the delayed ref is run later, we will decrement
total_bytes_pinned regardless of whether it was the last reference or
not. This causes the counter to underflow when the reference we dropped
was not the last reference. Fix it by incrementing the counter
unconditionally, which is what btrfs_free_extent() does. This makes
total_bytes_pinned an overestimate when references to shared extents are
dropped, but in the worst case this will just make us try to commit the
transaction to try to free up space and find we didn't free enough.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The extents marked in pin_down_extent() will be unpinned later in
unpin_extent_range(), which decrements total_bytes_pinned.
pin_down_extent() must increment the counter to avoid underflowing it.
Also adjust btrfs_free_tree_block() to avoid accounting for the same
extent twice.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|