Age | Commit message (Collapse) | Author |
|
Johannes pointed out that after commit 886cf1901db9 ("mm: move
recent_rotated pages calculation to shrink_inactive_list()") we lost all
zone_reclaim_stat::recent_rotated history.
This fixes it.
Link: http://lkml.kernel.org/r/155905972210.26456.11178359431724024112.stgit@localhost.localdomain
Fixes: 886cf1901db9 ("mm: move recent_rotated pages calculation to shrink_inactive_list()")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If mlockall() is called with only MCL_ONFAULT as flag, it removes any
previously applied lockings and does nothing else.
This behavior is counter-intuitive and doesn't match the Linux man page.
For mlockall():
EINVAL Unknown flags were specified or MCL_ONFAULT was specified
without either MCL_FUTURE or MCL_CURRENT.
Consequently, return the error EINVAL, if only MCL_ONFAULT is passed.
That way, applications will at least detect that they are calling
mlockall() incorrectly.
Link: http://lkml.kernel.org/r/20190527075333.GA6339@er01809n.ebgroup.elektrobit.com
Fixes: b0f205c2a308 ("mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage")
Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
At least for ARM64 kernels compiled with the crosstoolchain from
Debian/stretch or with the toolchain from kernel.org the line number is
not decoded correctly by 'decode_stacktrace.sh':
$ echo "[ 136.513051] f1+0x0/0xc [kcrash]" | \
CROSS_COMPILE=/opt/gcc-8.1.0-nolibc/aarch64-linux/bin/aarch64-linux- \
./scripts/decode_stacktrace.sh /scratch/linux-arm64/vmlinux \
/scratch/linux-arm64 \
/nfs/debian/lib/modules/4.20.0-devel
[ 136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:68) kcrash
If addr2line from the toolchain is used the decoded line number is correct:
[ 136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:57) kcrash
Link: http://lkml.kernel.org/r/20190527083425.3763-1-manut@linutronix.de
Signed-off-by: Manuel Traut <manut@linutronix.de>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Syzbot reported following memory leak:
ffffffffda RBX: 0000000000000003 RCX: 0000000000441f79
BUG: memory leak
unreferenced object 0xffff888114f26040 (size 32):
comm "syz-executor626", pid 7056, jiffies 4294948701 (age 39.410s)
hex dump (first 32 bytes):
40 60 f2 14 81 88 ff ff 40 60 f2 14 81 88 ff ff @`......@`......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
slab_post_alloc_hook mm/slab.h:439 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
kmalloc include/linux/slab.h:547 [inline]
__memcg_init_list_lru_node+0x58/0xf0 mm/list_lru.c:352
memcg_init_list_lru_node mm/list_lru.c:375 [inline]
memcg_init_list_lru mm/list_lru.c:459 [inline]
__list_lru_init+0x193/0x2a0 mm/list_lru.c:626
alloc_super+0x2e0/0x310 fs/super.c:269
sget_userns+0x94/0x2a0 fs/super.c:609
sget+0x8d/0xb0 fs/super.c:660
mount_nodev+0x31/0xb0 fs/super.c:1387
fuse_mount+0x2d/0x40 fs/fuse/inode.c:1236
legacy_get_tree+0x27/0x80 fs/fs_context.c:661
vfs_get_tree+0x2e/0x120 fs/super.c:1476
do_new_mount fs/namespace.c:2790 [inline]
do_mount+0x932/0xc50 fs/namespace.c:3110
ksys_mount+0xab/0x120 fs/namespace.c:3319
__do_sys_mount fs/namespace.c:3333 [inline]
__se_sys_mount fs/namespace.c:3330 [inline]
__x64_sys_mount+0x26/0x30 fs/namespace.c:3330
do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is a simple off by one bug on the error path.
Link: http://lkml.kernel.org/r/20190528043202.99980-1-shakeelb@google.com
Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists")
Reported-by: syzbot+f90a420dfe2b1b03cb2c@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kernel test robot noticed a 26% will-it-scale pagefault regression
from commit 42a300353577 ("mm: memcontrol: fix recursive statistics
correctness & scalabilty"). This appears to be caused by bouncing the
additional cachelines from the new hierarchical statistics counters.
We can fix this by getting rid of the batched local counters instead.
Originally, there were *only* group-local counters, and they were fully
maintained per cpu. A reader of a stats file high up in the cgroup tree
would have to walk the entire subtree and collect each level's per-cpu
counters to get the recursive view. This was prohibitively expensive,
and so we switched to per-cpu batched updates of the local counters
during a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting"), reducing the complexity from nr_subgroups *
nr_cpus to nr_subgroups.
With growing machines and cgroup trees, the tree walk itself became too
expensive for monitoring top-level groups, and this is when the culprit
patch added hierarchy counters on each cgroup level. When the per-cpu
batch size would be reached, both the local and the hierarchy counters
would get batch-updated from the per-cpu delta simultaneously.
This makes local and hierarchical counter reads blazingly fast, but it
unfortunately makes the write-side too cache line intense.
Since local counter reads were never a problem - we only centralized
them to accelerate the hierarchy walk - and use of the local counters
are becoming rarer due to replacement with hierarchical views (ongoing
rework in the page reclaim and workingset code), we can make those local
counters unbatched per-cpu counters again.
The scheme will then be as such:
when a memcg statistic changes, the writer will:
- update the local counter (per-cpu)
- update the batch counter (per-cpu). If the batch is full:
- spill the batch into the group's atomic_t
- spill the batch into all ancestors' atomic_ts
- empty out the batch counter (per-cpu)
when a local memcg counter is read, the reader will:
- collect the local counter from all cpus
when a hiearchy memcg counter is read, the reader will:
- read the atomic_t
We might be able to simplify this further and make the recursive
counters unbatched per-cpu counters as well (batch upward propagation,
but leave per-cpu collection to the readers), but that will require a
more in-depth analysis and testing of all the callsites. Deal with the
immediate regression for now.
Link: http://lkml.kernel.org/r/20190521151647.GB2870@cmpxchg.org
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
attempted to avoid a problem with devices whose drivers want them to
stay in D0 over suspend-to-idle and resume, but it did not go as far
as it should with that.
Namely, first of all, the power state of a PCI bridge with a
downstream device in D0 must be D0 (based on the PCI PM spec r1.2,
sec 6, table 6-1, if the bridge is not in D0, there can be no PCI
transactions on its secondary bus), but that is not actively enforced
during system-wide PM transitions, so use the skip_bus_pm flag
introduced by commit d491f2b75237 for that.
Second, the configuration of devices left in D0 (whatever the reason)
during suspend-to-idle need not be changed and attempting to put them
into D0 again by force is pointless, so explicitly avoid doing that.
Fixes: d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
|
|
Naveen N. Rao says:
====================
The first patch updates DIV64 overflow tests to properly detect error
conditions. The second patch fixes powerpc64 JIT to generate the proper
unsigned division instruction for BPF_ALU64.
====================
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
BPF_ALU64 div/mod operations are currently using signed division, unlike
BPF_ALU32 operations. Fix the same. DIV64 and MOD64 overflow tests pass
with this fix.
Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If the result of the division is LLONG_MIN, current tests do not detect
the error since the return value is truncated to a 32-bit value and ends
up being 0.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Jose Abreu says:
====================
net: stmmac: Convert to phylink
This converts stmmac to use phylink. Besides the code redution this will
allow to gain more flexibility.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert everything to phylink.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Start adding the phylink callbacks.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for the convertion, split the adjust_link function into
mac_config and add the mac_link_up and mac_link_down functions.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix sparse warning:
drivers/net/ethernet/qlogic/qede/qede_main.c:963:6:
warning: symbol 'qede_lock' was not declared. Should it be static?
drivers/net/ethernet/qlogic/qede/qede_main.c:969:6:
warning: symbol 'qede_unlock' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix sparse warnings:
drivers/net/dsa/sja1105/sja1105_main.c:1848:6:
warning: symbol 'sja1105_port_rxtstamp' was not declared. Should it be static?
drivers/net/dsa/sja1105/sja1105_main.c:1869:6:
warning: symbol 'sja1105_port_txtstamp' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sync the changes to the flags made in "bpf: simplify definition of
BPF_FIB_LOOKUP related flags" with the BPF UAPI headers.
Doing in a separate commit to ease syncing of github/libbpf.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Previously, the BPF_FIB_LOOKUP_{DIRECT,OUTPUT} flags in the BPF UAPI
were defined with the help of BIT macro. This had the following issues:
- In order to use any of the flags, a user was required to depend
on <linux/bits.h>.
- No other flag in bpf.h uses the macro, so it seems that an unwritten
convention is to use (1 << (nr)) to define BPF-related flags.
Fixes: 87f5fc7e48dd ("bpf: Provide helper to do forwarding lookups in kernel FIB table")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Documentation for devlink health reporters supported by mlx5.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Report devlink health on FW fatal issues via fw_fatal_reporter. The
driver recover flow for FW fatal error is now being handled by the
devlink health.
Having the recovery controlled by devlink health, the user has the
ability to cancel the auto-recovery for debug session and run it
manually.
Call mlx5_enter_error_state() before calling devlink_health_report() to
ensure entering device error state even if auto-recovery is off.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add support of dump callback for mlx5 FW fatal reporter.
The FW fatal dump uses cr-dump functionality to gather cr-space data for
debug. The cr-dump uses vsc interface which is valid even if the FW
command interface is not functional, which is the case in most FW fatal
errors.
Command example and output:
$ devlink health dump show pci/0000:82:00.0 reporter fw_fatal
crdump_data:
00 20 00 01 00 00 00 00 03 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 ba 82 00 00
0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20
00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 00
a4 0e 00 00 00 00 00 00 80 c7 fe ff 50 0a 00 00
...
...
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Create mlx5_devlink_health_reporter for fw fatal reporter.
The fw fatal reporter is added in addition to the fw reporter and
implements the recover callback.
The point of having two reporters for FW issues, is that we
don't want to run FW recover on any issue, but only fatal ones.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use devlink_health_report() to report any symptom of FW issue as FW
counter miss or new health syndrome.
The FW issues detected in mlx5 during poll_health which is called in
timer atomic context and so health work queue is used to schedule the
reports.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add support of dump callback for mlx5 FW reporter. Once we trigger FW
dump, the FW will write the core dump to its raw data buffer. The tracer
translates the raw data to traces and save it to a cyclic array. Once
dump is done, the saved traces data is filled into the dump buffer. In
case syndrome is not zero the health buffer content will be printed as
well.
FW dump example:
$ devlink health dump show pci/0000:82:00.0 reporter fw
dump fw traces:
timestamp: 509006640427 lost: false event_id: 185 msg: dump general
info GVMI=0x0000
timestamp: 509006645474 lost: false event_id: 185 msg: GVMI
management info, gvmi_management context:
timestamp: 509006654463 lost: false event_id: 185 msg: [000]:
00000000 00000000 00000000 00000000
timestamp: 509006656127 lost: false event_id: 185 msg: [010]:
00000000 00000000 00000000 00000000
timestamp: 509006656255 lost: false event_id: 185 msg: [020]:
00000000 00000000 00000000 00000000
timestamp: 509006656511 lost: false event_id: 185 msg: [030]:
00000000 00000000 00000000 00000000
timestamp: 509006656639 lost: false event_id: 185 msg: [040]:
00000000 00000000 00000000 00000000
timestamp: 509006656895 lost: false event_id: 185 msg: [050]:
00000000 00000000 00000000 00000000
timestamp: 509006657023 lost: false event_id: 185 msg: [060]:
00000000 00000000 00000000 00000000
timestamp: 509006657180 lost: false event_id: 185 msg: [070]:
00000000 00000000 00000000 00000000
timestamp: 509006659839 lost: false event_id: 185 msg: CMDIF dbase
from IRON: active_dbase_slots = 0x00000000
timestamp: 509006667391 lost: false event_id: 185 msg: GVMI=0x0000
hw_toc context:
timestamp: 509006667647 lost: false event_id: 185 msg: [000]:
00000000 00000000 00000000 fffff000
timestamp: 509006667775 lost: false event_id: 185 msg: [010]:
00000000 00000000 00000000 80d00000
...
...
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Create mlx5_devlink_health_reporter for FW reporter. The FW reporter
implements devlink_health_reporter diagnose callback.
The fw reporter diagnose command can be triggered any time by the user
to check current fw status.
In healthy status, it will return clear syndrome. Otherwise it will
return the syndrome and description of the error type.
Command example and output on healthy status:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 0
Command example and output on non healthy status:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 8 Description: unrecoverable hardware error
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
If a FW assert is considered fatal, indicated by a new bit in the health
buffer, reset the FW. After the reset go through the normal recovery
flow. Only one PF needs to issue the reset, so an attempt is made to
prevent the 2nd function from also issuing the reset.
It's not an error if that happens, it just slows recovery.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Since the FW can be shared between different PFs/VFs it is common
that more than one health poll will detected a failure, this can
lead to multiple resets which are unneeded.
The solution is to use a FW locking mechanism using semaphore space
to provide a way to allow only one device to collect the cr-dump and
to issue a sw-reset.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
New mlx5 adapters allow the driver to reset the FW in the event of an
error, this action called "SW Reset". When an SW reset is issued on any
PF all PFs enter reset state which is a recoverable condition. The
existing recovery flow was designed to allow the recovery of a VF after
a PF driver reload. This patch adds the sw reset to the NIC states
as a preparation for sw reset handling.
When a software reset is issued the following occurs:
1. The NIC interface mode is set to 7 while the reset is in progress.
2. Once the reset completes the NIC interface mode is set to 1.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Crdump allows the driver to retrieve a dump of the FW PCI crspace.
This is useful in case of catastrophic issues which may require FW
reset. The crspace dump can be used for later debug.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The Vendor Specific Capability (VSC) is used to activate a gateway
interfacing with the device. The gateway is used to read or write
device configurations, which are organized in different domains (spaces).
A configuration access may result in multiple actions, reads, writes.
Example usages are accessing the Crspace domain to read the crspace or
locking a device semaphore using the Semaphore domain.
The configuration access use pci_cfg_access to prevent parallel access to
the VSC space by the driver and userspace calls.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Centralize all devlink related callbacks in one file.
In the downstream patch, some more functionality will be added, this
patch is preparing the driver infrastructure for it.
Currently, move devlink un/register functions calls into this file.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add initial documentation for mlx5 driver.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The devlink health reporter provides a dump method on an error. Dump
may contain a large amount of data, in this case doit cb isn't sufficient.
This is because the user side is blocking and doesn't allow draining of
the socket until the socket runs out of buffers. Using dumpit cb
is the correct way to go.
Please note that thankfully the dump op is not yet implemented in any
driver and therefore this change is not breaking userspace.
Fixes: 35455e23e6f3 ("devlink: Add health dump {get,clear} commands")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We can not depend on the tcon->open_file_lock here since in multiuser mode
we may have the same file/inode open via multiple different tcons.
The current code is race prone and will crash if one user deletes a file
at the same time a different user opens/create the file.
To avoid this we need to have a spinlock attached to the inode and not the tcon.
RHBZ: 1580165
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
RH Bugzilla: 1702264
We need to protect so that the call to smb2_reconnect() in
smb2_reconnect_server() does not end up freeing the session
because it can lead to a use after free and crash.
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
current->mm can be non-NULL if a kthread calls use_mm(). Check for
PF_KTHREAD instead to decide when to store user mode FP state.
Fixes: 2722146eb784 ("x86/fpu: Remove fpu->initialized")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604175411.GA27477@lst.de
|
|
Previously, EQ joined the chain notifier on creation.
This forced the caller to be ready to handle events before creating
the EQ through eq_create_generic interface.
To help the caller control when the created EQ will be attached to the
IRQ, add enable/disable API.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ resulting in more available IRQs for
completion EQs.
The changes are using the support for IRQ sharing and EQ polling budget
that was introduced in previous patches so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll in order to prevent
starvation.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
struct mlx5_irq_info is an active object and not just info.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Finalize IRQ separation and expose irq interface.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
IRQ interface should operate within the irq_table context.
It should be independent of any EQ data structure.
The interface that will be exposed:
init/clenup, create/destroy, attach/detach
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
IRQ allocation should be part of the IRQ table life-cycle.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Affinity set/clear is part of the IRQ life-cycle.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Rmap creation/deletion is part of the IRQ life-cycle.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
IRQ table should only exist for mlx5_core_dev for PF and VF only.
EQ table of mediated devices should hold a pointer to the IRQ table
of the parent PCI device.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Instead of requesting IRQ with eq creation, IRQs will be requested
before EQ table creation.
Instead of freeing the IRQs after EQ destroy, free IRQs after eq
table destroy.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Multiple EQs may share the same IRQ in subsequent patches.
Instead of calling the IRQ handler directly, the EQ will register
to an atomic chain notfier.
The Linux built-in shared IRQ is not used because it forces the caller
to disable the IRQ and clear affinity before free_irq() can be called.
This patch is the first step in the separation of IRQ and EQ logic.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Multiple EQs may share the same irq in subsequent patches.
To avoid starvation, a budget is set per EQ's interrupt handler.
Because of this change, it is no longer required to check that
MLX5_NUM_SPARE_EQE eqes were polled (to detect that arm is required).
It is guaranteed that MLX5_NUM_SPARE_EQE > budget, therefore the
handler will arm and exit the handler before all the entries in the
eq are polled.
In the scenario where the handler is out of budget and there are more
EQEs to poll, arming the EQ guarantees that the HW will send another
interrupt and the handler will be called again.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
For ECPF with eswitch manager privilege, query the host max VF count
by querying the device using query_functions command.
With this enhancement:
1. flow steering entries are created only for valid vports based on
the max VF count of the PF.
2. Driver only queries cap of valid vport.
Eswitch requires the max VFs when doing initialization, so do sr-iov
init before eswitch init.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Current function only returns host num of VFs, later patch requires
other params such as host maximum num of VFs.
Return the raw output so that caller can extract info as needed.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Unified representors creation in esw_functions_changed context
handler. Emulate the esw_function_changed event for FW/HW that
does not support this event.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|