Age | Commit message (Collapse) | Author |
|
Change 'handeled' to 'handled' in the Kconfig help for SCTP.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Chan says:
====================
bnxt_en: Bug fixes.
3 bnxt_en driver fixes, covering a bug in preserving the counters during
some resets, proper error code when flashing NVRAM fails, and an
endian bug when extracting the firmware response message length.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The explicit mask and shift is not the appropriate way to parse fields
out of a little endian struct. The length field is internally __le16
and the strategy employed only happens to work on little endian machines
because the offset used is actually incorrect (length is at offset 6).
Also remove the related and no longer used definitions from bnxt.h.
Fixes: 845adfe40c2a ("bnxt_en: Improve valid bit checking in firmware response message.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When NVRAM directory is not found, return the error code
properly as per firmware command failure instead of the hardcode
-ENOBUFS.
Fixes: 3a707bed13b7 ("bnxt_en: Return -EAGAIN if fw command returns BUSY")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have logic to maintain network counters across resets by storing
the counters in bp->net_stats_prev before reset. But not all resets
will clear the counters. Certain resets that don't need to change
the number of rings do not clear the counters. The current logic
accumulates the counters before all resets, causing big jumps in
the counters after some resets, such as ethtool -G.
Fix it by only accumulating the counters during reset if the irq_re_init
parameter is set. The parameter signifies that all rings and interrupts
will be reset and that means that the counters will also be reset.
Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can try to coalesce skbs we take from the subflows rx queue with the
tail of the mptcp rx queue.
If successful, the skb head can be discarded early.
We can also free the skb extensions, we do not access them after this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for Telit LE910C1-EUX composition
0x1031: tty, tty, tty, rmnet
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Don't call netif_napi_del() manually, free_netdev() does this for us.
In addition reorder calls to match reverse order of calls in probe().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzkaller again found a path to a kernel crash through bad gso input:
a packet with gso size exceeding len.
These packets are dropped in tcp_gso_segment and udp[46]_ufo_fragment.
But they may affect gso size calculations earlier in the path.
Now that we have thlen as of commit 9274124f023b ("net: stricter
validation of untrusted gso packets"), check gso_size at entry too.
Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fugang Duan says:
====================
net: ethernet: fec: move GPR register offset and bit into DT
The commit da722186f654 (net: fec: set GPR bit on suspend by
DT configuration) set the GPR reigster offset and bit in driver
for wol feature support.
It brings trouble to enable wol feature on imx6sx/imx6ul/imx7d
platforms that have multiple ethernet instances with different
GPR bit for stop mode control. So the patch set is to move GPR
register offset and bit define into DT, and enable imx6q/imx6dl
imx6qp/imx6sx/imx6ul/imx7d stop mode support.
Currently, below NXP i.MX boards support wol:
- imx6q/imx6dl/imx6qp sabresd
- imx6sx sabreauto
- imx7d sdb
imx6q/imx6dl/imx6qp sabresd board dts file miss the property
"fsl,magic-packet;", so patch#4 is to add the property for stop
mode support.
v1 -> v2:
- driver: switch back to store the quirks bitmask in driver_data
- dt-bindings: rename 'gpr' property string to 'fsl,stop-mode'
- imx6/7 dtsi: add imx6sx/imx6ul/imx7d ethernet stop mode property
v2 -> v3:
- driver: suggested by Sascha Hauer, use a struct fec_devinfo for
abstracting differences between different hardware variants,
it can give more freedom to describe the differences.
- imx6/7 dtsi: correct one typo pointed out by Andrew.
Thanks Martin, Andrew and Sascha Hauer for the review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enable ethernet wake-on-lan feature for imx6q/dl/qp sabresd
boards since the PHY clock is supplied by external osc.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Update the imx6qdl gpr property to define gpr register
offset and bit in DT.
- Add imx6sx/imx6ul/imx7d ethernet stop mode property.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- rename the 'gpr' property string to 'fsl,stop-mode'.
- Update the property to define gpr register offset and
bit in DT, since different instance have different gpr bit.
v2:
* rename 'gpr' property string to 'fsl,stop-mode'.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit da722186f654 (net: fec: set GPR bit on suspend by DT
configuration) set the GPR reigster offset and bit in driver for
wake on lan feature.
But it introduces two issues here:
- one SOC has two instances, they have different bit
- different SOCs may have different offset and bit
So to support wake-on-lan feature on other i.MX platforms, it should
configure the GPR reigster offset and bit from DT.
So the patch is to improve the commit da722186f654 (net: fec: set GPR
bit on suspend by DT configuration) to support multiple ethernet
instances on i.MX series.
v2:
* switch back to store the quirks bitmask in driver_data
v3:
* suggested by Sascha Hauer, use a struct fec_devinfo for
abstracting differences between different hardware variants,
it can give more freedom to describe the differences.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Netlink policies are generally declared as const.
This is safer and prevents potential bugs.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the MPTCP receive path we must cope with TCP fallback
on blocking recvmsg(). Currently in such code path we detect
the fallback condition, but we don't fetch the struct socket
required for fallback.
The above allowed syzkaller to trigger a NULL pointer
dereference:
general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline]
RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904
Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d
RSP: 0018:ffffc90001077b98 EFLAGS: 00010202
RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa
R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc
R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000
FS: 00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891
inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:886 [inline]
sock_recvmsg net/socket.c:904 [inline]
__sys_recvfrom+0x2f3/0x470 net/socket.c:2057
__do_sys_recvfrom net/socket.c:2075 [inline]
__se_sys_recvfrom net/socket.c:2071 [inline]
__x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071
do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Address the issue initializing the struct socket reference
before entering the fallback code.
Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com
Suggested-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: 8ab183deb26a ("mptcp: cope with later TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
One batch of changes, containing:
* hwsim improvements from Jouni and myself, to be able to
test more scenarios easily
* some more HE (802.11ax) support
* some initial S1G (sub 1 GHz) work for fractional MHz channels
* some (action) frame registration updates to help DPP support
* along with other various improvements/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Documentation says that gpll0 is parent of gpll0_out_even, somehow
driver coded that as bi_tcxo, so fix it
Fixes: 2a1d7eb854bb ("clk: qcom: gcc: Add global clock controller driver for SM8150")
Reported-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lkml.kernel.org/r/20200521052728.2141377-1-vkoul@kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The driver will always fail to probe without QCOM_GDSC, so select it.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Link: https://lkml.kernel.org/r/20200523040947.31946-1-jonathan@marek.ca
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Fixes: 3e5770921a88 ("clk: qcom: gcc: Add global clock controller driver for SM8250")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
For rx filter 'HWTSTAMP_FILTER_PTP_V2_EVENT', it should be
PTP v2/802.AS1, any layer, any kind of event packet, but HW only
take timestamp snapshot for below PTP message: sync, Pdelay_req,
Pdelay_resp.
Then it causes below issue when test E2E case:
ptp4l[2479.534]: port 1: received DELAY_REQ without timestamp
ptp4l[2481.423]: port 1: received DELAY_REQ without timestamp
ptp4l[2481.758]: port 1: received DELAY_REQ without timestamp
ptp4l[2483.524]: port 1: received DELAY_REQ without timestamp
ptp4l[2484.233]: port 1: received DELAY_REQ without timestamp
ptp4l[2485.750]: port 1: received DELAY_REQ without timestamp
ptp4l[2486.888]: port 1: received DELAY_REQ without timestamp
ptp4l[2487.265]: port 1: received DELAY_REQ without timestamp
ptp4l[2487.316]: port 1: received DELAY_REQ without timestamp
Timestamp snapshot dependency on register bits in received path:
SNAPTYPSEL TSMSTRENA TSEVNTENA PTP_Messages
01 x 0 SYNC, Follow_Up, Delay_Req,
Delay_Resp, Pdelay_Req, Pdelay_Resp,
Pdelay_Resp_Follow_Up
01 0 1 SYNC, Pdelay_Req, Pdelay_Resp
For dwmac v5.10a, enabling all events by setting register
DWC_EQOS_TIME_STAMPING[SNAPTYPSEL] to 2’b01, clearing bit [TSEVNTENA]
to 0’b0, which can support all required events.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Ahern says:
====================
nexthops: Fix 2 fundamental flaws with nexthop groups
Nik's torture tests have exposed 2 fundamental mistakes with the initial
nexthop code for groups. First, the nexthops entries and num_nh in the
nh_grp struct should not be modified once the struct is set under rcu.
Doing so has major affects on the datapath seeing valid nexthop entries.
Second, the helpers in the header file were convenient for not repeating
code, but they cause datapath walks to potentially see 2 different group
structs after an rcu replace, disrupting a walk of the path objects.
This second problem applies solely to IPv4 as I re-used too much of the
existing code in walking legs of a multipath route.
Patches 1 is refactoring change to simplify the overhead of reviewing and
understanding the change in patch 2 which fixes the update of nexthop
groups when a compnent leg is removed.
Patches 3-5 address the second problem. Patch 3 inlines the multipath
check such that the mpath lookup and subsequent calls all use the same
nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
with iterative calls to fib_info_nhc.
fib_info_num_path can be used in control plane path in a 'for loop' with
subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
only changed while holding the rtnl; the combination can not be used in
the data plane with external nexthops as it involves repeated dereferences
of nh_grp struct which can change between calls.
Similarly, nexthop_is_multipath can be used for branching decisions in
the datapath since the nexthop type can not be changed (a group can not
be converted to standalone and vice versa).
Patch set developed in coordination with Nikolay Aleksandrov. He did a
lot of work creating a good reproducer, discussing options to fix it
and testing iterations.
I have adapted Nik's commands into additional tests in the nexthops
selftest script which I will send against -next.
v2
- fixed whitespace errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to the last path, need to fix fib_info_nh_uses_dev for
external nexthops to avoid referencing multiple nh_grp structs.
Move the device check in fib_info_nh_uses_dev to a helper and
create a nexthop version that is called if the fib_info uses an
external nexthop.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FIB lookups can return an entry that references an external nexthop.
While walking the nexthop struct we do not want to make multiple calls
into the nexthop code which can result in 2 different structs getting
accessed - one returning the number of paths the rest of the loop
seeing a different nh_grp struct. If the nexthop group shrunk, the
result is an attempt to access a fib_nh_common that does not exist for
the new nh_grp struct but did for the old one.
To fix that move the device evaluation code to a helper that can be
used for inline fib_nh path as well as external nexthops.
Update the existing check for fi->nh in fib_table_lookup to call a
new helper, nexthop_get_nhc_lookup, which walks the external nexthop
with a single rcu dereference.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I got too fancy consolidating checks on multipath type. The result
is that path lookups can access 2 different nh_grp structs as exposed
by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
avoid multiple, nh_grp dereferences and make decisions based on the
consistent struct.
Only 2 places left using nexthop_is_multipath are within IPv6, both
only check that the nexthop is a multipath for a branching decision
which are acceptable.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must avoid modifying published nexthop groups while they might be
in use, otherwise we might see NULL ptr dereferences. In order to do
that we allocate 2 nexthoup group structures upon nexthop creation
and swap between them when we have to delete an entry. The reason is
that we can't fail nexthop group removal, so we can't handle allocation
failure thus we move the extra allocation on creation where we can
safely fail and return ENOMEM.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move nh_grp dereference and check for removing nexthop group due to
all members gone into remove_nh_grp_entry.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Antoine Tenart says:
====================
net: phy: mscc-miim: reduce waiting time between MDIO transactions
This series aims at reducing the waiting time between MDIO transactions
when using the MSCC MIIM MDIO controller.
I'm not sure we need patch 4/4 and we could reasonably drop it from the
series. I'm including the patch as it could help to ensure the system
is functional with a non optimal configuration.
We needed to improve the driver's performances as when using a PHY
requiring lots of registers accesses (such as the VSC85xx family),
delays would add up and ended up to be quite large which would cause
issues such as: a slow initialization of the PHY, and issues when using
timestamping operations (this feature will be sent quite soon to the
mailing lists).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver uses a read polling mechanism to check the status of the MDIO
bus, to know if it is ready to accept next commands. This polling
mechanism uses usleep_delay() under the hood between reads which is fine
as long as high resolution timers are enabled. Otherwise the delays will
end up to be much longer than expected.
This patch fixes this by using udelay() under the hood when
CONFIG_HIGH_RES_TIMERS isn't enabled. This increases CPU usage.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MSCC MIIM MDIO driver uses a waiting logic to wait for the MDIO bus
to be ready to accept next commands. It does so by polling the BUSY
status bit which indicates the MDIO bus has completed all pending
operations. This can take time, and the controller supports writing the
next command as soon as there are no pending commands (which happens
while the MDIO bus is busy completing its current command).
This patch implements this improved logic by adding an helper to poll
the PENDING status bit, and by adjusting where we should wait for the
bus to not be busy or to not be pending.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
readl_poll_timeout already returns -ETIMEDOUT if the condition isn't
satisfied, there's no need to check again the condition after calling
it. Remove the redundant timeout check.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MSCC MIIM MDIO driver uses delays to read poll a status register. I
made multiple tests on a Ocelot PCS120 platform which led me to reduce
those delays. The delay in between which the polling function is allowed
to sleep is reduced from 100us to 50us which in almost all cases is a
good value to succeed at the first retry. The overall delay is also
lowered as the prior value was really way to high, 10000us is large
enough.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a recurring pattern throughout some of the PHY code converting
a devad and regnum to our packed clause 45 representation. Rather than
having this scattered around the code, let's put a common translation
function in mdio.h, and provide some register accessors.
Convert the phylib core, phylink, bcm87xx and cortina to use these.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guillaume Nault says:
====================
flow_dissector, cls_flower: Add support for multiple MPLS Label Stack Entries
Currently, the flow dissector and the Flower classifier can only handle
the first entry of an MPLS label stack. This patch series generalises
the code to allow parsing and matching the Label Stack Entries that
follow.
Patch 1 extends the flow dissector to parse MPLS LSEs until the Bottom
Of Stack bit is reached. The number of parsed LSEs is capped at
FLOW_DIS_MPLS_MAX (arbitrarily set to 7). Flower and the NFP driver
are updated to take into account the new layout of struct
flow_dissector_key_mpls.
Patch 2 extends Flower. It defines new netlink attributes, which are
independent from the previous MPLS ones. Mixing the old and the new
attributes in a same filter is not allowed. For backward compatibility,
the old attributes are used when dumping filters that don't require the
new ones.
Changes since v2:
* Fix compilation with the new MLX5 bareudp tunnel code.
Changes since v1:
* Fix compilation of NFP driver (kbuild test robot).
* Fix sparse warning with entropy label (kbuild test robot).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.
In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).
For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.
The new API also allows to only specify an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater or
equal to the provided depth (that is, an LSE exists at this depth).
Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current MPLS dissector only parses the first MPLS Label Stack
Entry (second LSE can be parsed too, but only to set a key_id).
This patch adds the possibility to parse several LSEs by making
__skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
as the Bottom Of Stack bit hasn't been seen, up to a maximum of
FLOW_DIS_MPLS_MAX entries.
FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
many practical purposes, without wasting too much space.
To record the parsed values, flow_dissector_key_mpls is modified to
store an array of stack entries, instead of just the values of the
first one. A bit field, "used_lses", is also added to keep track of
the LSEs that have been set. The objective is to avoid defining a
new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
TC flower is adapted for the new struct flow_dissector_key_mpls layout.
Matching on several MPLS Label Stack Entries will be added in the next
patch.
The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
mlx5's parse_tunnel() now verify that the rule only uses the first LSE
and fail if it doesn't.
Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
slightly modified. Instead of recording the first Entropy Label, it
now records the last one. This shouldn't have any consequences since
there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
in the tree. We'd probably better do a hash of all parsed MPLS labels
instead (excluding reserved labels) anyway. That'd give better entropy
and would probably also simplify the code. But that's not the purpose
of this patch, so I'm keeping that as a future possible improvement.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- Fix revert dynamic lockdep key changes for batman-adv,
by Sven Eckelmann
- use rcu_replace_pointer() where appropriate, by Antonio Quartulli
- Revert "disable ethtool link speed detection when auto negotiation
off", by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tuong Lien says:
====================
tipc: add some improvements
This series adds some improvements to TIPC.
The first patch improves the TIPC broadcast's performance with the 'Gap
ACK blocks' mechanism similar to unicast before, while the others give
support on tracing & statistics for broadcast links, and an alternative
to carry broadcast retransmissions via unicast which might be useful in
some cases.
Besides, the Nagle algorithm can now automatically 'adjust' itself
depending on the specific network condition a stream connection runs by
the last patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When streaming in Nagle mode, we try to bundle small messages from user
as many as possible if there is one outstanding buffer, i.e. not ACK-ed
by the receiving side, which helps boost up the overall throughput. So,
the algorithm's effectiveness really depends on when Nagle ACK comes or
what the specific network latency (RTT) is, compared to the user's
message sending rate.
In a bad case, the user's sending rate is low or the network latency is
small, there will not be many bundles, so making a Nagle ACK or waiting
for it is not meaningful.
For example: a user sends its messages every 100ms and the RTT is 50ms,
then for each messages, we require one Nagle ACK but then there is only
one user message sent without any bundles.
In a better case, even if we have a few bundles (e.g. the RTT = 300ms),
but now the user sends messages in medium size, then there will not be
any difference at all, that says 3 x 1000-byte data messages if bundled
will still result in 3 bundles with MTU = 1500.
When Nagle is ineffective, the delay in user message sending is clearly
wasted instead of sending directly.
Besides, adding Nagle ACKs will consume some processor load on both the
sending and receiving sides.
This commit adds a test on the effectiveness of the Nagle algorithm for
an individual connection in the network on which it actually runs.
Particularly, upon receipt of a Nagle ACK we will compare the number of
bundles in the backlog queue to the number of user messages which would
be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle
mode will be kept for further message sending. Otherwise, we will leave
Nagle and put a 'penalty' on the connection, so it will have to spend
more 'one-way' messages before being able to re-enter Nagle.
In addition, the 'ack-required' bit is only set when really needed that
the number of Nagle ACKs will be reduced during Nagle mode.
Testing with benchmark showed that with the patch, there was not much
difference in throughput for small messages since the tool continuously
sends messages without a break, so Nagle would still take in effect.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit enables dumping the statistics of a broadcast-receiver link
like the traditional 'broadcast-link' one (which is for broadcast-
sender). The link dumping can be triggered via netlink (e.g. the
iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
indicator.
The name of a broadcast-receiver link of a specific peer will be in the
format: 'broadcast-link:<peer-id>'.
For example:
Link <broadcast-link:1001002>
Window:50 packets
RX packets:7841 fragments:2408/440 bundles:0/0
TX packets:0 fragments:0/0 bundles:0/0
RX naks:0 defs:124 dups:0
TX naks:21 acks:0 retrans:0
Congestion link:0 Send queue max:0 avg:0
In addition, the broadcast-receiver link statistics can be reset in the
usual way via netlink by specifying that link name in command.
Note: the 'tipc_link_name_ext()' is removed because the link name can
now be retrieved simply via the 'l->name'.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some environment, broadcast traffic is suppressed at high rate (i.e.
a kind of bandwidth limit setting). When it is applied, TIPC broadcast
can still run successfully. However, when it comes to a high load, some
packets will be dropped first and TIPC tries to retransmit them but the
packet retransmission is intentionally broadcast too, so making things
worse and not helpful at all.
This commit enables the broadcast retransmission via unicast which only
retransmits packets to the specific peer that has really reported a gap
i.e. not broadcasting to all nodes in the cluster, so will prevent from
being suppressed, and also reduce some overheads on the other peers due
to duplicates, finally improve the overall TIPC broadcast performance.
Note: the functionality can be turned on/off via the sysctl file:
echo 1 > /proc/sys/net/tipc/bc_retruni
echo 0 > /proc/sys/net/tipc/bc_retruni
Default is '0', i.e. the broadcast retransmission still works as usual.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the previous commit ("tipc: add Gap ACK blocks support for broadcast
link"), we have removed the following link trace events due to the code
changes:
- tipc_link_bc_ack
- tipc_link_retrans
This commit adds them back along with some minor changes to adapt to
the new code.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput
by Gap ACK blocks"), we apply the same mechanism for the broadcast link
as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
consist of two parts built for both the broadcast and unicast types:
31 16 15 0
+-------------+-------------+-------------+-------------+
| bgack_cnt | ugack_cnt | len |
+-------------+-------------+-------------+-------------+ -
| gap | ack | |
+-------------+-------------+-------------+-------------+ > bc gacks
: : : |
+-------------+-------------+-------------+-------------+ -
| gap | ack | |
+-------------+-------------+-------------+-------------+ > uc gacks
: : : |
+-------------+-------------+-------------+-------------+ -
which is "automatically" backward-compatible.
We also increase the max number of Gap ACK blocks to 128, allowing upto
64 blocks per type (total buffer size = 516 bytes).
Besides, the 'tipc_link_advance_transmq()' function is refactored which
is applicable for both the unicast and broadcast cases now, so some old
functions can be removed and the code is optimized.
With the patch, TIPC broadcast is more robust regardless of packet loss
or disorder, latency, ... in the underlying network. Its performance is
boost up significantly.
For example, experiment with a 5% packet loss rate results:
$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real 0m 42.46s
user 0m 1.16s
sys 0m 17.67s
Without the patch:
$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real 8m 27.94s
user 0m 0.55s
sys 0m 2.38s
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In older FW versions the completion flag was treated as the ack flag in
edpm messages. Expose the FW option of setting which mode the QP is in
by adding a flag to the qedr <-> qed API.
Flag is added for backward compatibility with libqedr.
This flag will be set by qedr after determining whether the libqedr is
using the updated version.
Fixes: f10939403352 ("qed: Add support for QP verbs")
Signed-off-by: Yuval Basson <yuval.bason@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I missed the fact that tcp_v4_err() differs from tcp_v6_err().
After commit 4d1a2d9ec1c1 ("Rename skb to icmp_skb in tcp_v4_err()")
the skb argument has been renamed to icmp_skb only in one function.
I will in a future patch reconciliate these functions to avoid
this kind of confusion.
Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An invariant of cap_bprm_set_creds is that every field in the new cred
structure that cap_bprm_set_creds might set, needs to be set every
time to ensure the fields does not get a stale value.
The field cap_ambient is not set every time cap_bprm_set_creds is
called, which means that if there is a suid or sgid script with an
interpreter that has neither the suid nor the sgid bits set the
interpreter should be able to accept ambient credentials.
Unfortuantely because cap_ambient is not reset to it's original value
the interpreter can not accept ambient credentials.
Given that the ambient capability set is expected to be controlled by
the caller, I don't think this is particularly serious. But it is
definitely worth fixing so the code works correctly.
I have tested to verify my reading of the code is correct and the
interpreter of a sgid can receive ambient capabilities with this
change and cannot receive ambient capabilities without this change.
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Fixes: 58319057b784 ("capabilities: ambient capabilities")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Stefano reported a crash with using SQPOLL with io_uring:
BUG: kernel NULL pointer dereference, address: 00000000000003b0
CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11
RIP: 0010:task_numa_work+0x4f/0x2c0
Call Trace:
task_work_run+0x68/0xa0
io_sq_thread+0x252/0x3d0
kthread+0xf9/0x130
ret_from_fork+0x35/0x40
which is task_numa_work() oopsing on current->mm being NULL.
The task work is queued by task_tick_numa(), which checks if current->mm is
NULL at the time of the call. But this state isn't necessarily persistent,
if the kthread is using use_mm() to temporarily adopt the mm of a task.
Change the task_tick_numa() check to exclude kernel threads in general,
as it doesn't make sense to attempt ot balance for kthreads anyway.
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dk
|
|
Revert
45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")
and add a comment to discourage someone else from making the same
mistake again.
It turns out that some user code fails to compile if __X32_SYSCALL_BIT
is unsigned long. See, for example [1] below.
[ bp: Massage and do the same thing in the respective tools/ header. ]
Fixes: 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")
Reported-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@kernel.org
Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294
Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
|
|
Commit 702f09805222 ("powerpc/64s/exception: Remove lite interrupt
return") changed the interrupt return path to not restore non-volatile
registers by default, and explicitly restore them in paths where it is
required.
But it missed that the facility unavailable exception can sometimes
modify user registers, ie. when it does emulation of move from DSCR.
This is seen as a failure of the dscr_sysfs_thread_test:
test: dscr_sysfs_thread_test
[cpu 0] User DSCR should be 1 but is 0
failure: dscr_sysfs_thread_test
So restore non-volatile GPRs after facility unavailable exceptions.
Currently the hypervisor facility unavailable exception is also wired
up to call facility_unavailable_exception().
In practice we should never take a hypervisor facility unavailable
exception for the DSCR. On older bare metal systems we set HFSCR_DSCR
unconditionally in __init_HFSCR, or on newer systems it should be
enabled via the "data-stream-control-register" device tree CPU
feature.
Even if it's not, since commit f3c99f97a3cd ("KVM: PPC: Book3S HV:
Don't access HFSCR, LPIDR or LPCR when running nested"), the KVM code
has unconditionally set HFSCR_DSCR when running guests.
So we should only get a hypervisor facility unavailable for the DSCR
if skiboot has disabled the "data-stream-control-register" feature,
and we are somehow in guest context but not via KVM.
Given all that, it should be unnecessary to add a restore of
non-volatile GPRs after the hypervisor facility exception, because we
never expect to hit that path. But equally we may as well add the
restore, because we never expect to hit that path, and if we ever did,
at least we would correctly restore the registers to their post
emulation state.
In future we can split the non-HV and HV facility unavailable handling
so that there is no emulation in the HV handler, and then remove the
restore for the HV case.
Fixes: 702f09805222 ("powerpc/64s/exception: Remove lite interrupt return")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200526061808.2472279-1-mpe@ellerman.id.au
|
|
negotiation off"
The commit 8c46fcd78308 ("batman-adv: disable ethtool link speed detection
when auto negotiation off") disabled the usage of ethtool's link_ksetting
when auto negotation was enabled due to invalid values when used with
tun/tap virtual net_devices. According to the patch, automatic measurements
should be used for these kind of interfaces.
But there are major flaws with this argumentation:
* automatic measurements are not implemented
* auto negotiation has nothing to do with the validity of the retrieved
values
The first point has to be fixed by a longer patch series. The "validity"
part of the second point must be addressed in the same patch series by
dropping the usage of ethtool's link_ksetting (thus always doing automatic
measurements over ethernet).
Drop the patch again to have more default values for various net_device
types/configurations. The user can still overwrite them using the
batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE.
Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The Asus USB DAC is a USB type-C audio dongle for connecting to
the headset and headphone. The volume minimum value -23040 which
is 0xa600 in hexadecimal with the resolution value 1 indicates
this should be endianness issue caused by the firmware bug. Add
a volume quirk to fix the volume control problem.
Also fixes this warning:
Warning! Unlikely big volume range (=23040), cval->res is probably wrong.
[5] FU [Headset Capture Volume] ch = 1, val = -23040/0/1
Warning! Unlikely big volume range (=23040), cval->res is probably wrong.
[7] FU [Headset Playback Volume] ch = 1, val = -23040/0/1
Signed-off-by: Chris Chiu <chiu@endlessm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200526062613.55401-1-chiu@endlessm.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|