Age | Commit message (Collapse) | Author |
|
The neighbor monitor employs a threshold, default set to 32 peer nodes,
where it activates the "Overlapping Neighbor Monitoring" algorithm.
Below that threshold, monitoring is full-mesh, and no "domain records"
are passed between the nodes.
Because of this, a node never received a peer's ack that it has received
the most recent update of the own domain. Hence, the field 'acked_gen'
in struct tipc_monitor_state remains permamently at zero, whereas the
own domain generation is incremented for each added or removed peer.
This has the effect that the function tipc_mon_get_state() always sets
the field 'probing' in struct tipc_monitor_state true, again leading the
tipc_link_timeout() of the link in question to always send out a probe,
even when link->silent_intv_count is zero.
This is functionally harmless, but leads to some unncessary probing,
which can easily be eliminated by setting the 'probing' field of the
said struct correctly in such cases.
At the same time, we explictly invalidate the sent domain records when
the algorithm is not activated. This will eliminate any risk that an
invalid domain record might be inadverently accepted by the peer.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the offload unbind is done before the chains are flushed.
That causes driver to unregister block callback before it can get all
the callback calls done during flush, leaving the offloaded tps inside
the HW. So fix the order to prevent this situation and restore the
original behaviour.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to new ethtool operation get_fecparam to
fetch FEC parameters.
Original Work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add 0x6086 T6 device id.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following sparse warnings:
net/ncsi/ncsi-manage.c:41:5: warning:
symbol 'ncsi_get_filter' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
switch to using the new timer_setup() and from_timer() api's.
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
switch to using the new timer_setup() and from_timer() api's.
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jiri Pirko says:
====================
mlxsw: Align multipath hash parameters with kernel's
Ido says:
This set makes sure the device is using the same parameters as the
kernel when it computes the multipath hash during IP forwarding.
First patch adds a new netevent to let interested listeners know that
the multipath hash policy has changed.
Next two patches do small and non-functional changes in the mlxsw
driver.
Last patches configure the multipath hash policy upon driver
initialization and as a response to netevents.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure the device and the kernel are performing the multipath hash
according to the same parameters by updating the device whenever the
relevant netevent is generated.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Up until now we used the hardware's defaults for multipath hash
computation. This patch aligns the hardware's multipath parameters with
the kernel's.
For IPv4 packets, the parameters are determined according to the
'fib_multipath_hash_policy' sysctl during module initialization. In case
L3-mode is requested, only the source and destination IP addresses are
used. There is no special handling of ICMP error packets.
In case L4-mode is requested, a 5-tuple is used: source and destination
IP addresses, source and destination ports and IP protocol. Note that
the layer 4 fields are not considered for fragmented packets.
For IPv6 packets, the source and destination IP addresses are used, as
well as the flow label and the next header fields.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RECRv2 register is used for setting up the router's ECMP hash
configuration.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct containing the work item queued from the netevent handler is
named after the only event it is currently used for, which is neighbour
updates.
Use a more appropriate name for the struct, as we are going to use it
for more events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to need to respond to netevents notifying us about
multipath hash updates by configuring the device's hash parameters.
Embed the netevent notifier in the router struct so that we could
retrieve it upon notifications and use it to configure the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Devices performing IPv4 forwarding need to update their multipath hash
policy whenever it is changed.
Inform these devices by generating a netevent.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add my name to the list.
Signed-off-by: Tim Bird <tim.bird@sony.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Due to a documentation mistake, the IPG length was set to 0x12 while it
should have been 12 (decimal). This would affect short packet (64B
typically) performance since the IPG was bigger than necessary.
Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Christoph Paasch sent a patch to address the following issue :
tcp_make_synack() is leaving some TCP private info in skb->cb[],
then send the packet by other means than tcp_transmit_skb()
tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
tcp_make_synack() should not use tcp_init_nondata_skb() :
tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
queues (the ones that are only sent via tcp_transmit_skb())
This patch fixes the issue and should even save few cpu cycles ;)
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported yet another regression added with DOIT_UNLOCKED.
When nexthop is marked as dead, fib_dump_info uses __in_dev_get_rtnl():
./include/linux/inetdevice.h:230 suspicious rcu_dereference_protected() usage!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by syz-executor2/23859:
#0: (rcu_read_lock){....}, at: [<ffffffff840283f0>]
inet_rtm_getroute+0xaa0/0x2d70 net/ipv4/route.c:2738
[..]
lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4665
__in_dev_get_rtnl include/linux/inetdevice.h:230 [inline]
fib_dump_info+0x1136/0x13d0 net/ipv4/fib_semantics.c:1377
inet_rtm_getroute+0xf97/0x2d70 net/ipv4/route.c:2785
..
This isn't safe anymore, callers either hold RTNL mutex or rcu read lock,
so these spots must use rcu_dereference_rtnl() or plain rcu_derefence()
(plus unconditional rcu read lock).
This does the latter.
Fixes: 394f51abb3d04f ("ipv4: route: set ipv4 RTM_GETROUTE to not use rtnl")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix to return a negative error code from thecxgb4_alloc_atid()
error handling case instead of 0.
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-By: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bpf_verifer_ops array is generated dynamically and may be
empty depending on configuration, which then causes an out
of bounds access:
kernel/bpf/verifier.c: In function 'bpf_check':
kernel/bpf/verifier.c:4320:29: error: array subscript is above array bounds [-Werror=array-bounds]
This adds a check to the start of the function as a workaround.
I would assume that the function is never called in that configuration,
so the warning is probably harmless.
Fixes: 00176a34d9e2 ("bpf: remove the verifier ops from program structure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I ran into this link error with the latest net-next plus linux-next
trees when networking is disabled:
kernel/bpf/verifier.o:(.rodata+0x2958): undefined reference to `tc_cls_act_analyzer_ops'
kernel/bpf/verifier.o:(.rodata+0x2970): undefined reference to `xdp_analyzer_ops'
It seems that the code was written to deal with varying contents of
the arrray, but the actual #ifdef was missing. Both tc_cls_act_analyzer_ops
and xdp_analyzer_ops are defined in the core networking code, so adding
a check for CONFIG_NET seems appropriate here, and I've verified this with
many randconfig builds
Fixes: 4f9218aaf8a4 ("bpf: move knowledge about post-translation offsets out of verifier")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The lan9303 driver defines eth_stp_addr as a synonym to
eth_reserved_addr_base to get the STP ethernet address 01:80:c2:00:00:00.
eth_reserved_addr_base is also used to define the start of Bridge Reserved
ethernet address range, which happen to be the STP address.
br_dev_setup refer to eth_reserved_addr_base as a definition of STP
address.
Clean up by:
- Move the eth_stp_addr definition to linux/etherdevice.h
- Use eth_stp_addr instead of eth_reserved_addr_base in br_dev_setup.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Numbers in DT are stored in “cells” which are 32-bits
in size. of_property_read_u8 does not work properly
because of endianness problem.
This causes it to always return 0 with little-endian
architectures.
Fix it by using of_property_read_u32() OF API.
Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
OpenRISC borrows its timer sync logic from MIPS, Matt helped to review
the OpenRISC implementation and noted that we may suffer the same
deadlock case that MIPS has faced. The case being:
"the MIPS timer synchronization code contained the possibility of
deadlock. If you mark a CPU online before it goes into the synchronize
loop, then the boot CPU can schedule a different thread and send IPIs to
all "online" CPUs. It gets stuck waiting for the secondary to ack it's
IPI, since this secondary CPU has not enabled IRQs yet, and is stuck
waiting for the master to synchronise with it. The system then
deadlocks."
Fix this by moving set_cpu_online() to after timer sync.
Reported-by: Matt Redfearn <matt.redfearn@mips.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
openrisc is big-endian only but sparse assumes the same endianness
as the building machine.
This is problematic for code which expect __BYTE_ORDER__ being
correctly predefined by the compiler which sparse can then
pre-process differently from what gcc would, depending on the
building machine endianness.
Fix this by letting sparse know about the architecture endianness.
To: Jonas Bonn <jonas@southpole.se>
To: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
To: Stafford Horne <shorne@gmail.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
In case timers are not in sync when cpus start (i.e. hot plug / offset
resets) we need to synchronize the secondary cpus internal timer with
the main cpu. This is needed as in OpenRISC SMP there is only one
clocksource registered which reads from the same ttcr register on each
cpu.
This synchronization routine heavily borrows from mips implementation that
does something similar.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Lockdep is needed for proving the spinlocks and rwlocks work fine on our
platform. It also requires calling the trace_hardirqs_off() and
trace_hardirqs_on() pair of routines when entering and exiting an
interrupt.
For OpenRISC the interrupt stack frame does not support frame pointers,
so to call trace_hardirqs_on() and trace_hardirqs_off() here the macro's
build up a stack frame each time.
There is one necessary small change in _sys_call_handler to move
interrupt enabling later so they can get re-enabled during syscall
restart. This was done to fix lockdep warnings that are now possible due
to this
patch.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
For lockdep support a reliable stack trace mechanism is needed. This
patch adds support in OpenRISC for the stacktrace framework, implemented
by a simple unwinder api. The unwinder api supports both framepointer
and basic stack tracing.
The unwinder is now used to replace the stack_dump() implementation as
well. The new traces are inline with other architectures trace format:
Call trace:
[<c0004448>] show_stack+0x3c/0x58
[<c031c940>] dump_stack+0xa8/0xe4
[<c0008104>] __cpu_up+0x64/0x130
[<c000d268>] bringup_cpu+0x3c/0x178
[<c000d038>] cpuhp_invoke_callback+0xa8/0x1fc
[<c000d680>] cpuhp_up_callbacks+0x44/0x14c
[<c000e400>] cpu_up+0x14c/0x1bc
[<c041da60>] smp_init+0x104/0x15c
[<c033843c>] ? kernel_init+0x0/0x140
[<c0415e04>] kernel_init_freeable+0xbc/0x25c
[<c033843c>] ? kernel_init+0x0/0x140
[<c0338458>] kernel_init+0x1c/0x140
[<c003a174>] ? schedule_tail+0x18/0xa0
[<c0006b80>] ret_from_fork+0x1c/0x9c
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Simple enough to be compatible with simulation environments,
such as verilated systems, QEMU and other targets supporting OpenRISC
SMP. This also supports our base FPGA SoC's if the cpu frequency is
upped to 50Mhz.
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
[shorne@gmail.com: Added defconfig]
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
On OpenRISC the icache does not snoop data stores. This can cause
aliasing as reported by Jan. This patch fixes the issue to ensure icache
is properly synchronized when code is written to memory. It supports both
SMP and UP flushing.
This supports dcache flush as well for architectures that do not support
write-through caches; most OpenRISC implementations do implement
write-through cache however. Dcache flushes are done only on a single
core as OpenRISC dcaches all support snooping of bus stores.
Signed-off-by: Jan Henrik Weinstock <jan.weinstock@ice.rwth-aachen.de>
[shorne@gmail.com: Squashed patches and wrote commit message]
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Currently we do a spin on secondary cpus when waiting to boot. This
theoretically causes issues with power consumption and does cause issues
with qemu cycle burning (it starves cpu 0 from actually being able to
boot.)
This change puts each secondary cpu to sleep if they have a power
management unit, then signals them to wake via IPI when its time to boot.
If the cpus have no power management unit they will loop as before.
Note: The wakeup IPI requires a special interrupt handler as on secondary
cpu's the interrupt infrastructure is not yet established. This
interrupt handler is set and reset by updating SPR_EVBAR.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
During SMP testing we were getting the below warning after booting the
secondary cpu:
[ 0.060000] BUG: scheduling while atomic: swapper/1/0/0x00000000
This change follows similar patterns from other architectures to start
the schduler with preempt disabled.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
This patch introduces the SMP support for the OpenRISC architecture.
The SMP architecture requires cores which have multi-core features which
have been introduced a few years back including:
- New SPRS SPR_COREID SPR_NUMCORES
- Shadow SPRs
- Atomic Instructions
- Cache Coherency
- A wired in IPI controller
This patch adds all of the SMP specific changes to core infrastructure,
it looks big but it needs to go all together as its hard to split this
one up.
Boot loader spinning of second cpu is not supported yet, it's assumed
that Linux is booted straight after cpu reset.
The bulk of these changes are trivial changes to refactor to use per cpu
data structures throughout. The addition of the smp.c and changes in
time.c are the changes. Some specific notes:
MM changes
----------
The reason why this is created as an array, and not with DEFINE_PER_CPU
is that doing it this way, we'll save a load in the tlb-miss handler
(the load from __per_cpu_offset).
TLB Flush
---------
The SMP implementation of flush_tlb_* works by sending out a
function-call IPI to all the non-local cpus by using the generic
on_each_cpu() function.
Currently, all flush_tlb_* functions will result in a flush_tlb_all(),
which has always been the behaviour in the UP case.
CPU INFO
--------
This creates a per cpu cpuinfo struct and fills it out accordingly for
each activated cpu. show_cpuinfo is also updated to reflect new version
information in later versions of the spec.
SMP API
-------
This imitates the arm64 implementation by having a smp_cross_call
callback that can be set by set_smp_cross_call to initiate an IPI and a
handle_IPI function that is expected to be called from an IPI irqchip
driver.
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
[shorne@gmail.com: added cpu stop, checkpatch fixes, wrote commit message]
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
IPI driver for the Open Multi-Processor Interrupt Controller (ompic) as
described in the Multi-core support section of the OpenRISC 1.2
architecture specification:
https://github.com/openrisc/doc/raw/master/openrisc-arch-1.2-rev0.pdf
Each OpenRISC core contains a full interrupt controller which is used in
the SMP architecture for interrupt balancing. This IPI device, the
ompic, is the only external device required for enabling SMP on
OpenRISC.
Pending ops are stored in a memory bit mask which can allow multiple
pending operations to be set and serviced at a time. This is mostly
borrowed from the alpha IPI implementation.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
[shorne@gmail.com: converted ops to bitmask, wrote commit message]
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Add OpenRISC.io to vendor prefixes. This is reserved for softcores
developed by the OpenRISC community. The OpenRISC community has
separated from OpenCores.org requiring a new prefix.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Enable OpenRISC to use qspinlocks and qrwlocks for upcoming SMP support.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
OpenRISC only supports hardware instructions that perform 4 byte atomic
operations. For enabling qrwlocks for upcoming SMP support 1 and 2 byte
implementations are needed. To do this we leverage the 4 byte atomic
operations and shift/mask the 1 and 2 byte areas as needed.
This heavily borrows ideas and routines from sh and mips, which do
something similar.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Previously, the area between 0x0-0x100 have been used as a "scratch"
memory area to temporarily store regs during exception entry. In a
multi-core environment, this will not work.
This change is to use shadow registers for nested context.
Currently only the "critical" temp load/stores are covered, the
EMERGENCY_PRINT ones are left as is (when they are used, it's game over
anyway), they need to be handled as well in the future.
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Add devicetree binding documentation for the OpenRISC platform
opencores,or1ksim. This is the main OpenRISC reference platform
supporting multiple FPGA SoC's.
This format is based on some of the mips binding docs as we have
similar requirements.
Also, update maintainers so openrisc related binding changes are visible
to the openrisc team.
Acked-by: Rob Herring <robh@kernel.org>
Suggested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Cong Wang says:
====================
net_sched: fix a use-after-free for tc actions
This patchset fixes a use-after-free reported by Lucas
and closes potential races too.
Please see each patch for details.
====================
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TC actions have been destroyed asynchronously for a long time,
previously in a RCU callback and now in a workqueue. If we
don't hold a refcnt for its netns, we could use the per netns
data structure, struct tcf_idrinfo, after it has been freed by
netns workqueue.
Hold refcnt to ensure netns destroy happens after all actions
are gone.
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I forgot to acquire RTNL in tc_action_net_exit()
which leads that action ops->cleanup() is not always
called with RTNL. This usually is not a big deal because
this function is called after all netns refcnt are gone,
but given RTNL protects more than just actions, add it
for safety and consistency.
Also add an assertion to catch other potential bugs.
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This tracepoint can be used to trace synack retransmits. It maintains
pointer to struct request_sock.
We cannot simply reuse trace_tcp_retransmit_skb() here, because the
sk here is the LISTEN socket. The IP addresses and ports should be
extracted from struct request_sock.
Note that, like many other tracepoints, this patch uses IS_ENABLED
in TP_fast_assign macro, which triggers sparse warning like:
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list
However, there is no good solution to avoid these warnings. To the
best of our knowledge, these warnings are harmless.
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).
By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.
25.38% [kernel] [k] __fib6_clean_all
21.63% [kernel] [k] ip6_parse_tlv
4.21% [kernel] [k] __local_bh_enable_ip
2.18% [kernel] [k] ip6_pol_route.isra.39
1.98% [kernel] [k] fib6_walk_continue
1.88% [kernel] [k] _raw_write_lock_bh
1.65% [kernel] [k] dst_release
This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
- Limit the number of options in a Hop-by-Hop or Destination options
extension header.
- Limit the byte length of a Hop-by-Hop or Destination options
extension header.
- Disallow unrecognized options in a Hop-by-Hop or Destination
options extension header.
The limits are set in corresponding sysctls:
ipv6.sysctl.max_dst_opts_cnt
ipv6.sysctl.max_hbh_opts_cnt
ipv6.sysctl.max_dst_opts_len
ipv6.sysctl.max_hbh_opts_len
If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.
If a limit is exceeded when processing an extension header the packet is
dropped.
Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).
These limits have being proposed in draft-ietf-6man-rfc6434-bis.
Tested (by Martin Lau)
I tested out 1 thread (i.e. one raw_udp process).
I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.
v2;
- Code and documention cleanup.
- Change references of RFC2460 to be RFC8200.
- Add reference to RFC6434-bis where the limits will be in standard.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into timers/core
Pull the 3rd batch of timer conversions from Kees Cook:
- various per-architecture conversions
- several driver conversions not picked up by a specific maintainer
- other Acked/Reviewed conversions to go through tip
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: David Howells <dhowells@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-pcmcia@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk> # for soc_common.c
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|