Age | Commit message (Collapse) | Author |
|
There is no point in calling dsa_port_setup_8021q_tagging for each
individual port. Additionally, it will become more difficult to do that
when we'll have a context structure to tag_8021q (next patch). So
refactor this now.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous assumption was that the caller would already have this
header file included.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"Usual driver bugfixes for the I2C subsystem"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: algo: pca: Reapply i2c bus settings after reset
i2c: npcm7xx: Fix timeout calculation
misc: eeprom: at24: register nvmem only after eeprom is ready to use
|
|
MIPS defines two kvm types:
#define KVM_VM_MIPS_TE 0
#define KVM_VM_MIPS_VZ 1
In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.
I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html
And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html
So I define like this:
#define KVM_VM_MIPS_AUTO 0
#define KVM_VM_MIPS_VZ 1
#define KVM_VM_MIPS_TE 2
Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
* powercap:
powercap: make documentation reflect code
powercap/intel_rapl: add support for AlderLake
powercap/intel_rapl: add support for RocketLake
powercap/intel_rapl: add support for TigerLake Desktop
|
|
For new advertising features, it will be important for userspace to
know the capabilities of the controller and kernel. If the controller
and kernel support extended advertising, we include flags indicating
hardware offloading support and support for setting tx power of adv
instances.
In the future, vendor-specific commands may allow the setting of tx
power in advertising instances, but for now this feature is only
marked available if extended advertising is supported.
This change is manually verified in userspace by ensuring the
advertising manager's supported_flags field is updated with new flags on
hatch chromebook (ext advertising supported).
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
|
|
Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.
With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.
This is an approach that has the following beneficial properties:
(1) This allows CC module customizations made by the EBPF called in
tcp_init_transfer() to persist, and not be wiped out by a later
call to tcp_init_congestion_control() in tcp_init_transfer().
(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
for existing code upstream that depends on the current order.
(3) Does not cause 2 initializations for for CC in the case where the
EBPF called in tcp_init_transfer() wants to set the CC to a new CC
algorithm.
(4) Allows follow-on simplifications to the code in net/core/filter.c
and net/ipv4/tcp_cong.c, which currently both have some complexity
to special-case CC initialization to avoid double CC
initialization if EBPF sets the CC.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
|
|
This should be "current" not "skb".
Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com
|
|
As Alexei points out, struct bpf_sk_lookup_kern has two 4-byte holes.
This leads to suboptimal instructions being generated (IPv4, x86):
1372 struct bpf_sk_lookup_kern ctx = {
0xffffffff81b87f30 <+624>: xor %eax,%eax
0xffffffff81b87f32 <+626>: mov $0x6,%ecx
0xffffffff81b87f37 <+631>: lea 0x90(%rsp),%rdi
0xffffffff81b87f3f <+639>: movl $0x110002,0x88(%rsp)
0xffffffff81b87f4a <+650>: rep stos %rax,%es:(%rdi)
0xffffffff81b87f4d <+653>: mov 0x8(%rsp),%eax
0xffffffff81b87f51 <+657>: mov %r13d,0x90(%rsp)
0xffffffff81b87f59 <+665>: incl %gs:0x7e4970a0(%rip)
0xffffffff81b87f60 <+672>: mov %eax,0x8c(%rsp)
0xffffffff81b87f67 <+679>: movzwl 0x10(%rsp),%eax
0xffffffff81b87f6c <+684>: mov %ax,0xa8(%rsp)
0xffffffff81b87f74 <+692>: movzwl 0x38(%rsp),%eax
0xffffffff81b87f79 <+697>: mov %ax,0xaa(%rsp)
Fix this by moving around sport and dport. pahole confirms there
are no more holes:
struct bpf_sk_lookup_kern {
u16 family; /* 0 2 */
u16 protocol; /* 2 2 */
__be16 sport; /* 4 2 */
u16 dport; /* 6 2 */
struct {
__be32 saddr; /* 8 4 */
__be32 daddr; /* 12 4 */
} v4; /* 8 8 */
struct {
const struct in6_addr * saddr; /* 16 8 */
const struct in6_addr * daddr; /* 24 8 */
} v6; /* 16 16 */
struct sock * selected_sk; /* 32 8 */
bool no_reuseport; /* 40 1 */
/* size: 48, cachelines: 1, members: 8 */
/* padding: 7 */
/* last cacheline: 48 bytes */
};
The assembly also doesn't contain the pesky rep stos anymore:
1372 struct bpf_sk_lookup_kern ctx = {
0xffffffff81b87f60 <+624>: movzwl 0x10(%rsp),%eax
0xffffffff81b87f65 <+629>: movq $0x0,0xa8(%rsp)
0xffffffff81b87f71 <+641>: movq $0x0,0xb0(%rsp)
0xffffffff81b87f7d <+653>: mov %ax,0x9c(%rsp)
0xffffffff81b87f85 <+661>: movzwl 0x38(%rsp),%eax
0xffffffff81b87f8a <+666>: movq $0x0,0xb8(%rsp)
0xffffffff81b87f96 <+678>: mov %ax,0x9e(%rsp)
0xffffffff81b87f9e <+686>: mov 0x8(%rsp),%eax
0xffffffff81b87fa2 <+690>: movq $0x0,0xc0(%rsp)
0xffffffff81b87fae <+702>: movl $0x110002,0x98(%rsp)
0xffffffff81b87fb9 <+713>: mov %eax,0xa0(%rsp)
0xffffffff81b87fc0 <+720>: mov %r13d,0xa4(%rsp)
1: https://lore.kernel.org/bpf/CAADnVQKE6y9h2fwX6OS837v-Uf+aBXnT_JXiN_bbo2gitZQ3tA@mail.gmail.com/
Fixes: e9ddbb7707ff ("bpf: Introduce SK_LOOKUP program type with a dedicated attach point")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20200910110248.198326-1-lmb@cloudflare.com
|
|
There is no @validate argument.
CC: Johannes Berg <johannes.berg@intel.com>
Fixes: 3de644035446 ("netlink: re-add parse/validate functions in strict mode")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the weird space inside the NETIF_F_CSUM_MASK.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit adds a new TCP feature to reflect the tos value received in
SYN, and send it out on the SYN-ACK, and eventually set the tos value of
the established socket with this reflected tos value. This provides a
way to set the traffic class/QoS level for all traffic in the same
connection to be the same as the incoming SYN request. It could be
useful in data centers to provide equivalent QoS according to the
incoming request.
This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
default turned off.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit adds tos as a new passed in parameter to
ip_build_and_send_pkt() which will be used in the later commit.
This is a pure restructure and does not have any functional change.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A new field is added to the request sock to record the TOS value
received on the listening socket during 3WHS:
When not under syn flood, it is recording the TOS value sent in SYN.
When under syn flood, it is recording the TOS value sent in the ACK.
This is a preparation patch in order to do TOS reflection in the later
commit.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To RCUify napi->dev_list we need to replace list_del_init()
with list_del_rcu(). There is no _init() version for RCU for
obvious reasons. Up until now netif_napi_del() was idempotent
so to make sure it remains such add a bit which is set when
NAPI is listed, and cleared when it removed. Since we don't
expect multiple calls to netif_napi_add() to be correct,
add a warning on that side.
Now that napi_hash_add / napi_hash_del are only called by
napi_add / del we can actually steal its bit. We just need
to make sure hash node is initialized correctly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We allow drivers to call napi_hash_del() before calling
netif_napi_del() to batch RCU grace periods. This makes
the API asymmetric and leaks internal implementation details.
Soon we will want the grace period to protect more than just
the NAPI hash table.
Restructure the API and have drivers call a new function -
__netif_napi_del() if they want to take care of RCU waits.
Note that only core was checking the return status from
napi_hash_del() so the new helper does not report if the
NAPI was actually deleted.
Some notes on driver oddness:
- veth observed the grace period before calling netif_napi_del()
but that should not matter
- myri10ge observed normal RCU flavor
- bnx2x and enic did not actually observe the grace period
(unless they did so implicitly)
- virtio_net and enic only unhashed Rx NAPIs
The last two points seem to indicate that the calls to
napi_hash_del() were a left over rather than an optimization.
Regardless, it's easy enough to correct them.
This patch may introduce extra synchronize_net() calls for
interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
free_netdev() to call netif_napi_del(). This seems inevitable
since we want to use RCU for netpoll dev->napi_list traversal,
and almost no drivers set IFF_DISABLE_NETPOLL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the unused3 byte in struct igmpmsg to hold the high 8 bits of the
VIF ID.
If using more than 255 IPv4 multicast interfaces it is necessary to have
access to a VIF ID for cache reports that is wider than 8 bits, the VIF
ID present in the igmpmsg reports sent to mroute_sk was only 8 bits wide
in the igmpmsg header. Adding the high 8 bits of the 16 bit VIF ID in
the unused byte allows use of more than 255 IPv4 multicast interfaces.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Insert the multicast route table ID as a Netlink attribute to Netlink
cache report notifications.
When multiple route tables are in use it is necessary to have a way to
determine which route table a given cache report belongs to when
receiving the cache report.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix up the documentation of the struct powercap_control_type members
to match the code.
Also fixup stray whitespace.
Signed-off-by: Amit Kucheria <amitk@kernel.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix kernel-doc warning in <linux/device.h>:
../include/linux/device.h:613: warning: Function parameter or member 'em_pd' not described in 'device'
Fixes: 1bc138c62295 ("PM / EM: add support for other devices than CPUs in Energy Model")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On non-EFI systems, it wasn't possible to test the platform firmware
loader because it will have never set "checked_fw" during __init.
Instead, allow the test code to override this check. Additionally split
the declarations into a private symbol namespace so there is greater
enforcement of the symbol visibility.
Fixes: 548193cba2a7 ("test_firmware: add support for firmware_request_platform")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200909225354.3118328-1-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In CMT and NPAR the PF is unknown when the GFS block processes the
packet. Therefore cannot use searcher as it has a per PF database,
and thus ARFS must be disabled.
Fixes: d51e4af5c209 ("qed: aRFS infrastructure support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external host system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system.
When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.
For example in below such system controller-0 and controller-1, each has
PCI PF pf0 whose eswitch ports can be present in controller-0.
These results in phys_port_name as "pf0" for both.
Similar problem exists for VFs and upcoming Sub functions.
An example view of two controller systems:
---------------------------------------------------------
| |
| --------- --------- ------- ------- |
----------- | | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
| server | | ------- ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc |=== | pf0 |______/________/ | pf1 |___/_______/ |
| connect | | ------- ------- |
----------- | | controller_num=1 (no eswitch) |
------|--------------------------------------------------
(internal wire)
|
---------------------------------------------------------
| devlink eswitch ports and reps |
| ----------------------------------------------------- |
| |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
| |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | |
| ----------------------------------------------------- |
| |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
| |pf1 | pf1vfN | pf1sfN | pf1 | pf1vfN |pf0sfN | |
| ----------------------------------------------------- |
| |
| |
| --------- --------- ------- ------- |
| | vf(s) | | sf(s) | |vf(s)| |sf(s)| |
| ------- ----/---- ---/----- ------- ---/--- ---/--- |
| | pf0 |______/________/ | pf1 |___/_______/ |
| ------- ------- |
| |
| local controller_num=0 (eswitch) |
---------------------------------------------------------
An example devlink port for external controller with controller
number = 1 for a VF 1 of PF 0:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
"port": {
"pci/0000:06:00.0/2": {
"type": "eth",
"netdev": "ens2f0pf0vf1",
"flavour": "pcivf",
"controller": 1,
"pfnum": 0,
"vfnum": 1,
"external": true,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A devlink eswitch port may represent PCI PF/VF ports of a controller.
A controller either located on same system or it can be an external
controller located in host where such NIC is plugged in.
Add the ability for driver to specify if a port is for external
controller.
Use such flag in the mlx5_core driver.
An example of an external controller having VF1 of PF0 belong to
controller 1.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
"port": {
"pci/0000:06:00.0/2": {
"type": "eth",
"netdev": "ens2f0pf0vf1",
"flavour": "pcivf",
"pfnum": 0,
"vfnum": 1,
"external": true,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To add more fields to the PCI PF and VF port attributes, follow standard
structure comment format.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add comment block for physical, PF and VF port attributes.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
from Michael Zhou.
2) do_ip_vs_set_ctl() dereferences uninitialized value,
from Peilin Ye.
3) Support for userdata in tables, from Jose M. Guisado.
4) Do not increment ct error and invalid stats at the same time,
from Florian Westphal.
5) Remove ct ignore stats, also from Florian.
6) Add ct stats for clash resolution, from Florian Westphal.
7) Bump reference counter bump on ct clash resolution only,
this is safe because bucket lock is held, again from Florian.
8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.
9) Add wildcard support for nft_socket, from Balazs Scheidler.
10) Remove superfluous IPVS dependency on iptables, from
Yaroslav Bolyukin.
11) Remove unused definition in ebt_stp, from Wang Hai.
12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
in selftests/net, from Fabian Frederick.
13) Add userdata support for nft_object, from Jose M. Guisado.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If something goes wrong (such as the SCL being stuck low) then we need
to reset the PCA chip. The issue with this is that on reset we lose all
config settings and the chip ends up in a disabled state which results
in a lock up/high CPU usage. We need to re-apply any configuration that
had previously been set and re-enable the chip.
Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Rewrite the rxrpc client connection manager so that it can support multiple
connections for a given security key to a peer. The following changes are
made:
(1) For each open socket, the code currently maintains an rbtree with the
connections placed into it, keyed by communications parameters. This
is tricky to maintain as connections can be culled from the tree or
replaced within it. Connections can require replacement for a number
of reasons, e.g. their IDs span too great a range for the IDR data
type to represent efficiently, the call ID numbers on that conn would
overflow or the conn got aborted.
This is changed so that there's now a connection bundle object placed
in the tree, keyed on the same parameters. The bundle, however, does
not need to be replaced.
(2) An rxrpc_bundle object can now manage the available channels for a set
of parallel connections. The lock that manages this is moved there
from the rxrpc_connection struct (channel_lock).
(3) There'a a dummy bundle for all incoming connections to share so that
they have a channel_lock too. It might be better to give each
incoming connection its own bundle. This bundle is not needed to
manage which channels incoming calls are made on because that's the
solely at whim of the client.
(4) The restrictions on how many client connections are around are
removed. Instead, a previous patch limits the number of client calls
that can be allocated. Ordinarily, client connections are reaped
after 2 minutes on the idle queue, but when more than a certain number
of connections are in existence, the reaper starts reaping them after
2s of idleness instead to get the numbers back down.
It could also be made such that new call allocations are forced to
wait until the number of outstanding connections subsides.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Enables storing userdata for nft_object. Initially this will store an
optional comment but can be extended in the future as needed.
Adds new attribute NFTA_OBJ_USERDATA to nft_object.
Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
On x86_64, each notification results in one skbuff allocation which
consumes at least 768 bytes due to the skbuff overhead.
This patch coalesces several notifications into one single skbuff, so
each notification consumes at least ~211 bytes, that ~3.5 times less
memory consumption. As a result, this is reducing the chances to exhaust
the netlink socket receive buffer.
Rule of thumb is that each notification batch only contains netlink
messages whose report flag is the same, nfnetlink_send() requires this
to do appropriate delivery to userspace, either via unicast (echo
mode) or multicast (monitor mode).
The skbuff control buffer is used to annotate the report flag for later
handling at the new coalescing routine.
The batch skbuff notification size is NLMSG_GOODSIZE, using a larger
skbuff would allow for more socket receiver buffer savings (to amortize
the cost of the skbuff even more), however, going over that size might
break userspace applications, so let's be conservative and stick to
NLMSG_GOODSIZE.
Reported-by: Phil Sutter <phil@nwl.cc>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch is born out of an investigation into which IEEE statistics
correspond to which struct rtnl_link_stats64 members. Turns out that
there seems to be reasonable consensus on the matter, among many drivers.
To save others the time (and it took more time than I'm comfortable
admitting) I'm adding comments referring to IEEE attributes to
struct rtnl_link_stats64.
Up until now we had two forms of documentation for stats - in
Documentation/ABI/testing/sysfs-class-net-statistics and the comments
on struct rtnl_link_stats64 itself. While the former is very cautious
in defining the expected behavior, the latter feel quite dated and
may not be easy to understand for modern day driver author
(e.g. rx_over_errors). At the same time modern systems are far more
complex and once obvious definitions lost their clarity. For example
- does rx_packet count at the MAC layer (aFramesReceivedOK)?
packets processed correctly by hardware? received by the driver?
or maybe received by the stack?
I tried to clarify the expectations, further clarifications from
others are very welcome.
The part hardest to untangle is rx_over_errors vs rx_fifo_errors
vs rx_missed_errors. After much deliberation I concluded that for
modern HW only two of the counters will make sense. The distinction
between internal FIFO overflow and packets dropped due to back-pressure
from the host is likely too implementation (driver and device) specific
to expose in the standard stats.
Now - which two of those counters we select to use is anyone's pick:
sysfs documentation suggests rx_over_errors counts packets which
did not fit into buffers due to MTU being too small, which I reused.
There don't seem to be many modern drivers using it (well, CAN drivers
seem to love this statistic).
Of the remaining two I picked rx_missed_errors to report device drops.
bnxt reports it and it's folded into "drop"s in procfs (while
rx_fifo_errors is an error, and modern devices usually receive the frame
OK, they just can't admit it into the pipeline).
Of the drivers I looked at only AMD Lance-like and NS8390-like use all
three of these counters. rx_missed_errors counts missed frames,
rx_over_errors counts overflow events, and rx_fifo_errors counts frames
which were truncated because they didn't fit into buffers. This suggests
that rx_fifo_errors may be the correct stat for truncated packets, but
I'd think a FIFO stat counting truncated packets would be very confusing
to a modern reader.
v2:
- add driver developer notes about ethtool stat count and reset
- replace Ethernet with IEEE 802.3 to better indicate source of attrs
- mention byte counters don't count FCS
- clarify RX counter is from device to host
- drop "sightly" from sysfs paragraph
- add examples of ethtool stats
- s/incoming/received/ s/incoming/transmitted/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support per port group src list (address and timer) and filter mode
dumping. Protected by either multicast_lock or rcu.
v3: add IPv6 support
v2: require RCU or multicast_lock to traverse src groups
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix kernel-doc warning in <linux/netdevice.h>:
../include/linux/netdevice.h:2158: warning: Function parameter or member 'xdp_state' not described in 'net_device'
Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix kernel-doc warning in <linux/netdevice.h>:
../include/linux/netdevice.h:2158: warning: Function parameter or member 'proto_down_reason' not described in 'net_device'
Fixes: 829eb208e80d ("rtnetlink: add support for protodown reason")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a formatting error in the description of bpf_load_hdr_opt() (rst2man
complains about a wrong indentation, but what is missing is actually a
blank line before the bullet list).
Fix and harmonise the formatting for other helpers.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904161454.31135-3-quentin@isovalent.com
|
|
Fix kernel-doc warning in <linux/device.h>:
../include/linux/device.h:613: warning: Function parameter or member 'em_pd' not described in 'device'
Fixes: 1bc138c62295 ("PM / EM: add support for other devices than CPUs in Energy Model")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/d97f40ad-3033-703a-c3cb-2843ce0f6371@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Introduce for_each_rtd_dais_rollback macro which behaves exactly like
for_each_codec_dais_rollback and its cpu_dais equivalent but for all
dais instead.
Use newly added macro to fix soc_pcm_open error path and prevent
uninitialized dais from being cleaned-up.
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Fixes: 5d9fa03e6c35 ("ASoC: soc-pcm: tidyup soc_pcm_open() order")
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/20200907111939.16169-1-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
XFRMA_REPLAY_ESN_VAL was not cloned completely from the old to the new.
Migrate this attribute during XFRMA_MSG_MIGRATE
v1->v2:
- move curleft cloning to a separate patch
Fixes: af2f464e326e ("xfrm: Assign esn pointers when cloning a state")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- more generic entry code ABI fallout
- debug register handling bugfixes
- fix vmalloc mappings on 32-bit kernels
- kprobes instrumentation output fix on 32-bit kernels
- fix over-eager WARN_ON_ONCE() on !SMAP hardware
- NUMA debugging fix
- fix Clang related crash on !RETPOLINE kernels
* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Unbreak 32bit fast syscall
x86/debug: Allow a single level of #DB recursion
x86/entry: Fix AC assertion
tracing/kprobes, x86/ptrace: Fix regs argument order for i386
x86, fakenuma: Fix invalid starting node ID
x86/mm/32: Bring back vmalloc faulting on x86_32
x86/cmdline: Disable jump tables for cmdline.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"A small series for fixing a problem with Xen PVH guests when running
as backends (e.g. as dom0).
Mapping other guests' memory is now working via ZONE_DEVICE, thus not
requiring to abuse the memory hotplug functionality for that purpose"
* tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: add helpers to allocate unpopulated memory
memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
xen/balloon: add header guard
|
|
'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux
Pull misc fixes from Miguel Ojeda:
"A trivial patch for auxdisplay:
- Replace HTTP links with HTTPS ones (Alexander A. Klimov)
The usual clang-format trivial update:
- Update with the latest for_each macro list (Miguel Ojeda)
And Luc requested me to pick a sparse fix on my queue, so here it goes
along with other two trivial Compiler Attributes ones (also from Luc).
- sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
Oostenryck)
- Compiler Attributes: fix comment concerning GCC 4.6 (Luc Van
Oostenryck)
- Compiler Attributes: remove comment about sparse not supporting
__has_attribute (Luc Van Oostenryck)"
* tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
auxdisplay: Replace HTTP links with HTTPS ones
* tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
* tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
sparse: use static inline for __chk_{user,io}_ptr()
Compiler Attributes: fix comment concerning GCC 4.6
Compiler Attributes: remove comment about sparse not supporting __has_attribute
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- HSDK-4xd Dev system: perf driver updates for sampling interrupt
- HSDK* Dev System: Ethernet broken [Evgeniy Didin]
- HIGHMEM broken (2 memory banks) [Mike Rapoport]
- show_regs() rewrite once and for all
- Other minor fixes
* tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
arc: fix memory initialization for systems with two memory banks
irqchip/eznps: Fix build error for !ARC700 builds
ARC: show_regs: fix r12 printing and simplify
ARC: HSDK: wireup perf irq
ARC: perf: don't bail setup if pct irq missing in device-tree
ARC: pgalloc.h: delete a duplicated word + other fixes
|
|
Merge misc fixes from Andrew Morton:
"19 patches.
Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
hugetlb)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
mm/khugepaged.c: fix khugepaged's request size in collapse_file
mm/hugetlb: fix a race between hugetlb sysctl handlers
mm/hugetlb: try preferred node first when alloc gigantic page from cma
mm/migrate: preserve soft dirty in remove_migration_pte()
mm/migrate: remove unnecessary is_zone_device_page() check
mm/rmap: fixup copying of soft dirty and uffd ptes
mm/migrate: fixup setting UFFD_WP flag
mm: madvise: fix vma user-after-free
checkpatch: fix the usage of capture group ( ... )
fork: adjust sysctl_max_threads definition to match prototype
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
mm: track page table modifications in __apply_to_page_range()
MAINTAINERS: IA64: mark Status as Odd Fixes only
MAINTAINERS: add LLVM maintainers
MAINTAINERS: update Cavium/Marvell entries
mm: slub: fix conversion of freelist_corrupted()
mm: memcg: fix memcg reclaim soft lockup
memcg: fix use-after-free in uncharge_batch
|
|
We will need to remove some OF properties in drivers/net/dsa/bcm_sf2.c
with a subsequent commit. Export of_remove_property() to modules so we
can keep bcm_sf2 modular and provide an empty stub for when CONFIG_OF is
disabled to maintain the ability to compile test.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Otherwise gcc generates warnings if the expression is complicated.
Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We got slightly different patches removing a double word
in a comment in net/ipv4/raw.c - picked the version from net.
Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
values instead of VNIC login response buffer (following what
commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
response buffer") did).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull libata fixes from Jens Axboe:
- improve Sandisks ATA_HORKAGE on NCQ (Tejun)
- link printk cleanup (Xu)
* tag 'libata-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks
ata: ahci: use ata_link_info() instead of ata_link_printk()
|