Age | Commit message (Collapse) | Author |
|
Display the CoS counters as additional priority counters by looking up
the priority to CoS queue mapping. If the TX extended port statistics
block size returned by firmware is big enough to cover the CoS counters,
then we will display the new priority counters. We call firmware to get
the up-to-date pri2cos mapping to convert the CoS counters to
priority counters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are some minor differences when assigning VF resources on the
new chips. The MSIX (NQ) resource has to be assigned and ring group
is not needed on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When bringing up a device, the code checks to see if the number of
MSIX has changed. pci_disable_msix() should be called first before
changing the number of reserved NQs/CMPL rings. This ensures that
the MSIX vectors associated with the NQs/CMPL rings are still
properly mapped when pci_disable_msix() masks the vectors.
This patch will prevent errors when RDMA support is added for the new
57500 chips. When the RDMA driver shuts down, the number of NQs is
decreased and we must use the new sequence to prevent MSIX errors.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bnxt_en requires same number of stat_ctxs as CP rings but RDMA
requires only 1 stat_ctx. Also add a new parameter resv_stat_ctxs
to better keep track of stat_ctxs reserved including resources used
by RDMA. Add a stat_ctxs parameter to all the relevant resource
reservation functions so we can reserve the correct number of
stat_ctxs.
Prior to this patch, we were not reserving the extra stat_ctx for
RDMA and RDMA would not work on the new 57500 chips.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Calling bnxt_set_max_func_stat_ctxs() to modify max stat_ctxs requested
or freed by the RDMA driver is wrong. After introducing reservation of
resources recently, the driver has to keep track of all stat_ctxs
including the ones used by the RDMA driver. This will provide a better
foundation for accurate accounting of the stat_ctxs.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For bnxt_en driver, stat_ctxs created will always be same as
cp_nr_rings. Remove extra variable that duplicates the value.
Also introduce bnxt_get_avail_stat_ctxs_for_en() helper to get
available stat_ctxs and bnxt_get_ulp_stat_ctxs() helper to return
number of stat_ctxs used by RDMA.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The available CP rings are calculated differently on the new 57500
chips, so add this helper to do this calculation correctly. The
VFs will be assigned these available CP rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PF has a pool of NQs and MSIX vectors assigned to it based on
NVRAM configurations. The number of usable MSIX vectors on the PF
is the minimum of the NQs and MSIX vectors. Any excess NQs without
associated MSIX may be used for the VFs, so we need to store this
max_nqs value. max_nqs minus the NQs used by the PF will be the
available NQs for the VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fp_mode_switching field in struct mm_context_t was left unused by
commit 8c8d953c2800 ("MIPS: Schedule on CPUs we need to lose FPU for a
mode switch") in v4.19, with nothing modifying its value & nothing
waiting on it having any particular value after that commit. Remove the
unused field & the one remaining reference to it.
Signed-off-by: Paul Burton <paul.burton@mips.com>
|
|
Handling exceptions for direct UDP encapsulation in GUE (that is,
UDP-in-UDP) leads to unbounded recursion in the GUE exception handler,
syzbot reported.
While draft-ietf-intarea-gue-06 doesn't explicitly forbid direct
encapsulation of UDP in GUE, it probably doesn't make sense to set up GUE
this way, and it's currently not even possible to configure this.
Skip exception handling if the GUE proto/ctype field is set to the UDP
protocol number. Should we need to handle exceptions for UDP-in-GUE one
day, we might need to either explicitly set a bound for recursion, or
implement a special iterative handling for these cases.
Reported-and-tested-by: syzbot+43f6755d1c2e62743468@syzkaller.appspotmail.com
Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
is bigger than zero, so enhance the constraint by checking .nr_queues of type poll
before enabling IO poll.
Otherwise IO race & timeout can be observed when running block/007.
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There's a single user of this function, dm, and dm just wants
to check if IO is inflight, not that it's just allocated.
This fixes a hang with srp/002 in blktests with dm, where it tries
to suspend but waits for inflight IO to finish first. As it checks
for just allocated requests, this fails.
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Start the policy_tokens and the associated enumeration from zero,
simplifying the pt macro.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The code uses a bitmap to check for duplicate tokens during parsing, and
that doesn't work at all for the negative Opt_err token case.
There is absolutely no reason to make Opt_err be negative, and in fact
it only confuses things, since some of the affected functions actually
return a positive Opt_xyz enum _or_ a regular negative error code (eg
-EINVAL), and using -1 for Opt_err makes no sense.
There are similar problems in ima_policy.c and key encryption, but they
don't have the immediate bug wrt bitmap handing, and ima_policy.c in
particular needs a different patch to make the enum values match the
token array index. Mimi is sending that separately.
Reported-by: syzbot+a22e0dc07567662c50bc@syzkaller.appspotmail.com
Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 5208cc83423d ("keys, trusted: fix: *do not* allow duplicate key options")
Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]")
Cc: James Morris James Morris <jmorris@namei.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If same destination IP address config is already existing, that config is
just used. MAC address also should be same.
However, there is no MAC address checking routine.
So that MAC address checking routine is added.
test commands:
%iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
%iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1
After this patch, above commands are disallowed.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
clusterip_config_entry_put()
A proc_remove() can sleep. so that it can't be inside of spin_lock.
Hence proc_remove() is moved to outside of spin_lock. and it also
adds mutex to sync create and remove of proc entry(config->pde).
test commands:
SHELL#1
%while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
--dport 9000 -j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
iptables -F; done
SHELL#2
%while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done
[ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
[ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
[ 2949.587920] 1 lock held by iptables/5472:
[ 2949.592711] #0: 000000008f0ebcf2 (&(&cn->lock)->rlock){+...}, at: refcount_dec_and_lock+0x24/0x50
[ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: G W 4.19.0-rc5+ #16
[ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 2949.604212] Call Trace:
[ 2949.604212] dump_stack+0xc9/0x16b
[ 2949.604212] ? show_regs_print_info+0x5/0x5
[ 2949.604212] ___might_sleep+0x2eb/0x420
[ 2949.604212] ? set_rq_offline.part.87+0x140/0x140
[ 2949.604212] ? _rcu_barrier_trace+0x400/0x400
[ 2949.604212] wait_for_completion+0x94/0x710
[ 2949.604212] ? wait_for_completion_interruptible+0x780/0x780
[ 2949.604212] ? __kernel_text_address+0xe/0x30
[ 2949.604212] ? __lockdep_init_map+0x10e/0x5c0
[ 2949.604212] ? __lockdep_init_map+0x10e/0x5c0
[ 2949.604212] ? __init_waitqueue_head+0x86/0x130
[ 2949.604212] ? init_wait_entry+0x1a0/0x1a0
[ 2949.604212] proc_entry_rundown+0x208/0x270
[ 2949.604212] ? proc_reg_get_unmapped_area+0x370/0x370
[ 2949.604212] ? __lock_acquire+0x4500/0x4500
[ 2949.604212] ? complete+0x18/0x70
[ 2949.604212] remove_proc_subtree+0x143/0x2a0
[ 2949.708655] ? remove_proc_entry+0x390/0x390
[ 2949.708655] clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
[ ... ]
Fixes: b3e456fce9f5 ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When network namespace is destroyed, both clusterip_tg_destroy() and
clusterip_net_exit() are called. and clusterip_net_exit() is called
before clusterip_tg_destroy().
Hence cleanup check code in clusterip_net_exit() doesn't make sense.
test commands:
%ip netns add vm1
%ip netns exec vm1 bash
%ip link set lo up
%iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
%exit
%ip netns del vm1
splat looks like:
[ 341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[ 341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
[ 341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
[ 341.227509] Workqueue: netns cleanup_net
[ 341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[ 341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
[ 341.227509] RSP: 0018:ffff88011086f408 EFLAGS: 00010202
[ 341.227509] RAX: dffffc0000000000 RBX: 1ffff1002210de85 RCX: 0000000000000000
[ 341.227509] RDX: 1ffff1002210de85 RSI: ffff880110813be8 RDI: ffffed002210de58
[ 341.227509] RBP: ffff88011086f4d0 R08: 0000000000000000 R09: 0000000000000000
[ 341.227509] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1002210de81
[ 341.227509] R13: ffff880110625a48 R14: ffff880114cec8c8 R15: 0000000000000014
[ 341.227509] FS: 0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
[ 341.227509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 341.227509] CR2: 00007f11fd38e000 CR3: 000000013ca16000 CR4: 00000000001006e0
[ 341.227509] Call Trace:
[ 341.227509] ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
[ 341.227509] ? default_device_exit+0x1ca/0x270
[ 341.227509] ? remove_proc_entry+0x1cd/0x390
[ 341.227509] ? dev_change_net_namespace+0xd00/0xd00
[ 341.227509] ? __init_waitqueue_head+0x130/0x130
[ 341.227509] ops_exit_list.isra.10+0x94/0x140
[ 341.227509] cleanup_net+0x45b/0x900
[ ... ]
Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When network namespace is destroyed, cleanup_net() is called.
cleanup_net() holds pernet_ops_rwsem then calls each ->exit callback.
So that clusterip_tg_destroy() is called by cleanup_net().
And clusterip_tg_destroy() calls unregister_netdevice_notifier().
But both cleanup_net() and clusterip_tg_destroy() hold same
lock(pernet_ops_rwsem). hence deadlock occurrs.
After this patch, only 1 notifier is registered when module is inserted.
And all of configs are added to per-net list.
test commands:
%ip netns add vm1
%ip netns exec vm1 bash
%ip link set lo up
%iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
%exit
%ip netns del vm1
splat looks like:
[ 341.809674] ============================================
[ 341.809674] WARNING: possible recursive locking detected
[ 341.809674] 4.19.0-rc5+ #16 Tainted: G W
[ 341.809674] --------------------------------------------
[ 341.809674] kworker/u4:2/87 is trying to acquire lock:
[ 341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x8c/0x460
[ 341.809674]
[ 341.809674] but task is already holding lock:
[ 341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
[ 341.809674]
[ 341.809674] other info that might help us debug this:
[ 341.809674] Possible unsafe locking scenario:
[ 341.809674]
[ 341.809674] CPU0
[ 341.809674] ----
[ 341.809674] lock(pernet_ops_rwsem);
[ 341.809674] lock(pernet_ops_rwsem);
[ 341.809674]
[ 341.809674] *** DEADLOCK ***
[ 341.809674]
[ 341.809674] May be due to missing lock nesting notation
[ 341.809674]
[ 341.809674] 3 locks held by kworker/u4:2/87:
[ 341.809674] #0: 00000000d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
[ 341.809674] #1: 00000000c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
[ 341.809674] #2: 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
[ 341.809674]
[ 341.809674] stack backtrace:
[ 341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: G W 4.19.0-rc5+ #16
[ 341.809674] Workqueue: netns cleanup_net
[ 341.809674] Call Trace:
[ ... ]
[ 342.070196] down_write+0x93/0x160
[ 342.070196] ? unregister_netdevice_notifier+0x8c/0x460
[ 342.070196] ? down_read+0x1e0/0x1e0
[ 342.070196] ? sched_clock_cpu+0x126/0x170
[ 342.070196] ? find_held_lock+0x39/0x1c0
[ 342.070196] unregister_netdevice_notifier+0x8c/0x460
[ 342.070196] ? register_netdevice_notifier+0x790/0x790
[ 342.070196] ? __local_bh_enable_ip+0xe9/0x1b0
[ 342.070196] ? __local_bh_enable_ip+0xe9/0x1b0
[ 342.070196] ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[ 342.070196] ? trace_hardirqs_on+0x93/0x210
[ 342.070196] ? __bpf_trace_preemptirq_template+0x10/0x10
[ 342.070196] ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[ 342.123094] clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
[ 342.123094] ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
[ 342.123094] ? cleanup_match+0x17d/0x200 [ip_tables]
[ 342.123094] ? xt_unregister_table+0x215/0x300 [x_tables]
[ 342.123094] ? kfree+0xe2/0x2a0
[ 342.123094] cleanup_entry+0x1d5/0x2f0 [ip_tables]
[ 342.123094] ? cleanup_match+0x200/0x200 [ip_tables]
[ 342.123094] __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
[ 342.123094] iptable_filter_net_exit+0x43/0x80 [iptable_filter]
[ 342.123094] ops_exit_list.isra.10+0x94/0x140
[ 342.123094] cleanup_net+0x45b/0x900
[ ... ]
Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch fixes a memory leak in libbpf by freeing up line_info
member of struct bpf_program while unloading a program.
Fixes: 3d65014146c6 ("bpf: libbpf: Add btf_line_info support to libbpf")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Yonghong Song says:
====================
Commit 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
introduced BTF, a debug info format for BTF.
The original design has a couple of issues though.
First, the bitfield size is only encoded in int type.
If the struct member bitfield type is enum, pahole ([1])
or llvm is forced to replace enum with int type. As a result, the original
type information gets lost.
Second, the original BTF design does not envision the possibility of
BTF=>header_file conversion ([2]), hence does not encode "struct" or
"union" info for a forward type. Such information is necessary to
convert BTF to a header file.
This patch set fixed the issue by introducing kind_flag, using one bit
in type->info. When kind_flag, the struct/union btf_member->offset
will encode both bitfield_size and bit_offset, covering both
int and enum base types. The kind_flag is also used to indicate whether
the forward type is a union (when set) or a struct.
Patch #1 refactors function btf_int_bits_seq_show() so Patch #2
can reuse part of the function.
Patch #2 implemented kind_flag support for struct/union/fwd types.
Patch #3 added kind_flag support for cgroup local storage map pretty print.
Patch #4 syncs kernel uapi btf.h to tools directory.
Patch #5 added unit tests for kind_flag.
Patch #6 added tests for kernel bpffs based pretty print with kind_flag.
Patch #7 refactors function btf_dumper_int_bits() so Patch #8
can reuse part of the function.
Patch #8 added bpftool support of pretty print with kind_flag set.
[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=b18354f64cc215368c3bc0df4a7e5341c55c378c
[2] https://lwn.net/SubscriberLink/773198/fe3074838f5c3f26/
Change logs:
v2 -> v3:
. Relocated comments about bitfield_size/bit_offset interpretation
of the "offset" field right before the "offset" struct member.
. Added missing byte alignment checking for non-bitfield enum
member of a struct with kind_flag set.
. Added two test cases in unit tests for struct type, kind_flag set,
non-bitfield int/enum member, not-byte aligned bit offsets.
. Added comments to help understand there is no overflow for
total_bits_offset in bpftool function btf_dumper_int_bits().
. Added explanation of typedef type dumping fix in Patch #8 commit
message.
v1 -> v2:
. If kind_flag is set for a structure, ensure an int member,
whether it is a bitfield or not, is a regular int type.
. Added support so cgroup local storage map pretty print
works with kind_flag.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The following example shows map pretty print with structures
which include bitfield members.
enum A { A1, A2, A3, A4, A5 };
typedef enum A ___A;
struct tmp_t {
char a1:4;
int a2:4;
int :4;
__u32 a3:4;
int b;
___A b1:4;
enum A b2:4;
};
struct bpf_map_def SEC("maps") tmpmap = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(struct tmp_t),
.max_entries = 1,
};
BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);
and the following map update in the bpf program:
key = 0;
struct tmp_t t = {};
t.a1 = 2;
t.a2 = 4;
t.a3 = 6;
t.b = 7;
t.b1 = 8;
t.b2 = 10;
bpf_map_update_elem(&tmpmap, &key, &t, 0);
With this patch, I am able to print out the map values
correctly with this patch:
bpftool map dump id 187
[{
"key": 0,
"value": {
"a1": 0x2,
"a2": 0x4,
"a3": 0x6,
"b": 7,
"b1": 0x8,
"b2": 0xa
}
}
]
Previously, if a function prototype argument has a typedef
type, the prototype is not printed since
function __btf_dumper_type_only() bailed out with error
if the type is a typedef. This commit corrected this
behavior by printing out typedef properly.
The following example shows forward type and
typedef type can be properly printed in function prototype
with modified test_btf_haskv.c.
struct t;
union u;
__attribute__((noinline))
static int test_long_fname_1(struct dummy_tracepoint_args *arg,
struct t *p1, union u *p2,
__u32 unused)
...
int _dummy_tracepoint(struct dummy_tracepoint_args *arg) {
return test_long_fname_1(arg, 0, 0, 0);
}
$ bpftool p d xlated id 24
...
int test_long_fname_1(struct dummy_tracepoint_args * arg,
struct t * p1, union u * p2,
__u32 unused)
...
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The core dump funcitonality in btf_dumper_int_bits() is
refactored into a separate function btf_dumper_bitfield()
which will be used by the next patch.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The new tests are added to test bpffs map pretty print in kernel with kind_flag
for structure type.
$ test_btf -p
......
BTF pretty print array(#1)......OK
BTF pretty print array(#2)......OK
PASS:8 SKIP:0 FAIL:0
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch added unit tests for different types handling
type->info.kind_flag. The following new tests are added:
$ test_btf
...
BTF raw test[82] (invalid int kind_flag): OK
BTF raw test[83] (invalid ptr kind_flag): OK
BTF raw test[84] (invalid array kind_flag): OK
BTF raw test[85] (invalid enum kind_flag): OK
BTF raw test[86] (valid fwd kind_flag): OK
BTF raw test[87] (invalid typedef kind_flag): OK
BTF raw test[88] (invalid volatile kind_flag): OK
BTF raw test[89] (invalid const kind_flag): OK
BTF raw test[90] (invalid restrict kind_flag): OK
BTF raw test[91] (invalid func kind_flag): OK
BTF raw test[92] (invalid func_proto kind_flag): OK
BTF raw test[93] (valid struct kind_flag, bitfield_size = 0): OK
BTF raw test[94] (valid struct kind_flag, int member, bitfield_size != 0): OK
BTF raw test[95] (valid union kind_flag, int member, bitfield_size != 0): OK
BTF raw test[96] (valid struct kind_flag, enum member, bitfield_size != 0): OK
BTF raw test[97] (valid union kind_flag, enum member, bitfield_size != 0): OK
BTF raw test[98] (valid struct kind_flag, typedef member, bitfield_size != 0): OK
BTF raw test[99] (valid union kind_flag, typedef member, bitfield_size != 0): OK
BTF raw test[100] (invalid struct type, bitfield_size greater than struct size): OK
BTF raw test[101] (invalid struct type, kind_flag bitfield base_type int not regular): OK
BTF raw test[102] (invalid struct type, kind_flag base_type int not regular): OK
BTF raw test[103] (invalid union type, bitfield_size greater than struct size): OK
...
PASS:122 SKIP:0 FAIL:0
The second parameter name of macro
BTF_INFO_ENC(kind, root, vlen)
in selftests test_btf.c is also renamed from "root" to "kind_flag".
Note that before this patch "root" is not used and always 0.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Sync include/uapi/linux/btf.h to tools/include/uapi/linux/btf.h.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
local storage maps") added bpffs pretty print for cgroup
local storage maps. The commit worked for struct without kind_flag
set.
This patch refactored and made pretty print also work
with kind_flag set for the struct.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch fixed two issues with BTF. One is related to
struct/union bitfield encoding and the other is related to
forward type.
Issue #1 and solution:
======================
Current btf encoding of bitfield follows what pahole generates.
For each bitfield, pahole will duplicate the type chain and
put the bitfield size at the final int or enum type.
Since the BTF enum type cannot encode bit size,
pahole workarounds the issue by generating
an int type whenever the enum bit size is not 32.
For example,
-bash-4.4$ cat t.c
typedef int ___int;
enum A { A1, A2, A3 };
struct t {
int a[5];
___int b:4;
volatile enum A c:4;
} g;
-bash-4.4$ gcc -c -O2 -g t.c
The current kernel supports the following BTF encoding:
$ pahole -JV t.o
[1] TYPEDEF ___int type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] ENUM A size=4 vlen=3
A1 val=0
A2 val=1
A3 val=2
[4] STRUCT t size=24 vlen=3
a type_id=5 bits_offset=0
b type_id=9 bits_offset=160
c type_id=11 bits_offset=164
[5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
[6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
[7] VOLATILE (anon) type_id=3
[8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
[9] TYPEDEF ___int type_id=8
[10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
[11] VOLATILE (anon) type_id=10
Two issues are in the above:
. by changing enum type to int, we lost the original
type information and this will not be ideal later
when we try to convert BTF to a header file.
. the type duplication for bitfields will cause
BTF bloat. Duplicated types cannot be deduplicated
later if the bitfield size is different.
To fix this issue, this patch implemented a compatible
change for BTF struct type encoding:
. the bit 31 of struct_type->info, previously reserved,
now is used to indicate whether bitfield_size is
encoded in btf_member or not.
. if bit 31 of struct_type->info is set,
btf_member->offset will encode like:
bit 0 - 23: bit offset
bit 24 - 31: bitfield size
if bit 31 is not set, the old behavior is preserved:
bit 0 - 31: bit offset
So if the struct contains a bit field, the maximum bit offset
will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum
bitfield size will be 256 which is enough for today as maximum
bitfield in compiler can be 128 where int128 type is supported.
This kernel patch intends to support the new BTF encoding:
$ pahole -JV t.o
[1] TYPEDEF ___int type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] ENUM A size=4 vlen=3
A1 val=0
A2 val=1
A3 val=2
[4] STRUCT t kind_flag=1 size=24 vlen=3
a type_id=5 bitfield_size=0 bits_offset=0
b type_id=1 bitfield_size=4 bits_offset=160
c type_id=7 bitfield_size=4 bits_offset=164
[5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
[6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
[7] VOLATILE (anon) type_id=3
Issue #2 and solution:
======================
Current forward type in BTF does not specify whether the original
type is struct or union. This will not work for type pretty print
and BTF-to-header-file conversion as struct/union must be specified.
$ cat tt.c
struct t;
union u;
int foo(struct t *t, union u *u) { return 0; }
$ gcc -c -g -O2 tt.c
$ pahole -JV tt.o
[1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[2] FWD t type_id=0
[3] PTR (anon) type_id=2
[4] FWD u type_id=0
[5] PTR (anon) type_id=4
To fix this issue, similar to issue #1, type->info bit 31
is used. If the bit is set, it is union type. Otherwise, it is
a struct type.
$ pahole -JV tt.o
[1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[2] FWD t kind_flag=0 type_id=0
[3] PTR (anon) kind_flag=0 type_id=2
[4] FWD u kind_flag=1 type_id=0
[5] PTR (anon) kind_flag=0 type_id=4
Pahole/LLVM change:
===================
The new kind_flag functionality has been implemented in pahole
and llvm:
https://github.com/yonghong-song/pahole/tree/bitfield
https://github.com/yonghong-song/llvm/tree/bitfield
Note that pahole hasn't implemented func/func_proto kind
and .BTF.ext. So to print function signature with bpftool,
the llvm compiler should be used.
Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Refactor function btf_int_bits_seq_show() by creating
function btf_bitfield_seq_show() which has no dependence
on btf and btf_type. The function btf_bitfield_seq_show()
will be in later patch to directly dump bitfield member values.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
... and don't abuse mount_nodev(), while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
|
|
According to datasheet, BAT_COMP field spans bits 5-7. The rest of the
code seems to assume this already.
Fixes: 4aeae9cb0dad ("power_supply: Add support for TI BQ25890 charger chip")
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
If just a table name was given, nf_tables_dump_rules() continued over
the list of tables even after a match was found. The simple fix is to
exit the loop if it reached the bottom and ctx->table was not NULL.
When iterating over the table's chains, the same problem as above
existed. But worse than that, if a chain name was given the hash table
wasn't used to find the corresponding chain. Fix this by introducing a
helper function iterating over a chain's rules (and taking care of the
cb->args handling), then introduce a shortcut to it if a chain name was
given.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Make my Gmail address the primary one from now on.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each media stream negotiation between 2 SIP peers will trigger creation
of 4 different expectations (2 RTP and 2 RTCP):
- INVITE will create expectations for the media packets sent by the
called peer
- reply to the INVITE will create expectations for media packets sent
by the caller
The dport used by these expectations usually match the ones selected
by the SIP peers, but they might get translated due to conflicts with
another expectation. When such event occur, it is important to do
this translation in both directions, dport translation on the receiving
path and sport translation on the sending path.
This commit fixes the sport translation when the peer requiring it is
also the one that starts the media stream. In this scenario, first media
stream packet is forwarded from LAN to WAN and will rely on
nf_nat_sip_expected() to do the necessary sport translation. However, the
expectation matched by this packet does not contain the necessary information
for doing SNAT, this data being stored in the paired expectation created by
the sender's SIP message (INVITE or reply to it).
Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Adopt the SPDX license identifier headers to ease license compliance
management.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
The failure to create debugfs entry is unpleasant event, but not enough
to abort drier initialization. Align the mlx5_core code to debugfs design
and continue execution whenever debugfs_create_dir() successes or not.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Not all fields were properly documented. Add kerneldoc for the missing
fields to prevent the build from flagging them.
Reported-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
This removes the (now empty) nf_nat_l4proto struct, all its instances
and all the no longer needed runtime (un)register functionality.
nf_nat_need_gre() can be axed as well: the module that calls it (to
load the no-longer-existing nat_gre module) also calls other nat core
functions. GRE nat is now always available if kernel is built with it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This removes the last l4proto indirection, the two callers, the l3proto
packet mangling helpers for ipv4 and ipv6, now call the
nf_nat_l4proto_manip_pkt() helper.
nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
they contain no functionality anymore to not clutter this patch.
Next patch will remove the empty files and the nf_nat_l4proto
struct.
nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
other nat manip functionality as well, not just udp and udplite.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
all protocols did set this to nf_nat_l4proto_nlattr_to_range, so
just call it directly.
The important difference is that we'll now also call it for
protocols that we don't support (i.e., nf_nat_proto_unknown did
not provide .nlattr_to_range).
However, there should be no harm, even icmp provided this callback.
If we don't implement a specific l4nat for this, nothing would make
use of this information, so adding a big switch/case construct listing
all supported l4protocols seems a bit pointless.
This change leaves a single function pointer in the l4proto struct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
With exception of icmp, all of the l4 nat protocols set this to
nf_nat_l4proto_in_range.
Get rid of this and just check the l4proto in the caller.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
No need for indirections here, we only support ipv4 and ipv6
and the called functions are very small.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
fold remaining users (icmp, icmpv6, gre) into nf_nat_l4proto_unique_tuple.
The static-save of old incarnation of resolved key in gre and icmp is
removed as well, just use the prandom based offset like the others.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
almost all l4proto->unique_tuple implementations just call this helper,
so make ->unique_tuple() optional and call its helper directly if the
l4proto doesn't override it.
This is an intermediate step to get rid of ->unique_tuple completely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some of the kerneldoc uses a strange spelling for abbreviations. Turn
them into all-uppercase and clean up some whitespace issues while at it.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Historically this was net_random() based, and was then converted to
a hash based algorithm (private boot seed + hash of endpoint addresses)
due to concerns of leaking net_random() bits.
RANDOM_FULLY mode was added later to avoid problems with hash
based mode (see commit 34ce324019e76,
"netfilter: nf_nat: add full port randomization support" for details).
Just make prandom_u32() the default search starting point and get rid of
->secure_port() altogether.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
These parameters aren't used now.
So remove them.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In case almost or all available ports are taken, clash resolution can
take a very long time, resulting in soft lockup.
This can happen when many to-be-natted hosts connect to same
destination:port (e.g. a proxy) and all connections pass the same SNAT.
Pick a random offset in the acceptable range, then try ever smaller
number of adjacent port numbers, until either the limit is reached or a
useable port was found. This results in at most 248 attempts
(128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
instead of 64000+,
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The device tree bindings for the MMC card detect and
write protect lines specify that these should be active
low unless "cd-inverted" or "wp-inverted" has been
specified.
However that is not how the kernel code has worked. It
has always respected the flags passed to the phandle in
the device tree, but respected the "cd-inverted" and
"wp-inverted" flags such that if those are set, the
polarity will be the inverse of that specified in the
device tree.
Switch to behaving like the old code did and fix the
regression.
Fixes: 81c85ec15a19 ("gpio: OF: Parse MMC-specific CD and WP properties")
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Since a pseudo-random starting point is used in finding a port in
the default case, that 'else if' branch above is no longer a necessity.
So remove it to simplify code.
Signed-off-by: Xiaozhou Liu <liuxiaozhou@bytedance.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In order to comply with SMBus specification, the Axxia I²C module will
abort the multi message transfer if the delay between finishing sending
one message and starting another is longer than 25ms. Unfortunately it
isn't that hard to trigger this situation on a busy system. In order to
fix this problem, we should make sure hardware does whole transaction
without waiting for software to fill some data.
Fortunately, in addition to Manual mode that is currently used by the
driver to perform I²C transfers, the module supports also so called
Sequence mode. In this mode, the module automatically performs
predefined sequence of operations - it sends a slave address, transmits
specified number of bytes from the FIFO, changes transfer direction,
resends the slave address and then reads specified number of bytes to
FIFO. While very inflexible, this does fit a most common case of multi
message transfer - the one where you first write a register number you
want to read and then read it.
To use this mode effectively, a number of conditions must be met to
ensure the transaction does fit the predefined sequence. In case this is
not the case, a fallback to manual mode is used.
The initialization of this mode is very similar to Manual mode. The most
notable difference is different bit in the Master Interrupt Status
designating finishing of transaction. Also some of the errors, like TSS,
cannot happen in this mode.
While it is possible to support transactions requesting a read of any
size (RFL interrupt will be generated when FIFO size is not enough) the
TFL interrupt is not available in this mode, thus the write part of the
transaction cannot exceed FIFO_SIZE (8).
Note that in case of a NAK during transaction, the NA/ND status bits
will be set before STOP command is generated, triggering an interrupt
while the controller is still busy. Current solution for this problem is
to actively wait for this command to stop before leaving xfer callback.
Signed-off-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
[wsa: added braces around else branch spotted by checkpatch]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|