Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"We've got a number of SELinux patches queued up, the highlights are:
- Fixup the security_fs_context_parse_param() LSM hook so it executes
all of the LSM hook implementations unless a serious error occurs.
We also correct the SELinux hook implementation so that it returns
zero on success.
- In addition to a few SELinux mount option parsing fixes, we
simplified the parsing by moving it earlier in the process.
The logic was that it was unlikely an admin/user would use the new
mount API and not have the policy loaded before passing the SELinux
options.
- Properly fixed the LSM/SELinux/SCTP hooks with the addition of the
security_sctp_assoc_established() hook.
This work was done in conjunction with the netdev folks and should
complete the move of the SCTP labeling from the endpoints to the
associations.
- Fixed a variety of sparse warnings caused by changes in the "__rcu"
markings of some core kernel structures.
- Ensure we access the superblock's LSM security blob using the
stacking-safe accessors.
- Added the ability for the kernel to always allow FIOCLEX and
FIONCLEX if the "ioctl_skip_cloexec" policy capability is
specified.
- Various constifications improvements, type casting improvements,
additional return value checks, and dead code/parameter removal.
- Documentation fixes"
* tag 'selinux-pr-20220321' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (23 commits)
selinux: shorten the policy capability enum names
docs: fix 'make htmldocs' warning in SCTP.rst
selinux: allow FIOCLEX and FIONCLEX with policy capability
selinux: use correct type for context length
selinux: drop return statement at end of void functions
security: implement sctp_assoc_established hook in selinux
security: add sctp_assoc_established hook
selinux: parse contexts for mount options early
selinux: various sparse fixes
selinux: try to use preparsed sid before calling parse_sid()
selinux: Fix selinux_sb_mnt_opts_compat()
LSM: general protection fault in legacy_parse_param
selinux: fix a type cast problem in cred_init_security()
selinux: drop unused macro
selinux: simplify cred_init_security
selinux: do not discard const qualifier in cast
selinux: drop unused parameter of avtab_insert_node
selinux: drop cast to same type
selinux: enclose macro arguments in parenthesis
selinux: declare name parameter of hash_eval const
...
|
|
It is known that priority setting HW offload when set tls TX/RX offload
by setsockopt(). Check netdevice whether support NETIF_F_HW_TLS_TX or
not at the later stages in the whole tls_set_device_offload() process,
some memory allocations have been done before that. We must release those
memory and return error if we judge the netdevice not support
NETIF_F_HW_TLS_TX. It is redundant.
Move NETIF_F_HW_TLS_TX judgement forward, and move start_marker_record
and offload_ctx memory allocation back slightly. Thus, we can get
simpler exception handling process.
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Avoid using "goto" jump instruction unconditionally when we
can return directly. Remove unnecessary jump instructions in
do_tls_setsockopt_conf().
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tp->rx_opt.mss_clamp is not populated, yet, during TFO send so we
rise it to the local MSS. tp->mss_cache is not updated, however:
tcp_v6_connect():
tp->rx_opt.mss_clamp = IPV6_MIN_MTU - headers;
tcp_connect():
tcp_connect_init():
tp->mss_cache = min(mtu, tp->rx_opt.mss_clamp)
tcp_send_syn_data():
tp->rx_opt.mss_clamp = tp->advmss
After recent fixes to ICMPv6 PTB handling we started dropping
PMTU updates higher than tp->mss_cache. Because of the stale
tp->mss_cache value PMTU updates during TFO are always dropped.
Thanks to Wei for helping zero in on the problem and the fix!
Fixes: c7bb4b89033b ("ipv6: tcp: drop silly ICMPv6 packet too big messages")
Reported-by: Andre Nash <alnash@fb.com>
Reported-by: Neil Spring <ntspring@fb.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220321165957.1769954-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lockdep annotation lockdep_assert_softirq_will_run() expects that
either hard or soft interrupts are disabled because both guaranty that
the "raised" soft-interrupts will be processed once the context is left.
This triggers in flush_smp_call_function_from_idle() but it this case it
explicitly calls do_softirq() in case of pending softirqs.
Revert the "softirq will run" annotation in ____napi_schedule() and move
the check back to __netif_rx() as it was. Keep the IRQ-off assert in
____napi_schedule() because this is always required.
Fixes: fbd9a2ceba5c7 ("net: Add lockdep asserts to ____napi_schedule().")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/YjhD3ZKWysyw8rc6@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make the devlink core hold the instance lock during eswitch_mode
callbacks. Cheat in case of mlx5 (see the cover letter).
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We'll need an explicitly locked rate node API for netdevsim
to switch eswitch mode setting to locked.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next.
This patchset contains updates for the nf_tables register tracking
infrastructure, disable bogus warning when attaching ct helpers,
one namespace pollution fix and few cleanups for the flowtable.
1) Revisit conntrack gc routine to reduce chances of overruning
the netlink buffer from the event path. From Florian Westphal.
2) Disable warning on explicit ct helper assignment, from Phil Sutter.
3) Read-only expressions do not update registers, mark them as
NFT_REDUCE_READONLY. Add helper functions to update the register
tracking information. This patch re-enables the register tracking
infrastructure.
4) Cancel register tracking in case an expression fully/partially
clobbers existing data.
5) Add register tracking support for remaining expressions: ct,
lookup, meta, numgen, osf, hash, immediate, socket, xfrm, tunnel,
fib, exthdr.
6) Rename init and exit functions for the conntrack h323 helper,
from Randy Dunlap.
7) Remove redundant field in struct flow_offload_work.
8) Update nf_flow_table_iterate() to pass flowtable to callback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x21c/0x230
show_stack+0x1c/0x60
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0x70/0x2d0
__kasan_report+0x158/0x16c
kasan_report+0x74/0x120
__asan_load1+0x80/0xa0
netlbl_bitmap_walk+0x3c/0xd0
calipso_opt_getattr+0x1a8/0x230
calipso_sock_getattr+0x218/0x340
calipso_sock_getattr+0x44/0x60
netlbl_sock_getattr+0x44/0x80
selinux_netlbl_socket_setsockopt+0x138/0x170
selinux_socket_setsockopt+0x4c/0x60
security_socket_setsockopt+0x4c/0x90
__sys_setsockopt+0xbc/0x2b0
__arm64_sys_setsockopt+0x6c/0x84
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x88/0x200
do_el0_svc+0x88/0xa0
el0_svc+0x128/0x1b0
el0t_64_sync_handler+0x9c/0x120
el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous commit 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But
there are race conditions that may lead to null pointer dereferences in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use
ax25_kill_by_device() to detach the ax25 device.
One of the race conditions that cause null pointer dereferences can be
shown as below:
(Thread 1) | (Thread 2)
ax25_connect() |
ax25_std_establish_data_link() |
ax25_start_t1timer() |
mod_timer(&ax25->t1timer,..) |
| ax25_kill_by_device()
(wait a time) | ...
| s->ax25_dev = NULL; //(1)
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //(2)| ...
... |
We set null to ax25_cb->ax25_dev in position (1) and dereference
the null pointer in position (2).
The corresponding fail log is shown below:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000050
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0
RIP: 0010:ax25_t1timer_expiry+0x12/0x40
...
Call Trace:
call_timer_fn+0x21/0x120
__run_timers.part.0+0x1ca/0x250
run_timer_softirq+0x2c/0x60
__do_softirq+0xef/0x2f3
irq_exit_rcu+0xb6/0x100
sysvec_apic_timer_interrupt+0xa2/0xd0
...
This patch moves ax25_disconnect() before s->ax25_dev = NULL
and uses del_timer_sync() to delete timers in ax25_disconnect().
If ax25_disconnect() is called by ax25_kill_by_device() or
ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be
equal to ENETUNREACH, it will wait all timers to stop before we
set null to s->ax25_dev in ax25_kill_by_device().
Fixes: 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to
avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of
net_device caused by rebinding operation") increase the refcounts of
ax25_dev and net_device in ax25_bind() and decrease the matching refcounts
in ax25_kill_by_device() in order to prevent UAF bugs, but there are
reference count leaks.
The root cause of refcount leaks is shown below:
(Thread 1) | (Thread 2)
ax25_bind() |
... |
ax25_addr_ax25dev() |
ax25_dev_hold() //(1) |
... |
dev_hold_track() //(2) |
... | ax25_destroy_socket()
| ax25_cb_del()
| ...
| hlist_del_init() //(3)
|
|
(Thread 3) |
ax25_kill_by_device() |
... |
ax25_for_each(s, &ax25_list) { |
if (s->ax25_dev == ax25_dev) //(4) |
... |
Firstly, we use ax25_bind() to increase the refcount of ax25_dev in
position (1) and increase the refcount of net_device in position (2).
Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete
ax25_cb in hlist in position (3) before calling ax25_kill_by_device().
Finally, the decrements of refcounts in ax25_kill_by_device() will not
be executed, because no s->ax25_dev equals to ax25_dev in position (4).
This patch adds decrements of refcounts in ax25_release() and use
lock_sock() to do synchronization. If refcounts decrease in ax25_release(),
the decrements of refcounts in ax25_kill_by_device() will not be
executed and vice versa.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Fixes: 87563a043cef ("ax25: fix reference count leaks of ax25_dev")
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_DEBUG_INFO_BTF is disabled, bpf_get_btf_vmlinux can return a
NULL pointer. Check for it in btf_get_module_btf to prevent a NULL pointer
dereference.
While kernel test robot only complained about this specific case, let's
also check for NULL in other call sites of bpf_get_btf_vmlinux.
Fixes: 9492450fd287 ("bpf: Always raise reference in btf_get_module_btf")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220320143003.589540-1-memxor@gmail.com
|
|
In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct
bpf_sk_lookup 16-bit wide") the remote_port field has been split up and
re-declared from u32 to be16.
However, the accompanying changes to the context access converter have not
been well thought through when it comes big-endian platforms.
Today 2-byte wide loads from offsetof(struct bpf_sk_lookup, remote_port)
are handled as narrow loads from a 4-byte wide field.
This by itself is not enough to create a problem, but when we combine
1. 32-bit wide access to ->remote_port backed by a 16-wide wide load, with
2. inherent difference between litte- and big-endian in how narrow loads
need have to be handled (see bpf_ctx_narrow_access_offset),
we get inconsistent results for a 2-byte loads from &ctx->remote_port on LE
and BE architectures. This in turn makes BPF C code for the common case of
2-byte load from ctx->remote_port not portable.
To rectify it, inform the context access converter that remote_port is
2-byte wide field, and only 1-byte loads need to be treated as narrow
loads.
At the same time, we special-case the 4-byte load from &ctx->remote_port to
continue handling it the same way as do today, in order to keep the
existing BPF programs working.
Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220319183356.233666-2-jakub@cloudflare.com
|
|
Currently, local storage memory can only be allocated atomically
(GFP_ATOMIC). This restriction is too strict for sleepable bpf
programs.
In this patch, the verifier detects whether the program is sleepable,
and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
down to the local storage functions that allocate memory.
Please note that bpf_task/sk/inode_storage_update_elem functions are
invoked by userspace applications through syscalls. Preemption is
disabled before bpf_task/sk/inode_storage_update_elem is called, which
means they will always have to allocate memory atomically.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220318045553.3091807-2-joannekoong@fb.com
|
|
The flowtable object is already passed as argument to
nf_flow_table_iterate(), do use not data pointer to pass flowtable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Already available through the flowtable object, remove it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init
Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs
Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if we can elide the load. Cancel if the new candidate
isn't identical to previous store.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The fib expression stores to a register, so we can't add empty stub.
Check that the register that is being written is in fact redundant.
In most cases, this is expected to cancel tracking as re-use is
unlikely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
tunnel expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information. This patch does not perform bitwise tracking.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
xfrm expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
socket expression performs. This allows to skip this redundant
operation. If the destination contains a different selector, update the
register tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The immediate expression might clobber existing data on the registers,
cancel register tracking for the destination register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
osf expression performs. Always cancel register tracking for jhash since
this requires tracking multiple source registers in case of
concatenations. Perform register tracking (without bitwise) for symhash
since input does not come from source register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Allow to recycle the previous output of the OS fingerprint expression
if flags and ttl are the same.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Random and increment are stateful, each invocation results in fresh output.
Cancel register tracking for these two expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
its enough to export the meta get reduce helper and then call it
from nft_meta_bridge too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In most cases, nft_lookup will be read-only, i.e. won't clobber
registers. In case of map, we need to cancel the registers that will
see stores.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this ct
expression performs. This allows to skip this redundant operation. If
the destination contains a different selector, update the register
tracking information.
Export nft_expr_reduce_bitwise as a symbol since nft_ct might be
compiled as a module.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Output of expressions might be larger than one single register, this might
clobber existing data. Reset tracking for all destination registers that
required to store the expression output.
This patch adds three new helper functions:
- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.
Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of register that are used.
This patch updates the following expressions:
- meta_bridge
- bitwise
- byteorder
- meta
- payload
to use these helper functions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Skip register tracking for expressions that perform read-only operations
on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to
avoid defining stubs for these expressions.
This patch re-enables register tracking which was disabled in ed5f85d42290
("netfilter: nf_tables: disable register tracking"). Follow up patches
add remaining register tracking for existing expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The function sets the pernet boolean to avoid the spurious warning from
nf_ct_lookup_helper() when assigning conntrack helpers via nftables.
Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
as of commit 4608fdfc07e1
("netfilter: conntrack: collect all entries in one cycle")
conntrack gc was changed to run every 2 minutes.
On systems where conntrack hash table is set to large value, most evictions
happen from gc worker rather than the packet path due to hash table
distribution.
This causes netlink event overflows when events are collected.
This change collects average expiry of scanned entries and
reschedules to the average remaining value, within 1 to 60 second interval.
To avoid event overflows, reschedule after each bucket and add a
limit for both run time and number of evictions per run.
If more entries have to be evicted, reschedule and restart 1 jiffy
into the future.
Reported-by: Karel Rericha <karel@maxtel.cz>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2022-03-19
1) Delete duplicated functions that calls same xfrm_api_check.
From Leon Romanovsky.
2) Align userland API of the default policy structure to the
internal structures. From Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a netlink message is received, netlink_recvmsg() fills in the address
of the sender. One of the fields is the 32-bit bitfield nl_groups, which
carries the multicast group on which the message was received. The least
significant bit corresponds to group 1, and therefore the highest group
that the field can represent is 32. Above that, the UB sanitizer flags the
out-of-bounds shift attempts.
Which bits end up being set in such case is implementation defined, but
it's either going to be a wrong non-zero value, or zero, which is at least
not misleading. Make the latter choice deterministic by always setting to 0
for higher-numbered multicast groups.
To get information about membership in groups >= 32, userspace is expected
to use nl_pktinfo control messages[0], which are enabled by NETLINK_PKTINFO
socket option.
[0] https://lwn.net/Articles/147608/
The way to trigger this issue is e.g. through monitoring the BRVLAN group:
# bridge monitor vlan &
# ip link add name br type bridge
Which produces the following citation:
UBSAN: shift-out-of-bounds in net/netlink/af_netlink.c:162:19
shift exponent 32 is too large for 32-bit type 'int'
Fixes: f7fa9b10edbb ("[NETLINK]: Support dynamic number of multicast groups per netlink family")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2bef6aabf201d1fc16cca139a744700cff9dcb04.1647527635.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add support for Asus TF103C
- Add support for Realtek RTL8852B
- Add support for Realtek RTL8723BE
- Add WBS support to mt7921s
* tag 'for-net-next-2022-03-18' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (24 commits)
Bluetooth: ath3k: remove superfluous header files
Bluetooth: bcm203x: remove superfluous header files
Bluetooth: hci_bcm: Add the Asus TF103C to the bcm_broken_irq_dmi_table
Bluetooth: mt7921s: Add WBS support
Bluetooth: mt7921s: Add .btmtk_get_codec_config_data
Bluetooth: mt7921s: Add .get_data_path_id
Bluetooth: mt7921s: Set HCI_QUIRK_VALID_LE_STATES
Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
Bluetooth: btmtkuart: fix error handling in mtk_hci_wmt_sync()
Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
Bluetooth: Send AdvMonitor Dev Found for all matched devices
Bluetooth: msft: Clear tracked devices on resume
Bluetooth: fix incorrect nonblock bitmask in bt_sock_wait_ready()
Bluetooth: Don't assign twice the same value
Bluetooth: btrtl: Add support for RTL8852B
Bluetooth: hci_uart: add missing NULL check in h5_enqueue
Bluetooth: Fix use after free in hci_send_acl
Bluetooth: btusb: Use quirk to skip HCI_FLT_CLEAR_ALL on fake CSR controllers
Bluetooth: hci_sync: Add a new quirk to skip HCI_FLT_CLEAR_ALL
Bluetooth: btmtkuart: fix the conflict between mtk and msft vendor event
...
====================
Link: https://lore.kernel.org/r/20220318224752.1477292-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In some corner cases, the peer handing an incoming ADD_ADDR option, can
receive a retransmitted ADD_ADDR for the same address before the subflow
creation completes.
We can avoid the above issue by generating and sending the ADD_ADDR echo
before starting the MPJ subflow connection.
This slightly changes the behaviour of the packetdrill tests as the
ADD_ADDR echo packet is sent earlier.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220317221444.426335-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Got crash when doing pressure test of mptcp:
===========================================================================
dst_release: dst:ffffa06ce6e5c058 refcnt:-1
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffffa06ce6e5c058
PGD 190a01067 P4D 190a01067 PUD 43fffb067 PMD 22e403063 PTE 8000000226e5c063
Oops: 0011 [#1] SMP PTI
CPU: 7 PID: 7823 Comm: kworker/7:0 Kdump: loaded Tainted: G E
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.2.1 04/01/2014
Call Trace:
? skb_release_head_state+0x68/0x100
? skb_release_all+0xe/0x30
? kfree_skb+0x32/0xa0
? mptcp_sendmsg_frag+0x57e/0x750
? __mptcp_retrans+0x21b/0x3c0
? __switch_to_asm+0x35/0x70
? mptcp_worker+0x25e/0x320
? process_one_work+0x1a7/0x360
? worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
? kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
? ret_from_fork+0x35/0x40
===========================================================================
In __mptcp_alloc_tx_skb skb was allocated and skb->tcp_tsorted_anchor will
be initialized, in under memory pressure situation sk_wmem_schedule will
return false and then kfree_skb. In this case skb->_skb_refdst is not null
because_skb_refdst and tcp_tsorted_anchor are stored in the same mem, and
kfree_skb will try to release dst and cause crash.
Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache")
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220317220953.426024-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The PMTU update and ICMP redirect helper functions initialise their fl4
variable with either __build_flow_key() or build_sk_flow_key(). These
initialisation functions always set ->flowi4_scope with
RT_SCOPE_UNIVERSE and might set the ECN bits of ->flowi4_tos. This is
not a problem when the route lookup is later done via
ip_route_output_key_hash(), which properly clears the ECN bits from
->flowi4_tos and initialises ->flowi4_scope based on the RTO_ONLINK
flag. However, some helpers call fib_lookup() directly, without
sanitising the tos and scope fields, so the route lookup can fail and,
as a result, the ICMP redirect or PMTU update aren't taken into
account.
Fix this by extracting the ->flowi4_tos and ->flowi4_scope sanitisation
code into ip_rt_fix_tos(), then use this function in handlers that call
fib_lookup() directly.
Note 1: We can't sanitise ->flowi4_tos and ->flowi4_scope in a central
place (like __build_flow_key() or flowi4_init_output()), because
ip_route_output_key_hash() expects non-sanitised values. When called
with sanitised values, it can erroneously overwrite RT_SCOPE_LINK with
RT_SCOPE_UNIVERSE in ->flowi4_scope. Therefore we have to be careful to
sanitise the values only for those paths that don't call
ip_route_output_key_hash().
Note 2: The problem is mostly about sanitising ->flowi4_tos. Having
->flowi4_scope initialised with RT_SCOPE_UNIVERSE instead of
RT_SCOPE_LINK probably wasn't really a problem: sockets with the
SOCK_LOCALROUTE flag set (those that'd result in RTO_ONLINK being set)
normally shouldn't receive ICMP redirects or PMTU updates.
Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Let's remove unnecessary brackets around CONFIG_AF_UNIX_OOB.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220317032308.65372-1-kuniyu@amazon.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Network drivers can call to netif_get_num_default_rss_queues to get the
default number of receive queues to use. Right now, this default number
is min(8, num_online_cpus()).
Instead, as suggested by Jakub, use the number of physical cores divided
by 2 as a way to avoid wasting CPU resources and to avoid using both CPU
threads, but still allowing to scale for high-end processors with many
cores.
As an exception, select 2 queues for processors with 2 cores, because
otherwise it won't take any advantage of RSS despite being SMP capable.
Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2
threads/core). NIC Broadcom NetXtreme II BCM57810 (10GBps). Ran some
tests with `perf stat iperf3 -R`, with parallelisms of 1, 8 and 24,
getting the following results:
- Number of queues: 6 (instead of 8)
- Network throughput: not affected
- CPU usage: utilized 0.05-0.12 CPUs more than before (having 24 CPUs
this is only 0.2-0.5% higher)
- Reduced the number of context switches by 7-50%, being more noticeable
when using a higher number of parallel threads.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220315091832.13873-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2022-03-18
We've added 2 non-merge commits during the last 18 day(s) which contain
a total of 2 files changed, 50 insertions(+), 20 deletions(-).
The main changes are:
1) Fix a race in XSK socket teardown code that can lead to a NULL pointer
dereference, from Magnus.
2) Small MAINTAINERS doc update to remove Lorenz from sockmap, from Lorenz.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Fix race at socket teardown
bpf: Remove Lorenz Bauer from L7 BPF maintainers
====================
Link: https://lore.kernel.org/r/20220318152418.28638-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
hci_le_conn_failed function's documentation says that the caller must
hold hdev->lock. The only callsite that does not hold that lock is
hci_le_conn_failed. The other 3 callsites hold the hdev->lock very
locally. The solution is to hold the lock during the call to
hci_le_conn_failed.
Fixes: 3c857757ef6e ("Bluetooth: Add directed advertising support through connect()")
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
When an Advertisement Monitor is configured with SamplingPeriod 0xFF,
the controller reports only one adv report along with the MSFT Monitor
Device event.
When an advertiser matches multiple monitors, some controllers send one
adv report for each matched monitor; whereas, some controllers send just
one adv report for all matched monitors.
In such a case, report Adv Monitor Device Found event for each matched
monitor.
Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Clear already tracked devices on system resume. Once the monitors are
reregistered after resume, matched devices in range will be found again.
Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Callers pass msg->msg_flags as flags, which contains MSG_DONTWAIT
instead of O_NONBLOCK.
Signed-off-by: Gavin Li <gavin@matician.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
data.pid is set twice with the same value. Remove one of these redundant
calls.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
This fixes the following trace caused by receiving
HCI_EV_DISCONN_PHY_LINK_COMPLETE which does call hci_conn_del without
first checking if conn->type is in fact AMP_LINK and in case it is
do properly cleanup upper layers with hci_disconn_cfm:
==================================================================
BUG: KASAN: use-after-free in hci_send_acl+0xaba/0xc50
Read of size 8 at addr ffff88800e404818 by task bluetoothd/142
CPU: 0 PID: 142 Comm: bluetoothd Not tainted
5.17.0-rc5-00006-gda4022eeac1a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x1f/0x150
kasan_report.cold+0x7f/0x11b
hci_send_acl+0xaba/0xc50
l2cap_do_send+0x23f/0x3d0
l2cap_chan_send+0xc06/0x2cc0
l2cap_sock_sendmsg+0x201/0x2b0
sock_sendmsg+0xdc/0x110
sock_write_iter+0x20f/0x370
do_iter_readv_writev+0x343/0x690
do_iter_write+0x132/0x640
vfs_writev+0x198/0x570
do_writev+0x202/0x280
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RSP: 002b:00007ffce8a099b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 14 00 00 00 0f 05
<48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
RDX: 0000000000000001 RSI: 00007ffce8a099e0 RDI: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ffce8a099e0 RCX: 00007f788fc3cf77
R10: 00007ffce8af7080 R11: 0000000000000246 R12: 000055e4ccf75580
RBP: 0000000000000015 R08: 0000000000000002 R09: 0000000000000001
</TASK>
R13: 000055e4ccf754a0 R14: 000055e4ccf75cd0 R15: 000055e4ccf4a6b0
Allocated by task 45:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
hci_chan_create+0x9a/0x2f0
l2cap_conn_add.part.0+0x1a/0xdc0
l2cap_connect_cfm+0x236/0x1000
le_conn_complete_evt+0x15a7/0x1db0
hci_le_conn_complete_evt+0x226/0x2c0
hci_le_meta_evt+0x247/0x450
hci_event_packet+0x61b/0xe90
hci_rx_work+0x4d5/0xc50
process_one_work+0x8fb/0x15a0
worker_thread+0x576/0x1240
kthread+0x29d/0x340
ret_from_fork+0x1f/0x30
Freed by task 45:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xfb/0x130
kfree+0xac/0x350
hci_conn_cleanup+0x101/0x6a0
hci_conn_del+0x27e/0x6c0
hci_disconn_phylink_complete_evt+0xe0/0x120
hci_event_packet+0x812/0xe90
hci_rx_work+0x4d5/0xc50
process_one_work+0x8fb/0x15a0
worker_thread+0x576/0x1240
kthread+0x29d/0x340
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88800c0f0500
The buggy address is located 24 bytes inside of
which belongs to the cache kmalloc-128 of size 128
The buggy address belongs to the page:
128-byte region [ffff88800c0f0500, ffff88800c0f0580)
flags: 0x100000000000200(slab|node=0|zone=1)
page:00000000fe45cd86 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0xc0f0
raw: 0000000000000000 0000000080100010 00000001ffffffff
0000000000000000
raw: 0100000000000200 ffffea00003a2c80 dead000000000004
ffff8880078418c0
page dumped because: kasan: bad access detected
ffff88800c0f0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
Memory state around the buggy address:
>ffff88800c0f0500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88800c0f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800c0f0580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
==================================================================
ffff88800c0f0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-by: Sönke Huster <soenke.huster@eknoes.de>
Tested-by: Sönke Huster <soenke.huster@eknoes.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Some controllers have problems with being sent a command to clear
all filtering. While the HCI code does not unconditionally
send a clear-all anymore at BR/EDR setup (after the state machine
refactor), there might be more ways of hitting these codepaths
in the future as the kernel develops.
Cc: stable@vger.kernel.org
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Variable cur_len is being ininitialized with a value in the start of
a for-loop but this is never read, it is being re-assigned a new value
on the first statement in the for-loop. The initialization is redundant
and can be removed.
Cleans up clang scan build warning:
net/bluetooth/mgmt.c:7958:14: warning: Although the value stored to 'cur_len'
is used in the enclosing expression, the value is never actually read
from 'cur_len' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|