|
This option at minimum adds extra code to the scheduler - even if
it's unused by default - and most users wouldn't want it.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Michael Chan says:
====================
bnxt_en: Add hardware PTP timestamping support on 575XX devices
Add PTP RX and TX hardware timestamp support on 575XX devices. These
devices use the two-step method to implement the IEEE-1588 timestamping
support.
v2: Add spinlock to serialize access to the timecounter.
Use .do_aux_work() for the periodic timer reading and to get the TX
timestamp from the firmware.
Propagate error code from ptp_clock_register().
Make the 64-bit timer access safe on 32-bit CPUs.
Read PHC using direct register access.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call bnxt_ptp_init() to initialize and register with the clock driver
to enable PTP support. Call bnxt_ptp_free() to unregister and clean
up during shutdown.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Setup the TXBD to enable TX timestamp if requested. At TX packet DMA
completion, if we requested TX timestamp on that packet, we defer to
.do_aux_work() to obtain the TX timestamp from the firmware before we
free the TX SKB.
v2: Use .do_aux_work() to get the TX timestamp from firmware.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the RX packet is timestamped by the hardware, the RX completion
record will contain the lower 32 bits of the timestamp. This needs
to be combined with the upper 16 bits of the periodic timestamp that
we get from the timer. The previous snapshot in ptp->old_time is
used to make sure that the snapshot is not ahead of the RX timestamp
and we adjust for wrap-around if needed.
v2: Make ptp->old_time read access safe on 32-bit CPUs.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
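The combination step described in this commit can be sketched in plain C.
This is only an illustration under stated assumptions, not the driver's
code: the mask names and rx_full_timestamp() are hypothetical, and the
clock is assumed to be 48 bits wide with the low 32 bits reported per
packet.

```c
#include <stdint.h>

/* Hypothetical masks: a 48-bit clock split into 16 high and 32 low bits. */
#define TS_LO_MASK 0x0000ffffffffULL
#define TS_HI_MASK 0xffff00000000ULL

/*
 * old_time: last periodic 48-bit snapshot of the hardware clock
 * pkt_ts:   lower 32 bits of the timestamp from the RX completion record
 */
static inline uint64_t rx_full_timestamp(uint64_t old_time, uint32_t pkt_ts)
{
	/* Take the upper 16 bits from the snapshot, the rest from the packet. */
	uint64_t ts = (old_time & TS_HI_MASK) | pkt_ts;

	/*
	 * The snapshot is assumed to be older than the packet, so a smaller
	 * low word means the low 32 bits wrapped after the snapshot.
	 */
	if (pkt_ts < (old_time & TS_LO_MASK))
		ts += TS_LO_MASK + 1;

	return ts;
}
```

The sketch relies on the snapshot never being ahead of the packet
timestamp, which is why the older snapshot is the one to use.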
|
|
From the bnxt_timer(), read the 48-bit hardware running clock
periodically and store it in ptp->current_time. The previous snapshot
of the clock will be stored in ptp->old_time. The old_time snapshot
will be used in the next patches to compute the RX packet timestamps.
v2: Use .do_aux_work() to read the timer periodically.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the clock APIs to set/get/adjust the hw clock, and the related
ioctls and ethtool methods.
v2: Propagate error code from ptp_clock_register().
Add spinlock to serialize access to the timecounter. The
timecounter is accessed in process context and the RX datapath.
Read the PHC using direct registers.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Store PTP hardware info in a structure if hardware and firmware support PTP.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding the PTP related firmware interface is the main change.
There is also a name change for admin_mtu, requiring code fixup.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guangbin Huang says:
====================
net: hns3: add new debugfs commands
This series adds three new debugfs commands for the HNS3 ethernet driver.
change log:
V1 -> V2:
1. remove patch "net: hns3: add support for link diagnosis info in debugfs"
and use ethtool extended link state to implement similar function
according to Jakub Kicinski's opinion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support of dumping MAC umv counter in debugfs,
which will be helpful for debugging.
The display style is below:
$ cat umv_info
num_alloc_vport : 2
max_umv_size : 256
wanted_umv_size : 256
priv_umv_size : 85
share_umv_size : 86
vport(0) used_umv_num : 1
vport(1) used_umv_num : 1
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, the flow director counter was not enabled. To make it
easier to check whether the flow director was hit, enable the
flow director counter for each function, and add a debugfs query interface
to query the counters for each function.
The debugfs command is below:
cat fd_counter
func_id hit_times
pf 0
vf0 0
vf1 0
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Menglong Dong says:
====================
net: tipc: fix FB_MTU eat two pages and do some code cleanup
In the first patch, FB_MTU is redefined to make sure data size will not
exceed PAGE_SIZE. Besides, I removed the alignment for buf_size in
tipc_buf_acquire, because skb_alloc_fclone will do the alignment job.
In the second patch, I removed align() in msg.c and replaced it with
ALIGN().
Changes since V5:
- remove blank line after Fixes in commit log in the first patch
Changes since V4:
- remove ONE_PAGE_SKB_SZ and replace it with one_page_mtu in the first
patch.
- fix some code style problems for the second patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function align(), which is defined in msg.c, is redundant. Replace it
with ALIGN() and introduce a BUF_ALIGN().
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
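For illustration, a generic power-of-two round-up macro can stand in for
an open-coded 4-byte helper. The names below (ALIGN_UP, BUF_ALIGN_SKETCH,
align4) are hypothetical, and the 4-byte granularity is an assumption made
for this sketch rather than something stated in the commit message.

```c
/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a)  (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical wrapper with a fixed 4-byte alignment (assumed value). */
#define BUF_ALIGN_SKETCH(x)  ALIGN_UP((x), 4u)

/* Open-coded helper of the kind that such a macro makes redundant. */
static inline unsigned int align4(unsigned int i)
{
	return (i + 3) & ~3u;
}
```

Both forms produce the same result for unsigned inputs, so the generic
macro can replace the local helper.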
|
|
FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory
allocation fails, which can avoid unnecessary sending failures.
The value of FB_MTU now is 3744, and the data size will be:
(3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3))
which is larger than one page (4096), so two pages will be allocated.
To avoid it, replace '3744' with a calculation:
(PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size,
and it's not necessary to make alignment for buf_size in
tipc_buf_acquire(). So, just remove it.
Fixes: 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
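The arithmetic can be checked with a small stand-alone sketch. All
constants here are assumptions chosen for illustration: a 64-byte cache
line for the data alignment, a 4096-byte page, 320 bytes for
sizeof(struct skb_shared_info), and a caller-supplied overhead standing in
for the headroom plus tailroom. The helper names are hypothetical.

```c
#define CACHE_LINE  64u    /* assumed alignment used for skb data */
#define PAGE_SZ     4096u  /* assumed PAGE_SIZE */
#define SHINFO_SZ   320u   /* assumed sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the assumed cache line size. */
#define DATA_ALIGN(x)  (((x) + (CACHE_LINE - 1)) & ~(CACHE_LINE - 1))

/* Allocation size when the fallback MTU is the fixed value 3744. */
static inline unsigned int old_alloc_size(unsigned int overhead)
{
	return 3744u + DATA_ALIGN(SHINFO_SZ) + DATA_ALIGN(overhead + 3u);
}

/* Fallback MTU derived from the page size instead of a constant. */
static inline unsigned int one_page_mtu(unsigned int overhead)
{
	return PAGE_SZ - DATA_ALIGN(overhead) - DATA_ALIGN(SHINFO_SZ);
}

/* Allocation size when the derived MTU is used. */
static inline unsigned int new_alloc_size(unsigned int overhead)
{
	return one_page_mtu(overhead) + DATA_ALIGN(overhead) +
	       DATA_ALIGN(SHINFO_SZ);
}
```

With an assumed overhead of 160 bytes, the fixed MTU needs 4256 bytes and
spills into a second page, while the derived MTU fills exactly one page.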
|
|
Commit 95b88f4d71cb953e02206be3c757083601391a0f ("dm writecache: pause
writeback if cache full and origin being written directly") introduced
code that pauses cache flushing if we are issuing writes directly to the
origin.
Improve that initial commit by making the timeout code configurable
(via the option "pause_writeback"). Also change the default from 1s to
3s because it performed better.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Ingo Molnar:
- Add the "ratelimit:N" parameter to the split_lock_detect= boot
option, to rate-limit the generation of bus-lock exceptions.
This is both easier on system resources and kinder to offending
applications than the current policy of outright killing them.
- Document the split-lock detection feature and its parameters.
* tag 'x86-splitlock-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Add ratelimit in buslock.rst
Documentation/admin-guide: Add bus lock ratelimit
x86/bus_lock: Set rate limit for bus lock
Documentation/x86: Add buslock.rst
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm update from Ingo Molnar:
"Do not create the x86/init_pkru debugfs file if the CPU doesn't
support PKRU"
* tag 'x86-mm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pkeys: Skip 'init_pkru' debugfs file creation when pkeys not supported
|
|
/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2021-06-28
1) Remove an unneeded error assignment in esp4_gro_receive().
From Yang Li.
2) Add a new byseq state hashtable to find acquire states faster.
From Sabrina Dubroca.
3) Remove some unnecessary variables in pfkey_create().
From zuoqilin.
4) Remove the unused description from xfrm_type struct.
From Florian Westphal.
5) Fix a spelling mistake in the comment of xfrm_state_ok().
From gushengxian.
6) Replace hdr_off indirections by a small helper function.
From Florian Westphal.
7) Remove xfrm4_output_finish and xfrm6_output_finish declarations,
they are not used anymore.
From Antony Antony.
8) Remove xfrm replay indirections.
From Florian Westphal.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 uapi fixlet from Ingo Molnar:
"Fix the <uapi/asm/hwcap2.h> UAPI header to build in user-space too"
* tag 'x86-misc-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/elf: Use _BITUL() macro in UAPI headers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups & removal of obsolete code"
* tag 'x86-cleanups-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Correct kernel-doc's arg name in sgx_encl_release()
doc: Remove references to IBM Calgary
x86/setup: Document that Windows reserves the first MiB
x86/crash: Remove crash_reserve_low_1M()
x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
x86/alternative: Align insn bytes vertically
x86: Fix leftover comment typos
x86/asm: Simplify __smp_mb() definition
x86/alternatives: Make the x86nops[] symbol static
|
|
The sample mtty mdev driver doesn't actually enforce the number of
device instances it claims are available. Implement this properly.
Link: https://lore.kernel.org/r/162465624894.3338367.12935940647049917981.stgit@omen
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control documentation fixes from Ingo Molnar:
"Fix Docbook comments in the x86/resctrl code"
* tag 'x86-cache-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix kernel-doc in internal.h
x86/resctrl: Fix kernel-doc in pseudo_lock.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Lots of changes:
* aggregation handling improvements for some drivers
* hidden AP discovery on 6 GHz and other HE 6 GHz
improvements
* minstrel improvements for no-ack frames
* deferred rate control for TXQs to improve reaction
times
* virtual time-based airtime scheduler
* along with various little cleanups/fixups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot update from Ingo Molnar:
"Modernize the genimage.sh script, add a 'hdimage' target and EFI
support"
* tag 'x86-boot-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Modernize genimage script; hdimage+EFI support
|
|
Dan Carpenter reported an issue introduced in
commit fde56eea01f9 ("mptcp: refine mptcp_cleanup_rbuf") where a new
boolean (ack_pending) is masked with 0x9.
Using a boolean here was not meant to discard some of the bits. This
variable should not have a 'bool' type: we should keep the 'u8' to allow
this comparison.
Fixes: fde56eea01f9 ("mptcp: refine mptcp_cleanup_rbuf")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
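Why the type matters can be shown with a short stand-alone sketch. The
function names are hypothetical and the code is not taken from mptcp; it
only contrasts masking a bool with masking an 8-bit value, using the 0x9
mask mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK 0x9u  /* bits 0 and 3 */

/* Storing the flags in a bool collapses any non-zero value to 1. */
static inline unsigned int masked_as_bool(unsigned int flags)
{
	bool v = flags;

	return v & MASK;
}

/* Storing the flags in an 8-bit integer keeps the individual bits. */
static inline unsigned int masked_as_u8(unsigned int flags)
{
	uint8_t v = (uint8_t)flags;

	return v & MASK;
}
```

With the bool, a value that has only bit 1 set still passes the mask test,
and bit 3 can never be observed, so the comparison no longer tests the
intended bits.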
|
|
Syzbot reported a warning in tcindex_alloc_perfect_hash(). The problem
was a cp->hash value that is too big, which triggers a warning in
kmalloc(). Since cp->hash comes from userspace, there is no need to warn
if the value is not correct.
Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Reported-and-tested-by: syzbot+1071ad60cd7df39fdadb@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
- Micro-optimize and standardize the do_syscall_64() calling convention
- Make syscall entry flags clearing more conservative
- Clean up syscall table handling
- Clean up & standardize assembly macros, in preparation for FRED
- Misc cleanups and fixes
* tag 'x86-asm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Make <asm/asm.h> valid on cross-builds as well
x86/regs: Syscall_get_nr() returns -1 for a non-system call
x86/entry: Split PUSH_AND_CLEAR_REGS into two submacros
x86/syscall: Maximize MSR_SYSCALL_MASK
x86/syscall: Unconditionally prototype {ia32,x32}_sys_call_table[]
x86/entry: Reverse arguments to do_syscall_64()
x86/entry: Unify definitions from <asm/calling.h> and <asm/ptrace-abi.h>
x86/asm: Use _ASM_BYTES() in <asm/nops.h>
x86/asm: Add _ASM_BYTES() macro for a .byte ... opcode sequence
x86/asm: Have the __ASM_FORM macros handle commas in arguments
|
|
Instead of depending on "sysctl" being installed, just use "grep -H" for
sysctl status reporting. Additionally report kernel version for easier
comparisons.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
When running the seccomp benchmark under a test runner, it wouldn't
provide any feedback on progress. Set stdout unbuffered.
Suggested-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Since the open fds might not always start at "4" (especially when
running under kselftest, etc), start counting from the first assigned
fd, rather than using the more permissive EXPECT_GE(fd, 0).
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20210527032948.3730953-1-keescook@chromium.org
Reviewed-by: Rodrigo Campos <rodrigo@kinvolk.io>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
This just adds a test to verify that when using the newly introduced flag
to ADDFD, a valid fd is added and returned as the syscall result.
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210517193908.3113-5-sargun@sargun.me
|
|
Alban Crequy reported a race condition userspace faces when we want to
add some fds and make the syscall return them[1] using seccomp notify.
The problem is that currently two different ioctl() calls are needed by
the process handling the syscalls (agent) for another userspace process
(target): SECCOMP_IOCTL_NOTIF_ADDFD to allocate the fd and
SECCOMP_IOCTL_NOTIF_SEND to return that value. Therefore, it is possible
for the agent to do the first ioctl to add a file descriptor but the
target is interrupted (EINTR) before the agent does the second ioctl()
call.
This patch adds a flag to the ADDFD ioctl() so it adds the fd and
returns that value atomically to the target program, as suggested by
Kees Cook[2]. This is done by simply allowing
seccomp_do_user_notification() to add the fd and return it in this case.
Therefore, in this case the target wakes up from the wait in
seccomp_do_user_notification() either to interrupt the syscall or to add
the fd and return it.
This "allocate an fd and return" functionality is useful for syscalls
that return a file descriptor only, like connect(2). Other syscalls that
return a file descriptor but not as return value (or return more than
one fd), like socketpair(), pipe(), recvmsg with SCM_RIGHTS, will not
work with this flag.
This effectively combines SECCOMP_IOCTL_NOTIF_ADDFD and
SECCOMP_IOCTL_NOTIF_SEND into an atomic operation. Neither the notification's
return value nor its error can be set by the user. Upon successful invocation
of the SECCOMP_IOCTL_NOTIF_ADDFD ioctl with the SECCOMP_ADDFD_FLAG_SEND
flag, the notifying process's errno will be 0, and the return value will
be the file descriptor number that was installed.
[1]: https://lore.kernel.org/lkml/CADZs7q4sw71iNHmV8EOOXhUKJMORPzF7thraxZYddTZsxta-KQ@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/202012011322.26DCBC64F2@keescook/
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210517193908.3113-4-sargun@sargun.me
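From the agent's side, the combined operation might look like the sketch
below. The helper name addfd_and_send() is hypothetical, and the fallback
definition of SECCOMP_ADDFD_FLAG_SEND is an assumption for UAPI headers
that predate this flag; it presumes the flag occupies bit 1.

```c
#include <linux/seccomp.h>
#include <string.h>
#include <sys/ioctl.h>

#ifndef SECCOMP_ADDFD_FLAG_SEND
#define SECCOMP_ADDFD_FLAG_SEND (1UL << 1)  /* assumed value for old headers */
#endif

/*
 * notify_fd: seccomp notification fd held by the agent
 * req_id:    id of the pending notification being answered
 * srcfd:     fd in the agent that should be installed in the target
 *
 * Returns the fd number installed in the target, or -1 with errno set.
 */
static inline int addfd_and_send(int notify_fd, __u64 req_id, int srcfd)
{
	struct seccomp_notif_addfd addfd;

	memset(&addfd, 0, sizeof(addfd));
	addfd.id = req_id;
	addfd.srcfd = (__u32)srcfd;
	/* Install the fd and complete the notification in one step. */
	addfd.flags = SECCOMP_ADDFD_FLAG_SEND;

	return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

Because the single ioctl() both installs the fd and answers the
notification, no separate SECCOMP_IOCTL_NOTIF_SEND call is issued for that
request.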
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 exception handling updates from Ingo Molnar:
- Clean up & simplify AP exception handling setup.
- Consolidate the disjoint IDT setup code living in idt_setup_traps()
and idt_setup_ist_traps() into a single idt_setup_traps()
initialization function and call it before cpu_init().
* tag 'x86-apic-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/idt: Rework IDT setup for boot CPU
x86/cpu: Init AP exception handling from cpu_init_secondary()
|
|
Guillaume Nault says:
====================
net: reset MAC header consistently across L3 virtual devices
Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode),
reset the MAC header pointer after they parsed the outer headers. This
accurately reflects the fact that the decapsulated packet is pure L3
packet, as that makes the MAC header 0 bytes long (the MAC and network
header pointers are equal).
However, many L3 devices only adjust the network header after
decapsulation and leave the MAC header pointer to its original value.
This can confuse other parts of the networking stack, like TC, which
then considers the outer headers as one big MAC header.
This patch series makes the following L3 tunnels behave like VXLAN-GPE:
bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp.
The case of gre is a bit special. It already resets the MAC header
pointer in collect_md mode, so only the classical mode needs to be
adjusted. However, gre also has a special case that expects the MAC
header pointer to keep pointing to the outer header even after
decapsulation. Therefore, patch 4 keeps an exception for this case.
Ideally, we'd centralise the call to skb_reset_mac_header() in
ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2),
sit (patch 3) and gre (patch 4). That's unfortunately not feasible
currently, because of the gre special case discussed above that
precludes us from resetting the MAC header unconditionally.
The original motivation is to redirect bareudp packets to Ethernet
devices (as described in patch 1). The rest of this series aims at
bringing consistency across all L3 devices (apart from gre's special
case unfortunately).
Note: the gtp patch results from pure code inspection and has been
compile tested only.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
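The effect of resetting the MAC header can be modelled with a toy
structure. This is a hypothetical sketch: struct pkt and its helpers only
mimic the header offsets an skb carries and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the header offsets kept in an skb. */
struct pkt {
	uint16_t mac_header;      /* offset of the MAC header */
	uint16_t network_header;  /* offset of the L3 header */
	uint16_t data;            /* offset of the current data pointer */
};

/* Length of what the rest of the stack will treat as the MAC header. */
static inline unsigned int pkt_mac_len(const struct pkt *p)
{
	return (unsigned int)(p->network_header - p->mac_header);
}

/* Decapsulation that only moves the network header. */
static inline void decap_keep_mac(struct pkt *p, uint16_t outer_len)
{
	p->data = (uint16_t)(p->data + outer_len);
	p->network_header = p->data;
}

/* Decapsulation that also resets the MAC header to the new data offset. */
static inline void decap_reset_mac(struct pkt *p, uint16_t outer_len)
{
	decap_keep_mac(p, outer_len);
	p->mac_header = p->data;
}
```

In this model, a packet with a 14-byte outer Ethernet header and 28 bytes
of outer IP and UDP headers ends up with a 42-byte "MAC header" unless the
MAC header offset is reset, in which case it is 0 bytes long.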
|
|
For consistency with other L3 tunnel devices, reset the mac_header
pointer after decapsulation. This makes the mac_header 0 bytes long,
thus making it clear that this skb has no mac_header.
Compile tested only.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reset the mac_header pointer even when the tunnel transports only L3
data (in the ARPHRD_ETHER case, this is already done by eth_type_trans).
This prevents other parts of the stack from mistakenly accessing the
outer header after the packet has been decapsulated.
In practice, this allows pushing an Ethernet header to ipip6, ip6ip6,
mplsip6 or ip6gre packets and redirecting them to an Ethernet device:
$ tc filter add dev ip6tnl0 ingress matchall \
action vlan push_eth dst_mac 00:00:5e:00:53:01 \
src_mac 00:00:5e:00:53:00 \
action mirred egress redirect dev eth0
Without this patch, push_eth refuses to add an ethernet header because
the skb appears to already have a MAC header.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e271c7b4420d ("gre: do not keep the GRE header around in collect
medata mode") did reset the mac_header for the collect_md case. Let's
extend this behaviour to classical gre devices as well.
ipgre_header_parse() seems to be the only case that requires mac_header
to point to the outer header. We can detect this case accurately by
checking ->header_ops. For all other cases, we can reset mac_header.
This allows pushing an Ethernet header to ipgre packets and redirecting
them to an Ethernet device:
$ tc filter add dev gre0 ingress matchall \
action vlan push_eth dst_mac 00:00:5e:00:53:01 \
src_mac 00:00:5e:00:53:00 \
action mirred egress redirect dev eth0
Before this patch, this worked only for collect_md gre devices.
Now this works for regular gre devices as well. Only the special case
of gre devices that use ipgre_header_ops isn't supported.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even though sit transports L3 data (IPv6, IPv4 or MPLS) packets, it
needs to reset the mac_header pointer, so that other parts of the stack
don't mistakenly access the outer header after the packet has been
decapsulated. There are two rx handlers to modify: ipip6_rcv() for the
ip6ip mode and sit_tunnel_rcv() which is used to re-implement the ipip
and mplsip modes of ipip.ko.
This allows pushing an Ethernet header to sit packets and redirecting
them to an Ethernet device:
$ tc filter add dev sit0 ingress matchall \
action vlan push_eth dst_mac 00:00:5e:00:53:01 \
src_mac 00:00:5e:00:53:00 \
action mirred egress redirect dev eth0
Without this patch, push_eth refuses to add an ethernet header because
the skb appears to already have a MAC header.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even though ipip transports IPv4 or MPLS packets, it needs to reset the
mac_header pointer, so that other parts of the stack don't mistakenly
access the outer header after the packet has been decapsulated.
This allows pushing an Ethernet header to ipip or mplsip packets and
redirecting them to an Ethernet device:
$ tc filter add dev ipip0 ingress matchall \
action vlan push_eth dst_mac 00:00:5e:00:53:01 \
src_mac 00:00:5e:00:53:00 \
action mirred egress redirect dev eth0
Without this patch, push_eth refuses to add an ethernet header because
the skb appears to already have a MAC header.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even though bareudp transports L3 data (typically IP or MPLS), it needs
to reset the mac_header pointer, so that other parts of the stack don't
mistakenly access the outer header after the packet has been
decapsulated.
This allows pushing an Ethernet header to bareudp packets and redirecting
them to an Ethernet device:
$ tc filter add dev bareudp0 ingress matchall \
action vlan push_eth dst_mac 00:00:5e:00:53:01 \
src_mac 00:00:5e:00:53:00 \
action mirred egress redirect dev eth0
Without this patch, push_eth refuses to add an ethernet header because
the skb appears to already have a MAC header.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
fragmentation logic away from lwtunnel by carrying encap headroom and
using it in the output MTU calculation. But the forwarding part was not
covered, which created a difference in MTU between output and forwarding
and led to silent drops on the IPv4 forwarding path. Fix it by taking
into account lwtunnel encap headroom.
The same commit also introduced a difference in how RTAX_MTU is treated
in IPv4 and IPv6, where the latter explicitly removes lwtunnel encap
headroom from the route MTU. Make the IPv4 version do the same.
Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
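The idea of accounting for encap headroom on both paths can be sketched as
follows. The helper names are hypothetical and the rule that an oversized
headroom is ignored is an assumption of this sketch, not a statement about
the kernel's implementation.

```c
/*
 * Headroom to reserve for the encapsulation. A headroom that would
 * consume the whole MTU is ignored (assumption made for this sketch).
 */
static inline unsigned int encap_headroom(unsigned int headroom,
					  unsigned int mtu)
{
	return headroom < mtu ? headroom : 0;
}

/* MTU left for the inner packet once encap headroom is accounted for. */
static inline unsigned int effective_mtu(unsigned int link_mtu,
					 unsigned int headroom)
{
	return link_mtu - encap_headroom(headroom, link_mtu);
}
```

If the output path and the forwarding path both derive their MTU this way,
they agree on the largest packet that still fits after encapsulation.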
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz updates from Ingo Molnar:
- Micro-optimize tick_nohz_full_cpu()
- Optimize idle exit tick restarts to be less eager
- Optimize tick_nohz_dep_set_task() to only wake up a single CPU.
This reduces IPIs and interruptions on nohz_full CPUs.
- Optimize tick_nohz_dep_set_signal() in a similar fashion.
- Skip IPIs in tick_nohz_kick_task() when trying to kick a
non-running task.
- Micro-optimize tick_nohz_task_switch() IRQ flags handling to
reduce context switching costs.
- Misc cleanups and fixes
* tag 'timers-nohz-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add myself as context tracking maintainer
tick/nohz: Call tick_nohz_task_switch() with interrupts disabled
tick/nohz: Kick only _queued_ task whose tick dependency is updated
tick/nohz: Change signal tick dependency to wake up CPUs of member tasks
tick/nohz: Only wake up a single target cpu when kicking a task
tick/nohz: Update nohz_full Kconfig help
tick/nohz: Update idle_exittime on actual idle exit
tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
tick/nohz: Conditionally restart tick on idle exit
tick/nohz: Evaluate the CPU expression after the static key
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogeneous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework asymmetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Snow Ridge (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogeneous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task context PMU for Hetero
perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
perf/x86/intel: Fix fixed counter check warning for some Alder Lake
perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
kprobes: Do not increment probe miss count in the fault handler
x86,kprobes: WARN if kprobes tries to handle a fault
kprobes: Remove kprobe::fault_handler
uprobes: Update uprobe_write_opcode() kernel-doc comment
perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
perf/core: Fix DocBook warnings
perf/core: Make local function perf_pmu_snapshot_aux() static
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Core locking & atomics:
- Convert all architectures to ARCH_ATOMIC: move every architecture
to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the
transitory facilities and #ifdefs.
Much reduction in complexity from that series:
63 files changed, 756 insertions(+), 4094 deletions(-)
- Self-test enhancements
- Futexes:
- Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't
set FLAGS_CLOCKRT (i.e. uses CLOCK_MONOTONIC).
[ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of
FLAGS_CLOCKRT & invert the flag's meaning to avoid having to
introduce a new variant was resisted successfully. ]
- Enhance futex self-tests
- Lockdep:
- Fix dependency path printouts
- Optimize trace saving
- Broaden & fix wait-context checks
- Misc cleanups and fixes.
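As a rough illustration of the FUTEX_LOCK_PI2 item above, the sketch below shows how userspace might request a PI lock with a deadline measured against CLOCK_MONOTONIC. The helper names pi_lock_monotonic() and pi_unlock() are hypothetical and not part of this series. The fallback value 13 for FUTEX_LOCK_PI2 is assumed to match the uapi header of kernels that carry the new op, and a 64-bit Linux target is assumed so that SYS_futex takes a plain struct timespec. Real lock implementations normally try a userspace compare-and-swap of 0 to the caller's TID first and only enter the kernel on contention; that fast path is omitted here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef FUTEX_LOCK_PI2
#define FUTEX_LOCK_PI2 13	/* assumed uapi value; older headers lack it */
#endif

/*
 * Hypothetical helper: take a PI futex with an absolute deadline that is
 * timeout_ms from now on CLOCK_MONOTONIC. Returns 0 on success or a
 * negative errno (-ENOSYS on kernels without FUTEX_LOCK_PI2).
 */
static int pi_lock_monotonic(uint32_t *word, long timeout_ms)
{
	struct timespec deadline;

	clock_gettime(CLOCK_MONOTONIC, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	/* No FUTEX_CLOCK_REALTIME flag: the deadline is CLOCK_MONOTONIC. */
	if (syscall(SYS_futex, word, FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG,
		    0, &deadline, NULL, 0) == 0)
		return 0;
	return -errno;
}

/* Hypothetical helper: release a PI futex owned by the calling thread. */
static int pi_unlock(uint32_t *word)
{
	if (syscall(SYS_futex, word, FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG,
		    0, NULL, NULL, 0) == 0)
		return 0;
	return -errno;
}
```

A caller that needs a CLOCK_REALTIME deadline can keep using FUTEX_LOCK_PI, which is why the new op could be added without changing the existing one's behaviour.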
* tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
locking/lockdep: Correct the description error for check_redundant()
futex: Provide FUTEX_LOCK_PI2 to support clock selection
futex: Prepare futex_lock_pi() for runtime clock selection
lockdep/selftest: Remove wait-type RCU_CALLBACK tests
lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
lockdep: Fix wait-type for empty stack
locking/selftests: Add a selftest for check_irq_usage()
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
locking/lockdep: Remove the unnecessary trace saving
locking/lockdep: Fix the dep path printing for backwards BFS
selftests: futex: Add futex compare requeue test
selftests: futex: Add futex wait test
seqlock: Remove trailing semicolon in macros
locking/lockdep: Reduce LOCKDEP dependency list
locking/lockdep,doc: Improve readability of the block matrix
locking/atomics: atomic-instrumented: simplify ifdeffery
locking/atomic: delete !ARCH_ATOMIC remnants
locking/atomic: xtensa: move to ARCH_ATOMIC
locking/atomic: sparc: move to ARCH_ATOMIC
locking/atomic: sh: move to ARCH_ATOMIC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix and updates from Ingo Molnar:
"An ELF format fix for a section flags mismatch bug that breaks kernel
tooling such as kpatch-build.
The biggest change in this cycle is the new code to handle and rewrite
variable sized jump labels - which results in slightly tighter code
generation in hot paths, through the use of short(er) NOPs.
Also a number of cleanups and fixes, and a change to the generic
include/linux/compiler.h to handle a s390 GCC quirk"
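The size choice behind variable sized jump labels can be sketched in plain C. This is only an illustration under stated assumptions, not the objtool or kernel implementation: jump_size() and encode_site() are hypothetical names, the byte patterns are the standard x86 encodings (EB rel8, E9 rel32, and the 2-byte and 5-byte NOPs), and a little-endian host is assumed when the 32-bit displacement is stored.

```c
#include <stdint.h>
#include <string.h>

enum { JMP8_SIZE = 2, JMP32_SIZE = 5 };

/*
 * Hypothetical helper: size of the shortest x86 JMP that reaches target
 * from site. The displacement is relative to the end of the instruction.
 */
static int jump_size(int64_t site, int64_t target)
{
	int64_t disp = target - (site + JMP8_SIZE);

	return (disp >= INT8_MIN && disp <= INT8_MAX) ? JMP8_SIZE : JMP32_SIZE;
}

/*
 * Hypothetical helper: write either the JMP (enabled) or a NOP of the
 * same size (disabled) into buf and return the number of bytes written.
 */
static int encode_site(uint8_t *buf, int64_t site, int64_t target, int enabled)
{
	static const uint8_t nop2[] = { 0x66, 0x90 };
	static const uint8_t nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
	int size = jump_size(site, target);

	if (!enabled) {
		memcpy(buf, size == JMP8_SIZE ? nop2 : nop5, (size_t)size);
		return size;
	}

	if (size == JMP8_SIZE) {
		buf[0] = 0xeb;	/* JMP rel8 */
		buf[1] = (uint8_t)(int8_t)(target - (site + JMP8_SIZE));
	} else {
		int32_t disp = (int32_t)(target - (site + JMP32_SIZE));

		buf[0] = 0xe9;	/* JMP rel32 */
		memcpy(buf + 1, &disp, sizeof(disp));	/* little-endian host assumed */
	}
	return size;
}
```

Because the NOP and the JMP at a given site have the same length, the site can be toggled in place; the saving comes from nearby targets needing 2 bytes instead of 5.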
* tag 'objtool-urgent-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Don't make .altinstructions writable
* tag 'objtool-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Improve reloc hash size guestimate
instrumentation.h: Avoid using inline asm operand modifiers
compiler.h: Avoid using inline asm operand modifiers
kbuild: Fix objtool dependency for 'OBJECT_FILES_NON_STANDARD_<obj> := n'
objtool: Reflow handle_jump_alt()
jump_label/x86: Remove unused JUMP_LABEL_NOP_SIZE
jump_label, x86: Allow short NOPs
objtool: Provide stats for jump_labels
objtool: Rewrite jump_label instructions
objtool: Decode jump_entry::key addend
jump_label, x86: Emit short JMP
jump_label: Free jump_entry::key bit1 for build use
jump_label, x86: Add variable length patching support
jump_label, x86: Introduce jump_entry_size()
jump_label, x86: Improve error when we fail expected text
jump_label, x86: Factor out the __jump_table generation
jump_label, x86: Strip ASM jump_label support
x86, objtool: Dont exclude arch/x86/realmode/
objtool: Rewrite hashtable sizing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"Two driver API cleanups, and a log message tweak"
* tag 'efi-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Log 32/64-bit mismatch with kernel as an error
efi/dev-path-parser: Switch to use for_each_acpi_dev_match()
efi/apple-properties: Handle device properties with software node API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Differentiate the type of exception the #VC handler raises depending
on code executed in the guest and handle the case where failure to
get the RIP would result in a #GP, as it should, instead of in a #PF
- Disable interrupts while the per-CPU GHCB is held
- Split the #VC handler depending on where the #VC exception has
happened, which provides precise context tracking in the same way the
rest of the exception handlers now deal with noinstr regions
- Add defines for the GHCB version 2 protocol so that further shared
development with KVM can happen without merge conflicts
- The usual small cleanups
* tag 'x86_sev_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Use "SEV: " prefix for messages from sev.c
x86/sev: Add defines for GHCB version 2 MSR protocol requests
x86/sev: Split up runtime #VC handler for correct state tracking
x86/sev: Make sure IRQs are disabled while GHCB is active
x86/sev: Propagate #GP if getting linear instruction address failed
x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()
x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
x86/sev: Fix error message in runtime #VC handler
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- New AMD models support
- Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too
- Use the special RAPL CPUID bit to detect the functionality on AMD and
Hygon instead of doing family matching.
- Add support for new Intel microcode deprecating TSX on some models
and do not enable kernel workarounds for those CPUs when TSX
transactions always abort, as a result of that microcode update.
* tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Clear CPUID bits when TSX always force aborts
x86/events/intel: Do not deploy TSX force abort workaround when TSX is deprecated
x86/msr: Define new bits in TSX_FORCE_ABORT MSR
perf/x86/rapl: Use CPUID bit on AMD and Hygon parts
x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systems
x86/amd_nb: Add AMD family 19h model 50h PCI ids
x86/cpu: Fix core name for Sapphire Rapids
|