Age | Commit message (Collapse) | Author |
|
KMSAN will complain if valid address length passed to udpv6_pre_connect()
is shorter than sizeof("struct sockaddr"->sa_family) bytes.
(This patch is bogus if it is guaranteed that udpv6_pre_connect() is
always called after checking "struct sockaddr"->sa_family. In that case,
we want a comment why we don't need to check valid address length here.)
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to bpf_bind() is
shorter than sizeof("struct sockaddr"->sa_family) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_llc) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_sco) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_rxrpc) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_nl) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to connect() is shorter
than sizeof("struct sockaddr"->sa_family) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN will complain if valid address length passed to bind() is shorter
than sizeof("struct sockaddr_mISDN"->family) bytes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot is reporting uninitialized value at rds_connect() [1] and
rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
these functions expect that it is safe to access sockaddr->family field
in order to determine minimal address length for validation.
[1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
[2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38f
Reported-by: syzbot <syzbot+0049bebbf3042dbd2e8f@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+915c9f99f3dbc4bd6cd1@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
coredump analysis
During coredump analysis, it is not easy to obtain the address of
backend_info in xen-netback.
So far there are two ways to obtain backend_info:
1. Do what xenbus_device_find() does for vmcore to find the xenbus_device
and then derive it from dev_get_drvdata().
2. Extract backend_info from callstack of xenwatch (e.g., netback_remove()
or frontend_changed()).
This patch adds a reference from xenvif to backend_info so that it would be
much more easier to obtain backend_info during coredump analysis.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CLK_SET_RATE_PARENT would be dropped.
Merge two flag setting together to correct the error.
Fixes: 5a1cc4c27ad2 ("clk: mediatek: Add flags to mtk_gate")
Cc: <stable@vger.kernel.org>
Signed-off-by: Weiyi Lu <weiyi.lu@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Pull dma-mapping fixes from Christoph Hellwig:
"Fix a sparc64 sun4v_pci regression introduced in this merged window,
and a dma-debug stracktrace regression from the big refactor last
merge window"
* tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: only skip one stackframe entry
sparc64/pci_sun4v: fix ATU checks for large DMA masks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Fix an AMD IOMMU issue where the driver didn't correctly setup the
exclusion range in the hardware registers, resulting in exclusion
ranges being one page too big.
This can cause data corruption of the address of that last page is
used by DMA operations"
* tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Set exclusion range correctly
|
|
Pull clang-format update from Miguel Ojeda:
"The usual roughly-per-release .clang-format macro list update"
* tag 'clang-format-for-linus-v5.1-rc5' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- alcor: Stabilize data write requests
- sdhci-omap: Fix command error path during tuning
* tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
mmc: alcor: don't write data before command has completed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Well, this one became unpleasantly larger than previous pull requests,
but it's a kind of usual pattern: now it contains a collection of ASoC
fixes, and nothing to worry too much.
The fixes for ASoC core (DAPM, DPCM, topology) are all small and just
covering corner cases. The rest changes are driver-specific, many of
which are for x86 platforms and new drivers like STM32, in addition to
the usual fixups for HD-audio"
* tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (66 commits)
ASoC: wcd9335: Fix missing regmap requirement
ALSA: hda: Fix racy display power access
ASoC: pcm: fix error handling when try_module_get() fails.
ASoC: stm32: sai: fix master clock management
ASoC: Intel: kbl: fix wrong number of channels
ALSA: hda - Add two more machines to the power_save_blacklist
ASoC: pcm: update module refcount if module_get_upon_open is set
ASoC: core: conditionally increase module refcount on component open
ASoC: stm32: fix sai driver name initialisation
ASoC: topology: Use the correct dobj to free enum control values and texts
ALSA: seq: Fix OOB-reads from strlcpy
ASoC: intel: skylake: add remove() callback for component driver
ASoC: cs35l35: Disable regulators on driver removal
ALSA: xen-front: Do not use stream buffer size before it is set
ASoC: rockchip: pdm: change dma burst to 8
ASoC: rockchip: pdm: fix regmap_ops hang issue
ASoC: simple-card: don't select DPCM via simple-audio-card
ASoC: audio-graph-card: don't select DPCM via audio-graph-card
ASoC: tlv320aic32x4: Change author's name
ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an ACPICA issue introduced during the 4.20 development cycle and
causing some systems to crash because of leftover operation region
data still maintained after the operation region in question has gone
away (Erik Schmauss)"
* tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: remove address node from global list after method termination
|
|
Pull drm fixes from Dave Airlie:
"Fixes across the driver spectrum this week, the mediatek fbdev support
might be a bit late for this round, but I looked over it and it's not
very large and seems like a useful feature for them.
Otherwise the main thing is a regression fix for i915 5.0 bug that
caused black screens on a bunch of Dell XPS 15s I think, I know at
least Fedora is waiting for this to land, and the udl fix is also for
a regression since 5.0 where unplugging the device would end badly.
core:
- make atomic hooks optional
i915:
- Revert a 5.0 regression where some eDP panels stopped working
- DSI related fixes for platforms up to IceLake
- GVT (regression fix, warning fix, use-after free fix)
amdgpu:
- Cursor fixes
- missing PCI ID fix for KFD
- XGMI fix
- shadow buffer handling after reset fix
udl:
- fix unplugging device crashes.
mediatek:
- stabilise MT2701 HDMI support
- fbdev support
tegra:
- fix for build regression in rc1.
sun4i:
- Allwinner A6 max freq improvements
- null ptr deref fix
dw-hdmi:
- SCDC configuration improvements
omap:
- CEC clock management policy fix"
* tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm: (32 commits)
gpu: host1x: Fix compile error when IOMMU API is not available
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
drm/i915/dp: revert back to max link rate and lane count on eDP
drm/i915/icl: Fix port disable sequence for mipi-dsi
drm/i915/icl: Ungate ddi clocks before IO enable
drm/mediatek: no change parent rate in round_rate() for MT2701 hdmi phy
drm/mediatek: using new factor for tvdpll for MT2701 hdmi phy
drm/mediatek: remove flag CLK_SET_RATE_PARENT for MT2701 hdmi phy
drm/mediatek: make implementation of recalc_rate() for MT2701 hdmi phy
drm/mediatek: fix the rate and divder of hdmi phy for MT2701
drm/mediatek: fix possible object reference leak
drm/i915: Get power refs in encoder->get_power_domains()
drm/i915: Fix pipe_bpp readout for BXT/GLK DSI
drm/amd/display: Fix negative cursor pos programming (v2)
drm/sun4i: tcon top: Fix NULL/invalid pointer dereference in sun8i_tcon_top_un/bind
drm/udl: add a release method and delay modeset teardown
drm/i915/gvt: Prevent use-after-free in ppgtt_free_all_spt()
drm/i915/gvt: Annotate iomem usage
drm/sun4i: DW HDMI: Lower max. supported rate for H6
Revert "Documentation/gpu/meson: Remove link to meson_canvas.c"
...
|
|
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
Cc: <stable@kernel.org>
Fixes: 6170a97460db ("arm64: Atomic operations")
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The exlcusion range limit register needs to contain the
base-address of the last page that is part of the range, as
bits 0-11 of this register are treated as 0xfff by the
hardware for comparisons.
So correctly set the exclusion range in the hardware to the
last page which is _in_ the range.
Fixes: b2026aa2dce44 ('x86, AMD IOMMU: add functions for programming IOMMU MMIO space')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Re-run the shell fragment that generated the original list now that
there are two dozens of new entries after v5.1's merge window.
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
|
|
Thomas-Mich Richter reported he triggered a WARN()ing from event_function_local()
on his s390. The problem boils down to:
CPU-A CPU-B
perf_event_overflow()
perf_event_disable_inatomic()
@pending_disable = 1
irq_work_queue();
sched-out
event_sched_out()
@pending_disable = 0
sched-in
perf_event_overflow()
perf_event_disable_inatomic()
@pending_disable = 1;
irq_work_queue(); // FAILS
irq_work_run()
perf_pending_event()
if (@pending_disable)
perf_event_disable_local(); // WHOOPS
The problem exists in generic, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from it's PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.
Adding a irq_work_sync() to event_sched_in() would work for all hardare
PMUs that properly use irq_work_run() but fails for software PMUs.
Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
David Miller says:
====================
SCTP: Event skb list overhaul.
This patch series eliminates the explicit reference to the skb list
implementation via skb->prev dereferences.
The approach used is to pass a non-empty skb list around instead of an
event skb object which may or may not be on a list.
I'd like to thank Marcelo Leitner, Xin Long, and Neil Horman for
reviewing previous versions of this series.
Testing would be very much appreciated, in addition to the review of
course.
v4 --> v5: Rebase to net-next
v3 --> v4: Fix the logic in patch #4 so that we don't miss cases
where we should add event to the on-stack temp list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now the SKB list implementation assumption can be removed.
And now that we know that the list head is always non-NULL
we can remove the code blocks dealing with that as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass this, instead of an event. Then everything trickles down and we
always have events a non-empty list.
Then we needs a list creating stub to place into .enqueue_event for sctp_stream_interleave_1.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This way we can make sure events sent this way to
sctp_ulpq_tail_event() are on a list as well. Now all such code paths
are fully covered.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This way we can simplify the logic and remove assumptions
about the implementation of skb lists.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Inside the loop, we always start with event non-NULL.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix net reference counting in fl_change() and remove redundant call to
tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
before releasing exts and deallocating a filter, so this code caused flower
classifier to obtain net twice per filter that is being deleted.
Implementation of __fl_delete() called tcf_exts_get_net() to pass its
result as 'async' flag to fl_mask_put(). However, 'async' flag is redundant
and only complicates fl_mask_put() implementation. This functionality seems
to be copied from filter cleanup code, where it was added by Cong with
following explanation:
This patchset tries to fix the race between call_rcu() and
cleanup_net() again. Without holding the netns refcnt the
tc_action_net_exit() in netns workqueue could be called before
filter destroy works in tc filter workqueue. This patchset
moves the netns refcnt from tc actions to tcf_exts, without
breaking per-netns tc actions.
This doesn't apply to flower mask, which doesn't call any tc action code
during cleanup. Simplify fl_mask_put() by removing the flag parameter and
always use tcf_queue_work() to free mask objects.
Fixes: 061775583e35 ("net: sched: flower: introduce reference counting for filters")
Fixes: 1f17f7742eeb ("net: sched: flower: insert filter to ht before offloading it to hw")
Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pmtu.sh script runs a number of tests and dumps a summary of pass/fail.
If a test fails, it is near impossible to debug why. For example:
TEST: ipv6: PMTU exceptions [FAIL]
There are a lot of commands run behind the scenes for this test. Which
one is failing?
Add a VERBOSE option to show commands that are run and any output from
those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
fails allowing users to poke around with the setup in the failed state.
In the process, rename tracing to TRACING and move declaration to top
with the new variables.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After commit e21db6f69a95 ("tcp: track total bytes delivered with ECN CE marks")
core TCP stack does a very good job tracking ECN signals.
The "sender's best estimate of CE information" Yuchung mentioned in his
patch is indeed the best we can do.
DCTCP can use tp->delivered_ce and tp->delivered to not duplicate the logic,
and use the existing best estimate.
This solves some problems, since current DCTCP logic does not deal with losses
and/or GRO or ack aggregation very well.
This also removes a dubious use of inet_csk(sk)->icsk_ack.rcv_mss
(this should have been tp->mss_cache), and a 64 bit divide.
Finally, we can see that the DCTCP logic, calling dctcp_update_alpha() for
every ACK could be done differently, calling it only once per RTT.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Abdul Kabbani <akabbani@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Revert back to max link rate and lane count on eDP.
- DSI related fixes for all platforms including Ice Lake.
- GVT Fixes including one vGPU display plane size regression fix,
one for preventing use-after-free in ppgtt shadow free function,
and another warning fix for iomem access annotation.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411235832.GA6476@intel.com
|
|
If the last bio returned is not dio->bio, the status of the bio will
not assigned to dio->bio if it is error. This will cause the whole IO
status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
<idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
<idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>]
<idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>]
ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we
will only check dio->bio.bi_status when we return the whole IO to
the upper layer.
Fixes: 542ff7bf18c6 ("block: new direct I/O implementation")
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-04-12
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Improve BPF verifier scalability for large programs through two
optimizations: i) remove verifier states that are not useful in pruning,
ii) stop walking parentage chain once first LIVE_READ is seen. Combined
gives approx 20x speedup. Increase limits for accepting large programs
under root, and add various stress tests, from Alexei.
2) Implement global data support in BPF. This enables static global variables
for .data, .rodata and .bss sections to be properly handled which allows
for more natural program development. This also opens up the possibility
to optimize program workflow by compiling ELFs only once and later only
rewriting section data before reload, from Daniel and with test cases and
libbpf refactoring from Joe.
3) Add config option to generate BTF type info for vmlinux as part of the
kernel build process. DWARF debug info is converted via pahole to BTF.
Latter relies on libbpf and makes use of BTF deduplication algorithm which
results in 100x savings compared to DWARF data. Resulting .BTF section is
typically about 2MB in size, from Andrii.
4) Add BPF verifier support for stack access with variable offset from
helpers and add various test cases along with it, from Andrey.
5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
so that L2 encapsulation can be used for tc tunnels, from Alan.
6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
users can define a subset of allowed __sk_buff fields that get fed into
the test program, from Stanislav.
7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
various UBSAN warnings in bpftool, from Yonghong.
8) Generate a pkg-config file for libbpf, from Luca.
9) Dump program's BTF id in bpftool, from Prashant.
10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
program, from Magnus.
11) kallsyms related fixes for the case when symbols are not present in
BPF selftests and samples, from Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This makes broute a normal ebtables table, hooking at PREROUTING.
The broute hook is removed.
It uses skb->cb to signal to bridge rx handler that the skb should be
routed instead of being bridged.
This change is backwards compatible with ebtables as no userspace visible
parts are changed.
This means we can also remove the !ops test in ebt_register_table,
it was only there for broute table sake.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Replace NF_HOOK() based invocation of the netfilter hooks with a private
copy of nf_hook_slow().
This copy has one difference: it can return the rx handler value expected
by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.
This is needed by the next patch to invoke the ebtables
"broute" table via the standard netfilter hooks rather than the custom
"br_should_route_hook" indirection that is used now.
When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
bridge rx input handler, but there is no way to indicate this via
NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
netfilter core or a percpu flag.
text data bss dec filename
3369 56 0 3425 net/bridge/br_input.o.before
3458 40 0 3498 net/bridge/br_input.o.after
This allows removal of the "br_should_route_hook" in the next patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Reduce size of br_input_skb_cb from 24 to 16 bytes by
using bitfield for those values that can only be 0 or 1.
igmp is the igmp type value, so it needs to be at least u8.
Furthermore, the bridge currently relies on step-by-step initialization
of br_input_skb_cb fields as the skb passes through the stack.
Explicitly zero out the bridge input cb instead, this avoids having to
review/validate that no BR_INPUT_SKB_CB(skb)->foo test can see a
'random' value from previous protocol cb.
AFAICS all current fields are always set up before they are read again,
so this is not a bug fix.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ebtables -t broute allows to redirect packets in a way that
they get pushed up the stack, even if the interface is part
of a bridge.
In case of IP packets to non-local address, this means
those IP packets are routed instead of bridged-forwarded, just
as if the bridge would not have existed.
Expected test output is:
PASS: netns connectivity: ns1 and ns2 can reach each other
PASS: ns1/ns2 connectivity with active broute rule
PASS: ns1/ns2 connectivity with active broute rule and bridge forward drop
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case
and be sure that nobody is erroneously setting ctx_{in,out}.
Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add the definition for smp_rmb(), smp_wmb(), and smp_mb() to the
tools include infrastructure: this patch adds the implementation
for x86-64 and arm64, and have it fall back as currently is for
other archs which do not have it implemented at this point. The
x86-64 one uses lock + add combination for smp_mb() with address
below red zone.
This is on top of 09d62154f613 ("tools, perf: add and use optimized
ring_buffer_{read_head, write_tail} helpers"), which didn't touch
smp_* barrier implementations. Magnus recently rightfully reported
however that the latter on x86-64 still wrongly falls back to sfence,
lfence and mfence respectively, thus fix that for applications under
tools making use of these to avoid such ugly surprises. The main
header under tools (include/asm/barrier.h) will in that case not
select the fallback implementation.
Reported-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
David Ahern says:
====================
ipv6: Refactor nexthop selection helpers during a fib lookup
IPv6 has a fib6_nh embedded within each fib6_info and a separate
fib6_info for each path in a multipath route. A side effect is that
a fib6_info is passed all the way down the stack when selecting a path
on a fib lookup. Refactor the fib lookup functions and associated
helper functions to take a fib6_nh when appropriate to enable IPv6
to work with nexthop objects where the fib6_nh is not directly part
of a fib entry.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the nexthop evaluation of a fib entry to a helper that can be
leveraged for each fib6_nh in a multipath nexthop object.
In the move, 'continue' statements means the helper returns false
(loop should continue) and 'break' means return true (found the entry
of interest).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the device and gateway checks in the fib6_next loop to a helper
that can be called per fib6_nh entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the siblings and fib6_multipath_select after the null entry check
since a null entry can not have siblings.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clean up the fib6_null_entry handling in ip6_pol_route_lookup.
rt6_device_match can return fib6_null_entry, but fib6_multipath_select
can not. Consolidate the fib6_null_entry handling and on the final
null_entry check set rt and goto out - no need to defer to a second
check after rt6_find_cached_rt.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
find_rr_leaf has 3 loops over fib_entries calling find_match. The loops
are very similar with differences in start point and whether the metric
is evaluated:
1. start at rr_head, no extra loop compare, check fib metric
2. start at leaf, compare rt against rr_head, check metric
3. start at cont (potential saved point from earlier loops), no
extra loop compare, no metric check
Create 1 loop that is called 3 different times. This will make a
later change with multipath nexthop objects much simpler.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
find_match primarily needs a fib6_nh (and fib6_flags which it passes
through to rt6_score_route). Move fib6_check_expired up to the call
sites so find_match is only called for relevant entries. Remove the
match argument which is mostly a pass through and use the return
boolean to decide if match gets set in the call sites.
The end result is a helper that can be called per fib6_nh struct
which is needed once fib entries reference nexthop objects that
have more than one fib6_nh.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rt6_score_route only needs the fib6_flags and nexthop data. Change
it accordingly. Allows re-use later for nexthop based fib6_nh.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rt6_probe sends probes for gateways in a nexthop. As such it really
depends on a fib6_nh, not a fib entry. Move last_probe to fib6_nh and
update rt6_probe to a fib6_nh struct.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rt6_check_dev is a simpler helper with only 1 caller. Fold the code
into rt6_score_route.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|