Age | Commit message (Collapse) | Author |
|
Similarly to gfp_t, define fgf_t as its own type to prevent various
misuses and confusion. Leave the flags as FGP_* for now to reduce the
size of this patch; they will be converted to FGF_* later. Move the
documentation to the definition of the type insted of burying it in the
__filemap_get_folio() documentation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a folio wrapper around copy_page_from_iter_atomic().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
|
Trying to restrict the '$'-prefix change to RISC-V caused some fallout,
so let's just treat all those symbols as special.
Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too")
Link: https://lore.kernel.org/all/20230712015747.77263-1-wangkefeng.wang@huawei.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Make the comments for I/O, system and DMA memory say the same.
Makes the header file's structure more obvious.
Suggested-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230707083422.18691-13-tzimmermann@suse.de
|
|
Remove the initializer macro FB_DEFAULT_SYS_OPS and its helper macro
__FB_DEFAULT_SYS_OPS_MMAP. There are no users.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Cc: Helge Deller <deller@gmx.de> (maintainer:FRAMEBUFFER LAYER)
Link: https://patchwork.freedesktop.org/patch/msgid/20230707083422.18691-12-tzimmermann@suse.de
|
|
Add initializer macros for struct fb_ops for framebuffers in DMA-able
memory areas. Also add a corresponding Kconfig token. As of now, this
is equivalent to system framebuffers and mostly useful for labeling
drivers correctly.
A later patch may add a generic DMA-specific mmap operation. Linux
offers a number of dma_mmap_*() helpers for different use cases.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230707083422.18691-2-tzimmermann@suse.de
|
|
Currently, all files with EXPORT_SYMBOL() are rebuilt when CONFIG_MODULES
is flipped due to <linux/export.h> depending on CONFIG_MODULES.
Now that modpost can make a final decision about export symbols,
<linux/export.h> does not need to make EXPORT_SYMBOL() no-op.
Instead, modpost can skip emitting KSYMTAB when CONFIG_MODULES is unset.
This commit will reduce the number of recompilation when CONFIG_MODULES
is toggled.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove the unused flags FBINFO_DEFAULT and FBINFO_FLAG_DEFAULT. No
functional changes.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Helge Deller <deller@gmx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230715185343.7193-18-tzimmermann@suse.de
|
|
IPv6 inet sockets are supposed to have a "struct ipv6_pinfo"
field at the end of their definition, so that inet6_sk_generic()
can derive from socket size the offset of the "struct ipv6_pinfo".
This is very fragile, and prevents adding bigger alignment
in sockets, because inet6_sk_generic() does not work
if the compiler adds padding after the ipv6_pinfo component.
We are currently working on a patch series to reorganize
TCP structures for better data locality and found issues
similar to the one fixed in commit f5d547676ca0
("tcp: fix tcp_inet6_sk() for 32bit kernels")
Alternative would be to force an alignment on "struct ipv6_pinfo",
greater or equal to __alignof__(any ipv6 sock) to ensure there is
no padding. This does not look great.
v2: fix typo in mptcp_proto_v6_init() (Paolo)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chao Wu <wwchao@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that everything in-tree is converted to use the accessor functions,
rename the i_ctime field in the inode to discourage direct access.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705185812.579118-4-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-82-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Reuse dev_pm_opp_get_freq_indexed() from dev_pm_opp_get_freq().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
The indexed version of the API is added for other floor and ceil, add
the same for exact as well for completeness.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
In the case of devices with multiple clocks, drivers need to specify the
frequency index for the OPP framework to get the specific frequency within
the required OPP. So let's introduce the dev_pm_opp_get_freq_indexed() API
accepting the frequency index as an argument.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[ Viresh: Fixed potential access to NULL opp pointer ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Drop the wake-irq enable and disable helpers which have not been used
since commit bed570307ed7 ("PM / wakeirq: Fix dedicated wakeirq for
drivers not using autosuspend").
Note that these functions are essentially just leftovers from the first
iteration of the wake-irq implementation where device drivers were
supposed to call these functions themselves instead of PM core (as
is also indicated by the bogus kernel doc comments).
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Since commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone
parameters structure"), thermal_zone_device_register() allocates a copy
of the tzp argument and callers need not explicitly manage its lifetime.
This means the function no longer cares about the parameter being
mutable, so constify it.
No functional change.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There's several things here that will really help my CI.
|
|
There's several things here that will really help my CI.
|
|
There's several things here that will really help my CI.
|
|
commit 830ae442202e ("extcon: Remove the deprecated extcon functions")
left behind this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Bug and regression fixes for 6.5-rc3 for ext4's mballoc and jbd2's
checkpoint code"
* tag 'ext4_for_linus-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()
ext4: correct inline offset when handling xattrs in inode body
jbd2: remove __journal_try_to_free_buffer()
jbd2: fix a race when checking checkpoint buffer busy
jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
jbd2: remove journal_clean_one_cp_list()
jbd2: remove t_checkpoint_io_list
jbd2: recheck chechpointing non-dirty buffer
|
|
There are devices (such as Murata IRS-D200 PIR proximity sensor) that
check the data signal with a running period. I.e. for a specified time,
they count the number of conditions that have occurred, and then signal
if that is more than a specified amount.
`IIO_EV_INFO_PERIOD` resets when the condition no longer is true and is
therefore not suitable for these devices. Add a new `iio_event_info`
`IIO_EV_INFO_RUNNING_PERIOD` that can be used as a running period. Also
add a new `IIO_EV_INFO_RUNNING_COUNT` that can be used to specify the
number of conditions that must occur during this running period.
Signed-off-by: Waqar Hameed <waqar.hameed@axis.com>
Link: https://lore.kernel.org/r/ee4a801ae9b9c4716c7bd23d8f79f232351df8bd.1689753076.git.waqar.hameed@axis.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
This change adds a new sysctl accept_ra_min_rtr_lft to specify the
minimum acceptable router lifetime in an RA. If the received RA router
lifetime is less than the configured value (and not 0), the RA is
ignored.
This is useful for mobile devices, whose battery life can be impacted
by networks that configure RAs with a short lifetime. On such networks,
the device should never gain IPv6 provisioning and should attempt to
drop RAs via hardware offload, if available.
Signed-off-by: Patrick Rohr <prohr@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current proc connector code has the foll. bugs - if there are more
than one listeners for the proc connector messages, and one of them
deregisters for listening using PROC_CN_MCAST_IGNORE, they will still get
all proc connector messages, as long as there is another listener.
Another issue is if one client calls PROC_CN_MCAST_LISTEN, and another one
calls PROC_CN_MCAST_IGNORE, then both will end up not getting any messages.
This patch adds filtering and drops packet if client has sent
PROC_CN_MCAST_IGNORE. This data is stored in the client socket's
sk_user_data. In addition, we only increment or decrement
proc_event_num_listeners once per client. This fixes the above issues.
cn_release is the release function added for NETLINK_CONNECTOR. It uses
the newly added netlink_release function added to netlink_sock. It will
free sk_user_data.
Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A new function netlink_release is added in netlink_sock to store the
protocol's release function. This is called when the socket is deleted.
This can be supplied by the protocol via the release function in
netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR
protocol, so it can free it's data when socket is deleted.
Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To use filtering at the connector & cn_proc layers, we need to enable
filtering in the netlink layer. This reverses the patch which removed
netlink filtering - commit ID for that patch:
549017aa1bb7 (netlink: remove netlink_broadcast_filtered).
Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull block fixes from Jens Axboe:
- Fix for loop regressions (Mauricio)
- Fix a potential stall with batched wakeups in sbitmap (David)
- Fix for stall with recursive plug flushes (Ross)
- Skip accounting of empty requests for blk-iocost (Chengming)
- Remove a dead field in struct blk_mq_hw_ctx (Chengming)
* tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux:
loop: do not enforce max_loop hard limit by (new) default
loop: deprecate autoloading callback loop_probe()
sbitmap: fix batching wakeup
blk-iocost: skip empty flush bio in iocost
blk-mq: delete dead struct blk_mq_hw_ctx->queued field
blk-mq: Fix stall due to recursive flush plug
|
|
When the system is shut down, the process is killed, but the
accelerator device does not stop executing the tasks. If the
accelerator device still accesses the memory and writes back data
to the memory after the memory is reclaimed by the system,
an NFE error may occur. Therefore, before the system is shut
down, the driver needs to stop the device and write data back
to the memory.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Since commit 3a3b7ce93369 ("CRED: Allow kernel services to override LSM settings for task actions")
this is never used, so can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 3a3b7ce93369 ("CRED: Allow kernel services to override LSM settings for task actions")
[PM: subject tweak, fixes tag]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
In the case of devices with multiple clocks, drivers need to specify the
clock index for the OPP framework to find the OPP corresponding to the
floor/ceil of the supplied frequency. So let's introduce the two new APIs
accepting the clock index as an argument.
These APIs use the exising _find_key_ceil() helper by supplying the clock
index to it.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[ Viresh: Rearranged definitions in pm_opp.h ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Rearrange the helper function declarations / definitions to keep them in
order of freq, level and then bw.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch fixes the current handling of F_CANCELLK by not just doing a
unlock as we need to try to cancel a lock at first. A unlock makes sense
on a non-blocking lock request but if it's a blocking lock request we
need to cancel the request until it's not granted yet. This patch is fixing
this behaviour by first try to cancel a lock request and if it's failed
it's unlocking the lock which seems to be granted.
Note: currently the nfs locking handling was disabled by commit
40595cdc93ed ("nfs: block notification on fs with its own ->lock").
However DLM was never being updated regarding to this change. Future
patches will try to fix lockd lock requests for DLM. This patch is
currently assuming the upstream DLM lockd handling is correct.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from BPF, netfilter, bluetooth and CAN.
Current release - regressions:
- eth: r8169: multiple fixes for PCIe ASPM-related problems
- vrf: fix RCU lockdep splat in output path
Previous releases - regressions:
- gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set
- dsa: mv88e6xxx: do a final check before timing out when polling
- nf_tables: fix sleep in atomic in nft_chain_validate
Previous releases - always broken:
- sched: fix undoing tcf_bind_filter() in multiple classifiers
- bpf, arm64: fix BTI type used for freplace attached functions
- can: gs_usb: fix time stamp counter initialization
- nft_set_pipapo: fix improper element removal (leading to UAF)
Misc:
- net: support STP on bridge in non-root netns, STP prevents packet
loops so not supporting it results in freezing systems of
unsuspecting users, and in turn very upset noises being made
- fix kdoc warnings
- annotate various bits of TCP state to prevent data races"
* tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: phy: prevent stale pointer dereference in phy_init()
tcp: annotate data-races around fastopenq.max_qlen
tcp: annotate data-races around icsk->icsk_user_timeout
tcp: annotate data-races around tp->notsent_lowat
tcp: annotate data-races around rskq_defer_accept
tcp: annotate data-races around tp->linger2
tcp: annotate data-races around icsk->icsk_syn_retries
tcp: annotate data-races around tp->keepalive_probes
tcp: annotate data-races around tp->keepalive_intvl
tcp: annotate data-races around tp->keepalive_time
tcp: annotate data-races around tp->tsoffset
tcp: annotate data-races around tp->tcp_tx_delay
Bluetooth: MGMT: Use correct address for memcpy()
Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
Bluetooth: SCO: fix sco_conn related locking and validity issues
Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
Bluetooth: coredump: fix building with coredump disabled
Bluetooth: ISO: fix iso_conn related locking and validity issues
Bluetooth: hci_event: call disconnect callback before deleting conn
...
|
|
This field can be read locklessly.
Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This counter is not used anywhere, so delete it.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230720095512.1403123-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Rename common module to inv_sensors_timestamp, add configuration
at init (chip internal clock, acceptable jitter, ...) and update
inv_icm42600 driver integration.
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20230606162147.79667-4-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Create new inv_sensors common modules and move inv_icm42600
timestamp module inside. This module will be used by IMUs and
also in the future by other chips.
Modify inv_icm42600 driver to use timestamp module and do some
headers cleanup.
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20230606162147.79667-3-inv.git-commit@tdk.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Currently the global @console_suspended is used to determine if
consoles are in a suspended state. Its primary purpose is to allow
usage of the console_lock when suspended without causing console
printing. It is synchronized by the console_lock.
Rather than relying on the console_lock to determine suspended
state, make it an official per-console state that is set within
console->flags. This allows the state to be queried via SRCU.
Remove @console_suspended. Console printing will still be avoided
when suspended because console_is_usable() returns false when
the new suspended flag is set for that console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230717194607.145135-7-john.ogness@linutronix.de
|
|
Add a driver for the Marvell 88Q2110. This driver allows to detect the
link, switch between 100BASE-T1 and 1000BASE-T1 and switch between
master and slave mode. Autonegotiation supported by the PHY does not yet
work.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a separate function to read the BASE-T1 abilities. Some PHYs do not
indicate the availability of the extended BASE-T1 ability register, so
this function must be called separately.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
After software has authenticated a dynamic boost control request,
it can fetch and set supported parameters using a selection of messages.
Add support for these messages and export the ability to do this to
userspace.
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
As part of the authentication flow for Dynamic Boost Control, the calling
software will need to send a uid used in all of its future
communications.
Add support for another IOCTL call to let userspace software set this up.
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Dynamic Boost Control is a feature offered on AMD client platforms that
allows software to request and set power or frequency limits.
Only software that has authenticated with the PSP can retrieve or set
these limits.
Create a character device and ioctl for fetching the nonce. This ioctl
supports optionally passing authentication information which will influence
how many calls the nonce is valid for.
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Most variables of type struct pwm_chip * are named "chip", there are
only three outliers called "pc". Change these three to "chip", too, for
consistency.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-07-19
We've added 45 non-merge commits during the last 3 day(s) which contain
a total of 71 files changed, 7808 insertions(+), 592 deletions(-).
The main changes are:
1) multi-buffer support in AF_XDP, from Maciej Fijalkowski,
Magnus Karlsson, Tirthendu Sarkar.
2) BPF link support for tc BPF programs, from Daniel Borkmann.
3) Enable bpf_map_sum_elem_count kfunc for all program types,
from Anton Protopopov.
4) Add 'owner' field to bpf_rb_node to fix races in shared ownership,
Dave Marchevsky.
5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
bpf, net: Introduce skb_pointer_if_linear().
bpf: sync tools/ uapi header with
selftests/bpf: Add mprog API tests for BPF tcx links
selftests/bpf: Add mprog API tests for BPF tcx opts
bpftool: Extend net dump with tcx progs
libbpf: Add helper macro to clear opts structs
libbpf: Add link-based API for tcx
libbpf: Add opts-based attach/detach/query API for tcx
bpf: Add fd-based tcx multi-prog infra with link support
bpf: Add generic attach/detach/query API for multi-progs
selftests/xsk: reset NIC settings to default after running test suite
selftests/xsk: add test for too many frags
selftests/xsk: add metadata copy test for multi-buff
selftests/xsk: add invalid descriptor test for multi-buffer
selftests/xsk: add unaligned mode test for multi-buffer
selftests/xsk: add basic multi-buffer test
selftests/xsk: transmit and receive multi-buffer packets
xsk: add multi-buffer documentation
i40e: xsk: add TX multi-buffer support
ice: xsk: Tx multi-buffer support
...
====================
Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce a list of generic reset sources and use them to export the
power on reason through sysfs. Update the ABI documentation to describe
this new interface.
Signed-off-by: Kamel Bouhara <kamel.bouhara@bootlin.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
[Miquel Raynal: Follow-up on Kamel's work, 4 years later]
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Those who have worked with RCU for some time will naturally think in
terms of the long-standing call_rcu() API rather than the much newer
call_rcu_hurry() API. But it is call_rcu_hurry() that you should normally
pass to synchronize_rcu_mult(). This commit therefore updates the header
comment to point this out.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
|
|
Network drivers always call skb_header_pointer() with non-null buffer.
Remove !buffer check to prevent accidental misuse of skb_header_pointer().
Introduce skb_pointer_if_linear() instead.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230718234021.43640-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side for allowing BPF program management based
on fds via bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work which we also presented at LPC [0] last year
and a recent update at LSF/MM/BPF this year [3] is to support long-awaited
BPF link functionality for tc BPF programs, which allows for a model of safe
ownership and program detachment.
Given the rise in tc BPF users in cloud native environments, this becomes
necessary to avoid hard to debug incidents either through stale leftover
programs or 3rd party applications accidentally stepping on each others toes.
As a recap, a BPF link represents the attachment of a BPF program to a BPF
hook point. The BPF link holds a single reference to keep BPF program alive.
Moreover, hook points do not reference a BPF link, only the application's
fd or pinning does. A BPF link holds meta-data specific to attachment and
implements operations for link creation, (atomic) BPF program update,
detachment and introspection. The motivation for BPF links for tc BPF programs
is multi-fold, for example:
- From Meta: "It's especially important for applications that are deployed
fleet-wide and that don't "control" hosts they are deployed to. If such
application crashes and no one notices and does anything about that, BPF
program will keep running draining resources or even just, say, dropping
packets. We at FB had outages due to such permanent BPF attachment
semantics. With fd-based BPF link we are getting a framework, which allows
safe, auto-detachable behavior by default, unless application explicitly
opts in by pinning the BPF link." [1]
- From Cilium-side the tc BPF programs we attach to host-facing veth devices
and phys devices build the core datapath for Kubernetes Pods, and they
implement forwarding, load-balancing, policy, EDT-management, etc, within
BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
experienced hard-to-debug issues in a user's staging environment where
another Kubernetes application using tc BPF attached to the same prio/handle
of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath
it. The goal is to establish a clear/safe ownership model via links which
cannot accidentally be overridden. [0,2]
BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
would solve mentioned issue of safe ownership model as 3rd party applications
would not be able to accidentally wipe Cilium programs, even if they are not
BPF link aware.
Earlier attempts [4] have tried to integrate BPF links into core tc machinery
to solve cls_bpf, which has been intrusive to the generic tc kernel API with
extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
be wiped from the qdisc also. Locking a tc BPF program in place this way, is
getting into layering hacks given the two object models are vastly different.
We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF
attach API, so that the BPF link implementation blends in naturally similar to
other link types which are fd-based and without the need for changing core tc
internal APIs. BPF programs for tc can then be successively migrated from classic
cls_bpf to the new tc BPF link without needing to change the program's source
code, just the BPF loader mechanics for attaching is sufficient.
For the current tc framework, there is no change in behavior with this change
and neither does this change touch on tc core kernel APIs. The gist of this
patch is that the ingress and egress hook have a lightweight, qdisc-less
extension for BPF to attach its tc BPF programs, in other words, a minimal
entry point for tc BPF. The name tcx has been suggested from discussion of
earlier revisions of this work as a good fit, and to more easily differ between
the classic cls_bpf attachment and the fd-based one.
For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and expression
of dependencies similar as with classic tc, but it was challenged that for
something more future-proof a better user experience is required. Hence this
resulted in the design and development of the generic attach/detach/query API
for multi-progs. See prior patch with its discussion on the API design. tcx is
the first user and later we plan to integrate also others, for example, one
candidate is multi-prog support for XDP which would benefit and have the same
'look and feel' from API perspective.
The goal with tcx is to have maximum compatibility to existing tc BPF programs,
so they don't need to be rewritten specifically. Compatibility to call into
classic tcf_classify() is also provided in order to allow successive migration
or both to cleanly co-exist where needed given its all one logical tc layer and
the tcx plus classic tc cls/act build one logical overall processing pipeline.
tcx supports the simplified return codes TCX_NEXT which is non-terminating (go
to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT.
The fd-based API is behind a static key, so that when unused the code is also
not entered. The struct tcx_entry's program array is currently static, but
could be made dynamic if necessary at a point in future. The a/b pair swap
design has been chosen so that for detachment there are no allocations which
otherwise could fail.
The work has been tested with tc-testing selftest suite which all passes, as
well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB.
Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.
[0] https://lpc.events/event/16/contributions/1353/
[1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
[2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
[3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
[4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|