Age | Commit message (Collapse) | Author |
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-05-23
We've added 113 non-merge commits during the last 26 day(s) which contain
a total of 121 files changed, 7425 insertions(+), 1586 deletions(-).
The main changes are:
1) Speed up symbol resolution for kprobes multi-link attachments, from Jiri Olsa.
2) Add BPF dynamic pointer infrastructure e.g. to allow for dynamically sized ringbuf
reservations without extra memory copies, from Joanne Koong.
3) Big batch of libbpf improvements towards libbpf 1.0 release, from Andrii Nakryiko.
4) Add BPF link iterator to traverse links via seq_file ops, from Dmitrii Dolgov.
5) Add source IP address to BPF tunnel key infrastructure, from Kaixi Fan.
6) Refine unprivileged BPF to disable only object-creating commands, from Alan Maguire.
7) Fix JIT blinding of ld_imm64 when they point to subprogs, from Alexei Starovoitov.
8) Add BPF access to mptcp_sock structures and their meta data, from Geliang Tang.
9) Add new BPF helper for access to remote CPU's BPF map elements, from Feng Zhou.
10) Allow attaching 64-bit cookie to BPF link of fentry/fexit/fmod_ret, from Kui-Feng Lee.
11) Follow-ups to typed pointer support in BPF maps, from Kumar Kartikeya Dwivedi.
12) Add busy-poll test cases to the XSK selftest suite, from Magnus Karlsson.
13) Improvements in BPF selftest test_progs subtest output, from Mykola Lysenko.
14) Fill bpf_prog_pack allocator areas with illegal instructions, from Song Liu.
15) Add generic batch operations for BPF map-in-map cases, from Takshak Chahande.
16) Make bpf_jit_enable more user friendly when permanently on 1, from Tiezhu Yang.
17) Fix an array overflow in bpf_trampoline_get_progs(), from Yuntao Wang.
====================
Link: https://lore.kernel.org/r/20220523223805.27931-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new helper function
void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);
which returns a pointer to the underlying data of a dynptr. *len*
must be a statically known value. The bpf program may access the returned
data slice as a normal buffer (eg can do direct reads and writes), since
the verifier associates the length with the returned pointer, and
enforces that no out of bounds accesses occur.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-6-joannelkoong@gmail.com
|
|
Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known since the verifier cannot guarantee that the bpf
program won’t access memory outside the reserved space.
The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.
There are 3 new APIs:
long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
These closely follow the functionalities of the original ringbuf APIs.
For example, all ringbuffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com
|
|
This patch adds the bulk of the verifier work for supporting dynamic
pointers (dynptrs) in bpf.
A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
defined internally as:
struct bpf_dynptr_kern {
void *data;
u32 size;
u32 offset;
} __aligned(8);
The upper 8 bits of *size* is reserved (it contains extra metadata about
read-only status and dynptr type). Consequently, a dynptr only supports
memory less than 16 MB.
There are different types of dynptrs (eg malloc, ringbuf, ...). In this
patchset, the most basic one, dynptrs to a bpf program's local memory,
is added. For now only local memory that is of reg type PTR_TO_MAP_VALUE
is supported.
In the verifier, dynptr state information will be tracked in stack
slots. When the program passes in an uninitialized dynptr
(ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
to the frame pointer where the dynptr resides at are marked
STACK_DYNPTR. For helper functions that take in initialized dynptrs (eg
bpf_dynptr_read + bpf_dynptr_write which are added later in this
patchset), the verifier enforces that the dynptr has been initialized
properly by checking that their corresponding stack slots have been
marked as STACK_DYNPTR.
The 6th patch in this patchset adds test cases that the verifier should
successfully reject, such as for example attempting to use a dynptr
after doing a direct write into it inside the bpf program.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-2-joannelkoong@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata updates from Damien Le Moal:
"For this cycle, the libata.force kernel parameter changes stand out.
Beside that, some small cleanups in various drivers. In more detail:
- Changes to the pata_mpc52xx driver in preparation for powerpc's
asm/prom.h cleanup, from Christophe.
- Improved ATA command allocation, from John.
- Various small cleanups to the pata_via, pata_sil680, pata_ftide010,
sata_gemini, ahci_brcm drivers and to libata-core, from Sergey,
Diego, Ruyi, Mighao and Jiabing.
- Add support for the RZ/G2H SoC to the rcar-sata driver, from Lad.
- AHCI RAID ID cleanup, from Dan.
- Improvement to the libata.force kernel parameter to allow most
horkage flags to be manually forced for debugging drive issues in
the field without needing recompiling a kernel, from me"
* tag 'ata-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_ftide010: Remove unneeded ERROR check before clk_disable_unprepare
doc: admin-guide: Update libata kernel parameters
ata: libata-core: Allow forcing most horkage flags
ata: libata-core: Improve link flags forced settings
ata: libata-core: Refactor force_tbl definition
ata: libata-core: cleanup ata_device_blacklist
ata: simplify the return expression of brcm_ahci_remove
ata: Make use of the helper function devm_platform_ioremap_resource()
ata: libata-core: replace "its" with "it is"
ahci: Add a generic 'controller2' RAID id
dt-bindings: ata: renesas,rcar-sata: Add r8a774e1 support
ata: pata_via: fix sloppy typing in via_do_set_mode()
ata: pata_sil680: fix result type of sil680_sel{dev|reg}()
ata: libata-core: fix parameter type in ata_xfer_mode2shift()
libata: Improve ATA queued command allocation
ata: pata_mpc52xx: Prepare cleanup of powerpc's asm/prom.h
|
|
Introduce bpf_arch_text_invalidate and use it to fill unused part of the
bpf_prog_pack with illegal instructions when a BPF program is freed.
Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator")
Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220520235758.1858153-4-song@kernel.org
|
|
Pull block driver updates from Jens Axboe:
"Here are the driver updates queued up for 5.19. This contains:
- NVMe pull requests via Christoph:
- tighten the PCI presence check (Stefan Roese)
- fix a potential NULL pointer dereference in an error path (Kyle
Miller Smith)
- fix interpretation of the DMRSL field (Tom Yan)
- relax the data transfer alignment (Keith Busch)
- verbose error logging improvements (Max Gurtovoy, Chaitanya
Kulkarni)
- misc cleanups (Chaitanya Kulkarni, Christoph)
- set non-mdts limits in nvme_scan_work (Chaitanya Kulkarni)
- add support for TP4084 - Time-to-Ready Enhancements (Christoph)
- MD pull request via Song:
- Improve annotation in raid5 code, by Logan Gunthorpe
- Support MD_BROKEN flag in raid-1/5/10, by Mariusz Tkaczyk
- Other small fixes/cleanups
- null_blk series making the configfs side much saner (Damien)
- Various minor drbd cleanups and fixes (Haowen, Uladzislau, Jiapeng,
Arnd, Cai)
- Avoid using the system workqueue (and hence flushing it) in rnbd
(Jack)
- Avoid using the system workqueue (and hence flushing it) in aoe
(Tetsuo)
- Series fixing discard_alignment issues in drivers (Christoph)
- Small series fixing drivers poking at disk->part0 for openers
information (Christoph)
- Series fixing deadlocks in loop (Christoph, Tetsuo)
- Remove loop.h and add SPDX headers (Christoph)
- Various fixes and cleanups (Julia, Xie, Yu)"
* tag 'for-5.19/drivers-2022-05-22' of git://git.kernel.dk/linux-block: (72 commits)
mtip32xx: fix typo in comment
nvme: set non-mdts limits in nvme_scan_work
nvme: add support for TP4084 - Time-to-Ready Enhancements
nvme: split the enum used for various register constants
nbd: Fix hung on disconnect request if socket is closed before
nvme-fabrics: add a request timeout helper
nvme-pci: harden drive presence detect in nvme_dev_disable()
nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
nvme: mark internal passthru request RQF_QUIET
nvme: remove unneeded include from constants file
nvme: add missing status values to verbose logging
nvme: set dma alignment to dword
nvme: fix interpretation of DMRSL
loop: remove most the top-of-file boilerplate comment from the UAPI header
loop: remove most the top-of-file boilerplate comment
loop: add a SPDX header
loop: remove loop.h
block: null_blk: Improve device creation with configfs
block: null_blk: Cleanup messages
block: null_blk: Cleanup device creation and deletion
...
|
|
Pull block updates from Jens Axboe:
"Here are the core block changes for 5.19. This contains:
- blk-throttle accounting fix (Laibin)
- Series removing redundant assignments (Michal)
- Expose bio cache via the bio_set, so that DM can use it (Mike)
- Finish off the bio allocation interface cleanups by dealing with
the weirdest member of the family. bio_kmalloc combines a kmalloc
for the bio and bio_vecs with a hidden bio_init call and magic
cleanup semantics (Christoph)
- Clean up the block layer API so that APIs consumed by file systems
are (almost) only struct block_device based, so that file systems
don't have to poke into block layer internals like the
request_queue (Christoph)
- Clean up the blk_execute_rq* API (Christoph)
- Clean up various lose end in the blk-cgroup code to make it easier
to follow in preparation of reworking the blkcg assignment for bios
(Christoph)
- Fix use-after-free issues in BFQ when processes with merged queues
get moved to different cgroups (Jan)
- BFQ fixes (Jan)
- Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
Wolfgang, me)"
* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
blk-mq: fix typo in comment
bfq: Remove bfq_requeue_request_body()
bfq: Remove superfluous conversion from RQ_BIC()
bfq: Allow current waker to defend against a tentative one
bfq: Relax waker detection for shared queues
blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
blk-throttle: Set BIO_THROTTLED when bio has been throttled
blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
blk-cgroup: always terminate io.stat lines
block, bfq: make bfq_has_work() more accurate
block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
block: cleanup the VM accounting in submit_bio
block: Fix the bio.bi_opf comment
block: reorder the REQ_ flags
blk-iocost: combine local_stat and desc_stat to stat
block: improve the error message from bio_check_eod
block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
block: remove superfluous calls to blkcg_bio_issue_init
kthread: unexport kthread_blkcg
blk-cgroup: cleanup blkcg_maybe_throttle_current
...
|
|
Pull cdrom updates from Jens Axboe:
"Removal of unused code and documentation updates"
* tag 'for-5.19/cdrom-2022-05-22' of git://git.kernel.dk/linux-block:
cdrom: remove obsolete TODO list
block: remove last remaining traces of IDE documentation
cdrom: mark CDROMGETSPINDOWN/CDROMSETSPINDOWN obsolete
cdrom: remove the unused driver specific disc change ioctl
cdrom: make EXPORT_SYMBOL follow exported function
|
|
git://git.kernel.dk/linux-block
Pull io_uring NVMe command passthrough from Jens Axboe:
"On top of everything else, this adds support for passthrough for
io_uring.
The initial feature for this is NVMe passthrough support, which allows
non-filesystem based IO commands and admin commands.
To support this, io_uring grows support for SQE and CQE members that
are twice as big, allowing to pass in a full NVMe command without
having to copy data around. And to complete with more than just a
single 32-bit value as the output"
* tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits)
io_uring: cleanup handling of the two task_work lists
nvme: enable uring-passthrough for admin commands
nvme: helper for uring-passthrough checks
blk-mq: fix passthrough plugging
nvme: add vectored-io support for uring-cmd
nvme: wire-up uring-cmd support for io-passthru on char-device.
nvme: refactor nvme_submit_user_cmd()
block: wire-up support for passthrough plugging
fs,io_uring: add infrastructure for uring-cmd
io_uring: support CQE32 for nop operation
io_uring: enable CQE32
io_uring: support CQE32 in /proc info
io_uring: add tracing for additional CQE32 fields
io_uring: overflow processing for CQE32
io_uring: flush completions for CQE32
io_uring: modify io_get_cqe for CQE32
io_uring: add CQE32 completion processing
io_uring: add CQE32 setup processing
io_uring: change ring size calculation for CQE32
io_uring: store add. return values for CQE32
...
|
|
Pull io_uring 'more data in socket' support from Jens Axboe:
"To be able to fully utilize the 'poll first' support in the core
io_uring branch, it's advantageous knowing if the socket was empty
after a receive. This adds support for that"
* tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block:
io_uring: return hint on whether more data is available after receive
tcp: pass back data left in socket after receive
|
|
This reverts commit c7dacf5b0f32957b24ef29df1207dc2cd8307743,
"mailbox: avoid timer start from callback"
The previous commit was reverted since it lead to a race that
caused the hrtimer to not be started at all. The check for
hrtimer_active() in msg_submit() will return true if the
callback function txdone_hrtimer() is currently running. This
function could return HRTIMER_NORESTART and then the timer
will not be restarted, and also msg_submit() will not start
the timer. This will lead to a message actually being submitted
but no timer will start to check for its compleation.
The original fix that added checking hrtimer_active() was added to
avoid a warning with hrtimer_forward. Looking in the kernel
another solution to avoid this warning is to check hrtimer_is_queued()
before calling hrtimer_forward_now() instead. This however requires a
lock so the timer is not started by msg_submit() inbetween this check
and the hrtimer_forward() call.
Fixes: c7dacf5b0f32 ("mailbox: avoid timer start from callback")
Signed-off-by: Björn Ardö <bjorn.ardo@axis.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
git://git.kernel.dk/linux-block
Pull io_uring socket() support from Jens Axboe:
"This adds support for socket(2) for io_uring. This is handy when using
direct / registered file descriptors with io_uring.
Outside of those two patches, a small series from Dylan on top that
improves the tracing by providing a text representation of the opcode
rather than needing to decode this by reading the header file every
time.
That sits in this branch as it was the last opcode added (until it
wasn't...)"
* tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block:
io_uring: use the text representation of ops in trace
io_uring: rename op -> opcode
io_uring: add io_uring_get_opcode
io_uring: add type to op enum
io_uring: add socket(2) support
net: add __sys_socket_file()
|
|
Pull io_uring updates from Jens Axboe:
"Here are the main io_uring changes for 5.19. This contains:
- Fixes for sparse type warnings (Christoph, Vasily)
- Support for multi-shot accept (Hao)
- Support for io_uring managed fixed files, rather than always
needing the applicationt o manage the indices (me)
- Fix for a spurious poll wakeup (Dylan)
- CQE overflow fixes (Dylan)
- Support more types of cancelations (me)
- Support for co-operative task_work signaling, rather than always
forcing an IPI (me)
- Support for doing poll first when appropriate, rather than always
attempting a transfer first (me)
- Provided buffer cleanups and support for mapped buffers (me)
- Improve how io_uring handles inflight SCM files (Pavel)
- Speedups for registered files (Pavel, me)
- Organize the completion data in a struct in io_kiocb rather than
keep it in separate spots (Pavel)
- task_work improvements (Pavel)
- Cleanup and optimize the submission path, in general and for
handling links (Pavel)
- Speedups for registered resource handling (Pavel)
- Support sparse buffers and file maps (Pavel, me)
- Various fixes and cleanups (Almog, Pavel, me)"
* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
io_uring: fix incorrect __kernel_rwf_t cast
io_uring: disallow mixed provided buffer group registrations
io_uring: initialize io_buffer_list head when shared ring is unregistered
io_uring: add fully sparse buffer registration
io_uring: use rcu_dereference in io_close
io_uring: consistently use the EPOLL* defines
io_uring: make apoll_events a __poll_t
io_uring: drop a spurious inline on a forward declaration
io_uring: don't use ERR_PTR for user pointers
io_uring: use a rwf_t for io_rw.flags
io_uring: add support for ring mapped supplied buffers
io_uring: add io_pin_pages() helper
io_uring: add buffer selection support to IORING_OP_NOP
io_uring: fix locking state for empty buffer group
io_uring: implement multishot mode for accept
io_uring: let fast poll support multishot
io_uring: add REQ_F_APOLL_MULTISHOT for requests
io_uring: add IORING_ACCEPT_MULTISHOT for accept
io_uring: only wake when the correct events are set
io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU update from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offloading updates, mainly simplifications
- RCU-tasks updates, including some -rt fixups, handling of systems
with sparse CPU numbering, and a fix for a boot-time race-condition
failure
- Put SRCU on a memory diet in order to reduce the size of the
srcu_struct structure
- Torture-test updates fixing some bugs in tests and closing some
testing holes
- Torture-test updates for the RCU tasks flavors, most notably ensuring
that building rcutorture and friends does not change the
RCU-tasks-related Kconfig options
- Torture-test scripting updates
- Expedited grace-period updates, most notably providing
milliseconds-scale (not all that) soft real-time response from
synchronize_rcu_expedited().
This is also the first time in almost 30 years of RCU that someone
other than me has pushed for a reduction in the RCU CPU stall-warning
timeout, in this case by more than three orders of magnitude from 21
seconds to 20 milliseconds. This tighter timeout applies only to
expedited grace periods
* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu: Move expedited grace period (GP) work to RT kthread_worker
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
srcu: Drop needless initialization of sdp in srcu_gp_start()
srcu: Prevent expedited GPs and blocking readers from consuming CPU
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
srcu: Automatically determine size-transition strategy at boot
rcutorture: Make torture.sh allow for --kasan
rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
rcutorture: Make kvm.sh allow more memory for --kasan runs
torture: Save "make allmodconfig" .config file
scftorture: Remove extraneous "scf" from per_version_boot_params
rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
torture: Enable CSD-lock stall reports for scftorture
torture: Skip vmlinux check for kvm-again.sh runs
scftorture: Adjust for TASKS_RCU Kconfig option being selected
rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
rcuscale: Allow rcuscale without RCU Tasks
refscale: Allow refscale without RCU Tasks Rude/Trace
refscale: Allow refscale without RCU Tasks
rcutorture: Allow specifying per-scenario stat_interval
...
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Allow runtime services to be re-enabled at boot on RT kernels.
- Provide access to secrets injected into the boot image by CoCo
hypervisors (COnfidential COmputing)
- Use DXE services on x86 to make the boot image executable after
relocation, if needed.
- Prefer mirrored memory for randomized allocations.
- Only randomize the placement of the kernel image on arm64 if the
loader has not already done so.
- Add support for obtaining the boot hartid from EFI on RISC-V.
* tag 'efi-next-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
riscv/efi_stub: Add support for RISCV_EFI_BOOT_PROTOCOL
efi: stub: prefer mirrored memory for randomized allocations
efi/arm64: libstub: run image in place if randomized by the loader
efi: libstub: pass image handle to handle_kernel_image()
efi: x86: Set the NX-compatibility flag in the PE header
efi: libstub: ensure allocated memory to be executable
efi: libstub: declare DXE services table
efi: Add missing prototype for efi_capsule_setup_info
docs: security: Add secrets/coco documentation
efi: Register efi_secret platform device if EFI secret area is declared
virt: Add efi_secret module to expose confidential computing secrets
efi: Save location of EFI confidential computing area
efi: Allow to enable EFI runtime services by default on RT
|
|
Merge generlic power domains update for 5.19-rc1:
- Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).
- Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
Hansson).
- Improve the way genpd deals with its governors (Ulf Hansson).
* pm-domains:
PM: domains: Trust domain-idle-states from DT to be correct by genpd
PM: domains: Measure power-on/off latencies in genpd based on a governor
PM: domains: Allocate governor data dynamically based on a genpd governor
PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
PM: domains: Fix initialization of genpd's next_wakeup
PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
PM: domains: Measure suspend/resume latencies in genpd based on governor
PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
PM: domains: Allocate gpd_timing_data dynamically based on governor
PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
PM: domains: Drop redundant code for genpd always-on governor
PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
PM: domains: Move genpd's time-accounting to ktime_get_mono_fast_ns()
PM: domains: Extend dev_pm_domain_detach() doc
|
|
Merge cpufreq updates for 5.19-rc1:
- Fix cpufreq governor clean up code to avoid using kfree() directly
to free kobject-based items (Kevin Hao).
- Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe Leroy).
- Make intel_pstate notify frequency invariance code when no_turbo is
turned on and off (Chen Yu).
- Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
Pandruvada).
- Make cpufreq avoid unnecessary frequency updates due to mismatch
between hardware and the frequency table (Viresh Kumar).
- Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
code (Viresh Kumar).
- Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
calling convention for some driver callbacks consistent (Rafael
Wysocki).
- Avoid accessing half-initialized cpufreq policies from the show()
and store() sysfs functions (Schspa Shi).
- Rearrange cpufreq_offline() to make the calling convention for some
driver callbacks consistent (Schspa Shi).
- Update CPPC handling in cpufreq (Pierre Gondois):
* Add per_cpu efficiency_class to the CPPC driver.
* Make the CPPC driver Register EM based on efficiency class
information.
* Adjust _OSC for flexible address space in the ACPI platform
initialization code and always set CPPC _OSC bits if CPPC_LIB is
supported.
* Assume no transition latency if no PCCT in the CPPC driver.
* Add fast_switch and dvfs_possible_from_any_cpu support to the CPPC
driver.
* pm-cpufreq:
cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
cpufreq: CPPC: Enable fast_switch
ACPI: CPPC: Assume no transition latency if no PCCT
ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
ACPI: CPPC: Check _OSC for flexible address space
cpufreq: make interface functions and lock holding state clear
cpufreq: Abort show()/store() for half-initialized policies
cpufreq: Rearrange locking in cpufreq_remove_dev()
cpufreq: Split cpufreq_offline()
cpufreq: Reorganize checks in cpufreq_offline()
cpufreq: Clear real_cpus mask from remove_cpu_dev_symlink()
cpufreq: intel_pstate: Support Sapphire Rapids OOB mode
Revert "cpufreq: Fix possible race in cpufreq online error path"
cpufreq: CPPC: Register EM based on efficiency class information
cpufreq: CPPC: Add per_cpu efficiency_class
cpufreq: Avoid unnecessary frequency updates due to mismatch
cpufreq: Fix possible race in cpufreq online error path
cpufreq: intel_pstate: Handle no_turbo in frequency invariance
cpufreq: Prepare cleanup of powerpc's asm/prom.h
cpufreq: governor: Use kobject release() method to free dbs_data
|
|
Marge Energy Model support updates and cpuidle updates for 5.19-rc1:
- Update the Energy Model support code to allow the Energy Model to be
artificial, which means that the power values may not be on a uniform
scale with other devices providing power information, and update the
cpufreq_cooling and devfreq_cooling thermal drivers to support
artificial Energy Models (Lukasz Luba).
- Make DTPM check the Energy Model type (Lukasz Luba).
- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).
- Add AlderLake processor support to the intel_idle driver (Zhang Rui).
- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).
* pm-em:
PM: EM: Decrement policy counter
powercap: DTPM: Check for Energy Model type
thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling
Documentation: EM: Add artificial EM registration description
PM: EM: Remove old debugfs files and print all 'flags'
PM: EM: Change the order of arguments in the .active_power() callback
PM: EM: Use the new .get_cost() callback while registering EM
PM: EM: Add artificial EM flag
PM: EM: Add .get_cost() callback
* pm-cpuidle:
cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used
cpuidle: psci: Fix regression leading to no genpd governor
intel_idle: Add AlderLake support
|
|
Merge PM core changes, updates related to system sleep and power capping
updates for 5.19-rc1:
- Export dev_pm_ops instead of suspend() and resume() in the IIO
chemical scd30 driver (Jonathan Cameron).
- Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
PM-runtime counterparts (Jonathan Cameron).
- Move symbol exports in the IIO chemical scd30 driver into the
IIO_SCD30 namespace (Jonathan Cameron).
- Avoid device PM-runtime usage count underflows (Rafael Wysocki).
- Allow dynamic debug to control printing of PM messages (David
Cohen).
- Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
Bai).
- Preserve ACPI-table override during hibernation (Amadeusz Sławiński).
- Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
- Make Intel RAPL power capping driver support the RaptorLake and
AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
- Remove redundant store to value after multiply in the RAPL power
capping driver (Colin Ian King).
* pm-core:
PM: runtime: Avoid device usage count underflows
iio: chemical: scd30: Move symbol exports into IIO_SCD30 namespace
PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv
iio: chemical: scd30: Export dev_pm_ops instead of suspend() and resume()
* pm-sleep:
cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode
PM: runtime: Allow to call __pm_runtime_set_status() from atomic context
PM: hibernate: Don't mark comment as kernel-doc
x86/ACPI: Preserve ACPI-table override during hibernation
PM: hibernate: Fix some kernel-doc comments
PM: sleep: enable dynamic debug support within pm_pr_dbg()
PM: sleep: Narrow down -DDEBUG on kernel/power/ files
* powercap:
powercap: intel_rapl: remove redundant store to value after multiply
powercap: intel_rapl: add support for ALDERLAKE_N
powercap: RAPL: Add Power Limit4 support for RaptorLake
powercap: intel_rapl: add support for RaptorLake
|
|
There have been reports of races that cause NFSv4 OPEN(CREATE) to
return an error even though the requested file was created. NFSv4
does not provide a status code for this case.
To mitigate some of these problems, reorganize the NFSv4
OPEN(CREATE) logic to allocate resources before the file is actually
created, and open the new file while the parent directory is still
locked.
Two new APIs are added:
+ Add an API that works like nfsd_file_acquire() but does not open
the underlying file. The OPEN(CREATE) path can use this API when it
already has an open file.
+ Add an API that is kin to dentry_open(). NFSD needs to create a
file and grab an open "struct file *" atomically. The
alloc_empty_file() has to be done before the inode create. If it
fails (for example, because the NFS server has exceeded its
max_files limit), we avoid creating the file and can still return
an error to the NFS client.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: JianHong Yin <jiyin@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v5.19
This is quite a big update, partly due to the addition of some larger
drivers (more of which is to follow since at least the AVS driver is
still a work in progress) and partly due to Charles' work sorting out
our handling of endianness. As has been the case recently it's much
more about drivers than the core.
- Overhaul of endianness specification for data formats, avoiding
needless restrictions due to CODECs.
- Initial stages of Intel AVS driver merge.
- Introduction of v4 IPC mechanism for SOF.
- TDM mode support for AK4613.
- Support for Analog Devices ADAU1361, Cirrus Logic CS35L45, Maxim
MAX98396, MediaTek MT8186, NXP i.MX8 micfil and SAI interfaces,
nVidia Tegra186 ASRC, and Texas Instruments TAS2764 and TAS2780
|
|
In order to be able to identify a file exchange with renameat2(2) and
RENAME_EXCHANGE, which will be useful for Landlock [1], propagate the
rename flags to LSMs. This may also improve performance because of the
switch from two set of LSM hook calls to only one, and because LSMs
using this hook may optimize the double check (e.g. only one lock,
reduce the number of path walks).
AppArmor, Landlock and Tomoyo are updated to leverage this change. This
should not change the current behavior (same check order), except
(different level of) speed boosts.
[1] https://lore.kernel.org/r/20220221212522.320243-1-mic@digikod.net
Cc: James Morris <jmorris@namei.org>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Serge E. Hallyn <serge@hallyn.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220506161102.525323-7-mic@digikod.net
|
|
slab/for-linus
|
|
sync_blockdev_range() is to support syncing multiple sectors
with as few block device requests as possible, it is helpful
to make the block device to give full play to its performance.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Most protocol-specific pointers in struct net_device are under
a respective ifdef. Wireless is the notable exception. Since
there's a sizable number of custom-built kernels for datacenter
workloads which don't build wireless it seems reasonable to
ifdefy those pointers as well.
While at it move IPv4 and IPv6 pointers up, those are special
for obvious reasons.
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a locking issue with the per-netns list of calls in rxrpc. The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it. However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with. Unfortunately, if it puts
the call and this reduces the refcount to 0, the call will be removed from
the list. Unfortunately, since the _bh variants of the locking functions
aren't used, this can deadlock.
================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
Possible unsafe locking scenario:
CPU0
----
lock(&rxnet->call_lock);
<Interrupt>
lock(&rxnet->call_lock);
*** DEADLOCK ***
1 lock held by ksoftirqd/2/25:
#0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d
Changes
=======
ver #2)
- Changed to using list_next_rcu() rather than rcu_dereference() directly.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spelling mistakes (triple letters) in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The thermal subsystem registers a hwmon driver without providing
chip or sysfs group information. This is for legacy reasons and
would be difficult to change. At the same time, we want to enforce
that chip information is provided when registering a hwmon device
using hwmon_device_register_with_info(). To enable this, introduce
a special API for use only by the thermal subsystem.
Acked-by: Rafael J . Wysocki <rafael@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Some temperature and voltage sensors use a polynomial to convert between
raw data points and actual temperature or voltage. The polynomial is
usually the result of a curve fitting of the diode characteristic.
The BT1 PVT hwmon driver already uses such a polynonmial calculation
which is rather generic. Move it to lib/ so other drivers can reuse it.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220401214032.3738095-2-michael@walle.cc
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The of_irq_parse_oldworld() does not modify passed device_node so make
it a pointer to const for safety. Drop the extern while modifying the
line.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210924105653.46963-2-krzysztof.kozlowski@canonical.com
|
|
Add the sysfs reporting file for Processor MMIO Stale Data
vulnerability. It exposes the vulnerability and mitigation state similar
to the existing files for the other hardware vulnerabilities.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Currently the trampoline_count test doesn't include any fmod_ret bpf
programs, fix it to make the test cover all possible trampoline program
types.
Since fmod_ret bpf programs can't be attached to __set_task_comm function,
as it's neither whitelisted for error injection nor a security hook, change
it to bpf_modify_return_test.
This patch also does some other cleanups such as removing duplicate code,
dropping inconsistent comments, etc.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220519150610.601313-1-ytcoode@gmail.com
|
|
This patch implements a new struct bpf_func_proto, named
bpf_skc_to_mptcp_sock_proto. Define a new bpf_id BTF_SOCK_TYPE_MPTCP,
and a new helper bpf_skc_to_mptcp_sock(), which invokes another new
helper bpf_mptcp_sock_from_subflow() in net/mptcp/bpf.c to get struct
mptcp_sock from a given subflow socket.
v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c,
remove build check for CONFIG_BPF_JIT
v5: Drop EXPORT_SYMBOL (Martin)
Co-developed-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220519233016.105670-2-mathew.j.martineau@linux.intel.com
|
|
Pull ceph fix from Ilya Dryomov:
"A fix for a nasty use-after-free, marked for stable"
* tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-client:
libceph: fix misleading ceph_osdc_cancel_request() comment
libceph: fix potential use-after-free on linger ping and resends
|
|
'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
perf/arm-cmn: Decode CAL devices properly in debugfs
perf/arm-cmn: Fix filter_sel lookup
perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first
drivers/perf: hisi: Add Support for CPA PMU
drivers/perf: hisi: Associate PMUs in SICL with CPUs online
drivers/perf: arm_spe: Expose saturating counter to 16-bit
perf/arm-cmn: Add CMN-700 support
perf/arm-cmn: Refactor occupancy filter selector
perf/arm-cmn: Add CMN-650 support
dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
perf: check return value of armpmu_request_irq()
perf: RISC-V: Remove non-kernel-doc ** comments
* for-next/sme: (30 commits)
: Scalable Matrix Extensions support.
arm64/sve: Move sve_free() into SVE code section
arm64/sve: Make kernel FPU protection RT friendly
arm64/sve: Delay freeing memory in fpsimd_flush_thread()
arm64/sme: More sensibly define the size for the ZA register set
arm64/sme: Fix NULL check after kzalloc
arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding()
arm64/sme: Provide Kconfig for SME
KVM: arm64: Handle SME host state when running guests
KVM: arm64: Trap SME usage in guest
KVM: arm64: Hide SME system registers from guests
arm64/sme: Save and restore streaming mode over EFI runtime calls
arm64/sme: Disable streaming mode and ZA when flushing CPU state
arm64/sme: Add ptrace support for ZA
arm64/sme: Implement ptrace support for streaming mode SVE registers
arm64/sme: Implement ZA signal handling
arm64/sme: Implement streaming SVE signal handling
arm64/sme: Disable ZA and streaming mode when handling signals
arm64/sme: Implement traps and syscall handling for SME
arm64/sme: Implement ZA context switching
arm64/sme: Implement streaming SVE context switching
...
* for-next/stacktrace:
: Stacktrace cleanups.
arm64: stacktrace: align with common naming
arm64: stacktrace: rename stackframe to unwind_state
arm64: stacktrace: rename unwinder functions
arm64: stacktrace: make struct stackframe private to stacktrace.c
arm64: stacktrace: delete PCS comment
arm64: stacktrace: remove NULL task check from unwind_frame()
* for-next/fault-in-subpage:
: btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable().
btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
arm64: Add support for user sub-page fault probing
mm: Add fault_in_subpage_writeable() to probe at sub-page granularity
* for-next/misc:
: Miscellaneous patches.
arm64: Kconfig.platforms: Add comments
arm64: Kconfig: Fix indentation and add comments
arm64: mm: avoid writable executable mappings in kexec/hibernate code
arm64: lds: move special code sections out of kernel exec segment
arm64/hugetlb: Implement arm64 specific huge_ptep_get()
arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
arm64: mm: Make arch_faults_on_old_pte() check for migratability
arm64: mte: Clean up user tag accessors
arm64/hugetlb: Drop TLB flush from get_clear_flush()
arm64: Declare non global symbols as static
arm64: mm: Cleanup useless parameters in zone_sizes_init()
arm64: fix types in copy_highpage()
arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE
arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK
arm64: document the boot requirements for MTE
arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE
* for-next/ftrace:
: ftrace cleanups.
arm64/ftrace: Make function graph use ftrace directly
ftrace: cleanup ftrace_graph_caller enable and disable
* for-next/crashkernel:
: Support for crashkernel reservations above ZONE_DMA.
arm64: kdump: Do not allocate crash low memory if not needed
docs: kdump: Update the crashkernel description for arm64
of: Support more than one crash kernel regions for kexec -s
of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
arm64: kdump: Reimplement crashkernel=X
arm64: Use insert_resource() to simplify code
kdump: return -ENOENT if required cmdline option does not exist
|
|
NAND core:
* Print offset instead of page number for bad blocks
Raw NAND controller drivers:
* Cadence: Fix possible null-ptr-deref in cadence_nand_dt_probe()
* CS553X: simplify the return expression of cs553x_write_ctrl_byte()
* Davinci: Remove redundant unsigned comparison to zero
* Denali: Use managed device resources
* GPMI:
- Add large oob bch setting support
- Rename the variable ecc_chunk_size
- Uninline the gpmi_check_ecc function
- Add strict ecc strength check
- Refactor BCH geometry settings function
* Intel: Fix possible null-ptr-deref in ebu_nand_probe()
* MPC5121: Check before clk_disable_unprepare() not needed
* Mtk:
- MTD_NAND_ECC_MEDIATEK should depend on ARCH_MEDIATEK
- Also parse the default nand-ecc-engine property if available
- Make mtk_ecc.c a separated module
* OMAP ELM:
- Convert the bindings to yaml
- Describe the bindings for AM64 ELM
- Add support for its compatible
* Renesas: Use runtime PM instead of the raw clock API and update the
bindings accordingly
* Rockchip: Check before clk_disable_unprepare() not needed
* TMIO: Check return value after calling platform_get_resource()
Raw NAND chip driver:
* Kioxia: Add support for TH58NVG3S0HBAI4 and TC58NVG0S3HTA00
SPI-NAND chip drivers:
* Gigadevice:
- Add support for:
- GD5FxGM7xExxG
- GD5F{2,4}GQ5xExxG
- GD5F1GQ5RExxG
- GD5FxGQ4xExxG
- Fix Quad IO for GD5F1GQ5UExxG
* XTX: Add support for XT26G0xA
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
SPI NOR core changes:
- Read back written SR value to make sure the write was done correctly.
- Introduce a common function for Read ID that manufacturer drivers can
use to verify the Octal DTR switch worked correctly.
- Add helpers for read/write any register commands so manufacturer
drivers don't open code it every time.
- Clarify rdsr dummy cycles documentation.
- Add debugfs entry to expose internal flash parameters and state.
SPI NOR manufacturer drivers changes:
- Add support for Winbond W25Q512NW-IM, and Eon EN25QH256A.
- Move spi_nor_write_ear() to Winbond module since only Winbond flashes
use it.
- Rework Micron and Cypress Octal DTR enable methods to improve
readability.
- Use the common Read ID function to verify switch to Octal DTR mode for
Micron and Cypress flashes.
- Skip polling status on volatile register writes for Micron and Cypress
flashes since the operation is instant.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Add nvme_fc_io_getuuid() to the nvme-fc transport. The routine is invoked
by the FC LLDD on a per-I/O request basis. The routine translates from the
FC-specific request structure to the bio and the cgroup structure in order
to obtain the FC appid stored in the cgroup structure. If a value is not
set or a bio is not found, a NULL appid (aka uuid) will be returned to the
LLDD.
Link: https://lore.kernel.org/r/20220519123110.17361-2-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
PTP one step sync packets cannot have CSUM padding and insertion in
SW since time stamp is inserted on the fly by HW.
In addition, ptp4l version 3.0 and above report an error when skb
timestamps are reported for packets that not processed for TX TS
after transmission.
Add a helper to identify PTP one step sync and fix the above two
errors. Add a common mask for PTP header flag field "twoStepflag".
Also reset ptp OSS bit when one step is not selected.
Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support")
Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220518170756.7752-1-harini.katakam@xilinx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch set implements kexec_file_load() for RISC-V, which is
currently only allowed on rv64 due to some minor build issues on 32-bit
platforms in the generic code. This allows users to kexec() using an FD
as opposed to a buffer.
Link: https://lore.kernel.org/all/20220408100914.150110-1-lizhengyu3@huawei.com/
* palmer/riscv-kexec_file:
RISC-V: Load purgatory in kexec_file
RISC-V: Add purgatory
RISC-V: Support for kexec_file on panic
RISC-V: Add kexec_file support
RISC-V: use memcpy for kexec_file mode
kexec_file: Fix kexec_file.c build error for riscv platform
|
|
default_topology[] uses cpu_clustergroup_mask() for the CLS level
(guarded by CONFIG_SCHED_CLUSTER) which is currently provided by x86
(arch/x86/kernel/smpboot.c) and arm64 (drivers/base/arch_topology.c).
Fixes: 778c558f49a2c ("sched: Add cluster scheduler level in core and
related Kconfig for ARM64")
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Barry Song <baohua@kernel.org>
Link: https://lore.kernel.org/r/20220513093433.425163-1-dietmar.eggemann@arm.com
|
|
With gcc version 12.0.1 20220401 (Red Hat 12.0.1-0), building with
defconfig results in the following compilation error:
| CC mm/swapfile.o
| mm/swapfile.c: In function `setup_swap_info':
| mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds
| of `struct plist_node[]' [-Werror=array-bounds]
| 2291 | p->avail_lists[i].prio = 1;
| | ~~~~~~~~~~~~~~^~~
| In file included from mm/swapfile.c:16:
| ./include/linux/swap.h:292:27: note: while referencing `avail_lists'
| 292 | struct plist_node avail_lists[]; /*
| | ^~~~~~~~~~~
This is due to the compiler detecting that the mask in
node_states[__state] could theoretically be zero, which would lead to
first_node() returning -1 through find_first_bit.
I believe that the warning/error is legitimate. I first tried adding a
test to check that the node mask is not emtpy, since a similar test exists
in the case where MAX_NUMNODES == 1.
However, adding the if statement causes other warnings to appear in
for_each_cpu_node_but, because it introduces a dangling else ambiguity.
And unfortunately, GCC is not smart enough to detect that the added test
makes the case where (node) == -1 impossible, so it still complains with
the same message.
This is why I settled on replacing that with a harmless, but relatively
useless (node) >= 0 test. Based on the warning for the dangling else, I
also decided to fix the case where MAX_NUMNODES == 1 by moving the
condition inside the for loop. It will still only be tested once. This
ensures that the meaning of an else following for_each_node_mask or
derivatives would not silently have a different meaning depending on the
configuration.
Link: https://lkml.kernel.org/r/20220414150855.2407137-3-dinechin@redhat.com
Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ben Segall <bsegall@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We expect no warnings to be issued when we specify __GFP_NOWARN, but
currently in paths like alloc_pages() and kmalloc(), there are still some
warnings printed, fix it.
But for some warnings that report usage problems, we don't deal with them.
If such warnings are printed, then we should fix the usage problems.
Such as the following case:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
[zhengqi.arch@bytedance.com: v2]
Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended
under memory pressure if processes keep working on their vmas(e.g., fork,
mmap, munmap). It makes reclaim path stuck. In our real workload traces,
we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it
makes other processes entering direct reclaim, which were also stuck on
the lock.
This patch makes lru aging path try_lock mode like shink_page_list so the
reclaim context will keep working with next lru pages without being stuck.
if it found the rmap lock contended, it rotates the page back to head of
lru in both active/inactive lrus to make them consistent behavior, which
is basic starting point rather than adding more heristic.
Since this patch introduces a new "contended" field as out-param along
with try_lock in-param in rmap_walk_control, it's not immutable any longer
if the try_lock is set so remove const keywords on rmap related functions.
Since rmap walking is already expensive operation, I doubt the const
would help sizable benefit( And we didn't have it until 5.17).
In a heavy app workload in Android, trace shows following statistics. It
almost removes rmap lock contention from reclaim path.
Martin Liu reported:
Before:
max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function
1632 0 1631 151.542173 31672 209 page_lock_anon_vma_read
601 0 601 145.544681 28817 198 rmap_walk_file
After:
max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function
NaN NaN NaN NaN NaN 0.0 NaN
0 0 0 0.127645 1 12 rmap_walk_file
[minchan@kernel.org: add comment, per Matthew]
Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: John Dias <joaodias@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Applications can currently escape their cgroup memory containment when
zswap is enabled. This patch adds per-cgroup tracking and limiting of
zswap backend memory to rectify this.
The existing cgroup2 memory.stat file is extended to show zswap statistics
analogous to what's in meminfo and vmstat. Furthermore, two new control
files, memory.zswap.current and memory.zswap.max, are added to allow
tuning zswap usage on a per-workload basis. This is important since not
all workloads benefit from zswap equally; some even suffer compared to
disk swap when memory contents don't compress well. The optimal size of
the zswap pool, and the threshold for writeback, also depends on the size
of the workload's warm set.
The implementation doesn't use a traditional page_counter transaction.
zswap is unconventional as a memory consumer in that we only know the
amount of memory to charge once expensive compression has occurred. If
zwap is disabled or the limit is already exceeded we obviously don't want
to compress page upon page only to reject them all. Instead, the limit is
checked against current usage, then we compress and charge. This allows
some limit overrun, but not enough to matter in practice.
[hannes@cmpxchg.org: fix for CONFIG_SLOB builds]
Link: https://lkml.kernel.org/r/YnwD14zxYjUJPc2w@cmpxchg.org
[hannes@cmpxchg.org: opt out of cgroups v1]
Link: https://lkml.kernel.org/r/Yn6it9mBYFA+/lTb@cmpxchg.org
Link: https://lkml.kernel.org/r/20220510152847.230957-7-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|