Age | Commit message (Collapse) | Author |
|
With all users converted to ->launder_folio, remove ->launder_page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Since the only difference between ->launder_page and ->launder_folio
is the type of the pointer, these can safely use a union without
affecting bisectability.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
With all users migrated to ->invalidate_folio, remove the old operation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Print the folio index instead of the pointer, since this is more
useful. We also don't need to use page_file_mapping() as we do not
invalidate swapcache pages. Since this is the only caller of
nfs_wb_page_cancel(), convert it to nfs_wb_folio_cancel().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Extensive changes, but fairly mechanical.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
We used to have to use noop_invalidatepage() to prevent
block_invalidatepage() from being called, but that behaviour is now gone.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Remove special-casing of a NULL invalidatepage, since there is no
more block_invalidatepage.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Use iomap_invalidate_folio() in all the iomap-based filesystems
and rename the iomap_invalidatepage tracepoint.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
This is used in preference to invalidatepage, if defined.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Take a folio instead of a page, fix the types of the offset & length,
and export it to filesystems.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Conflicts:
arch/x86/include/asm/cpufeatures.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
tools/objtool/arch/x86/decode.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Because we need a variant for .S files too.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yi9gOW9f1GGwwUD6@hirez.programming.kicks-ass.net
|
|
vmlinux.o: warning: objtool: ksys_unshare()+0x36c: unreachable instruction
0000 0000000000067040 <ksys_unshare>:
...
0364 673a4: 4c 89 ef mov %r13,%rdi
0367 673a7: e8 00 00 00 00 call 673ac <ksys_unshare+0x36c> 673a8: R_X86_64_PLT32 __invalid_creds-0x4
036c 673ac: e9 28 ff ff ff jmp 672d9 <ksys_unshare+0x299>
0371 673b1: 41 bc f4 ff ff ff mov $0xfffffff4,%r12d
0377 673b7: e9 80 fd ff ff jmp 6713c <ksys_unshare+0xfc>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yi9gOW9f1GGwwUD6@hirez.programming.kicks-ass.net
|
|
vmlinux.o: warning: objtool: get_signal()+0x108: unreachable instruction
0000 000000000007f930 <get_signal>:
...
0103 7fa33: e8 00 00 00 00 call 7fa38 <get_signal+0x108> 7fa34: R_X86_64_PLT32 do_group_exit-0x4
0108 7fa38: 41 8b 45 74 mov 0x74(%r13),%eax
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.351270711@infradead.org
|
|
Since the references to the module init/exit points only have external
references, a module LTO run will consider them 'unused' and seal
them, leading to an immediate fail on module load.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.113767246@infradead.org
|
|
In order to allow kprobes to skip the ENDBR instructions at sym+0 for
X86_KERNEL_IBT builds, change _kprobe_addr() to take an architecture
callback to inspect the function at hand and modify the offset if
needed.
This streamlines the existing interface to cover more cases and
require less hooks. Once PowerPC gets fully converted there will only
be the one arch hook.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.405947704@infradead.org
|
|
In order to have objtool warn about code references to !ENDBR
instruction, we need an annotation to allow this for non-control-flow
instances -- consider text range checks, text patching, or return
trampolines etc.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.578968224@infradead.org
|
|
Enjoy the cleanups and avoid conflicts vs linkage
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Merge the scheduler build speedup of the fast-headers tree.
Cumulative scheduler (kernel/sched/) build time speedup on a
Linux distribution's config, which enables all scheduler features,
compared to the vanilla kernel:
_____________________________________________________________________________
|
| Vanilla kernel (v5.13-rc7):
|_____________________________________________________________________________
|
| Performance counter stats for 'make -j96 kernel/sched/' (3 runs):
|
| 126,975,564,374 instructions # 1.45 insn per cycle ( +- 0.00% )
| 87,637,847,671 cycles # 3.959 GHz ( +- 0.30% )
| 22,136.96 msec cpu-clock # 7.499 CPUs utilized ( +- 0.29% )
|
| 2.9520 +- 0.0169 seconds time elapsed ( +- 0.57% )
|_____________________________________________________________________________
|
| Patched kernel:
|_____________________________________________________________________________
|
| Performance counter stats for 'make -j96 kernel/sched/' (3 runs):
|
| 50,420,496,914 instructions # 1.47 insn per cycle ( +- 0.00% )
| 34,234,322,038 cycles # 3.946 GHz ( +- 0.31% )
| 8,675.81 msec cpu-clock # 3.053 CPUs utilized ( +- 0.45% )
|
| 2.8420 +- 0.0181 seconds time elapsed ( +- 0.64% )
|_____________________________________________________________________________
Summary:
- CPU time used to build the scheduler dropped by -60.9%, a reduction
from 22.1 clock-seconds to 8.7 clock-seconds.
- Wall-clock time to build the scheduler dropped by -3.9%, a reduction
from 2.95 seconds to 2.84 seconds.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Let's introduce a common library file for the physmap show function
duplicated between three different keyboard drivers. This largely copies
the code from cros_ec_keyb.c which has the most recent version of the
show function, while using the vivaldi_data struct from the hid-vivaldi
driver. This saves a small amount of space in an allyesconfig build.
$ ./scripts/bloat-o-meter vmlinux.before vmlinux.after
add/remove: 3/0 grow/shrink: 2/3 up/down: 412/-720 (-308)
Function old new delta
vivaldi_function_row_physmap_show - 292 +292
_sub_I_65535_1 1057564 1057616 +52
_sub_D_65535_0 1057564 1057616 +52
e843419@49f2_00062737_9b04 - 8 +8
e843419@20f6_0002a34d_35bc - 8 +8
atkbd_parse_fwnode_data 480 472 -8
atkbd_do_show_function_row_physmap 316 76 -240
function_row_physmap_show 620 148 -472
Total: Before=285581925, After=285581617, chg -0.00%
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org> # coachz, wormdingler
Link: https://lore.kernel.org/r/20220228075446.466016-3-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Sync up with mainline to again get the latest changes in HID subsystem.
|
|
a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't
tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set.
While this fixed a potential oops, it also broke blk-iocost by skipping the
done_bio callback for merged bios.
Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(),
rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED
distinguishing the former from the latter. rq_qos_done_bio() is not called
for bios which wenth through rq_qos_merge(). This royally confuses
blk-iocost as the merged bios never finish and are considered perpetually
in-flight.
One reliably reproducible failure mode is an intermediate cgroup geting
stuck active preventing its children from being activated due to the
leaf-only rule, leading to loss of control. The following is from
resctl-bench protection scenario which emulates isolating a web server like
workload from a memory bomb run on an iocost configuration which should
yield a reasonable level of protection.
# cat /sys/block/nvme2n1/device/model
Samsung SSD 970 PRO 512GB
# cat /sys/fs/cgroup/io.cost.model
259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025
# cat /sys/fs/cgroup/io.cost.qos
259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00
# resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
...
Memory Hog Summary
==================
IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m
W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m
Isolation and Request Latency Impact Distributions:
min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev
isol% 15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82
lat-imp% 0 0 0 0 0 4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6
Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96%
The isolation result of 58.12% is close to what this device would show
without any IO control.
Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and
calling rq_qos_done_bio() on them too. For consistency and clarity, rename
BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into
rq_qos_done_bio() so that it's next to the code paths that set the flags.
With the patch applied, the above same benchmark shows:
# resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
...
Memory Hog Summary
==================
IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m
W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m
Isolation and Request Latency Impact Distributions:
min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev
isol% 84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42 2.81
lat-imp% 0 0 0 0 0 2.81 5.73 11.11 13.92 17.53 22.61 4.10 4.68
Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0%
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked")
Cc: stable@vger.kernel.org # v5.15+
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/Yi7rdrzQEHjJLGKB@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since the uptodate property is maintained on a per-folio basis, the
is_partially_uptodate method should also take a folio. Fix the types
at the same time so it's clear that it returns true/false and takes
the count in bytes, not blocks.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
While there is no intent to use large folios in filesystems using buffer
heads, converting the filesystems to use single-page folios is still worth
doing to remove legacy infrastructure and hidden calls to compound_head().
These helper functions are needed for that conversion to take place.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
While read_cache_page() takes a void *, because you can pass a
pointer to anything as the first argument of filler_t, if we
are calling read_mapping_page(), it will be passed as the first
argument of ->readpage, so we know this must be a struct file
pointer, and we should let the compiler enforce that for us.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
|
|
Merge in the latest Spectre mess to fix up conflicts with what was
already queued for 5.18 when the embargo finally lifted.
* for-next/spectre-bhb: (21 commits)
arm64: Do not include __READ_ONCE() block in assembly files
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
arm64: Use the clearbhb instruction in mitigations
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
arm64: Mitigate spectre style branch history side channels
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
arm64: Add percpu vectors for EL1
arm64: entry: Add macro for reading symbol addresses from the trampoline
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Make the trampoline cleanup optional
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
...
|
|
* for-next/perf: (25 commits)
perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
drivers/perf: arm_pmu: Handle 47 bit counters
arm64: perf: Consistently make all event numbers as 16-bits
arm64: perf: Expose some Armv9 common events under sysfs
perf/marvell: cn10k DDR perf event core ownership
perf/marvell: cn10k DDR perfmon event overflow handling
perf/marvell: CN10k DDR performance monitor support
dt-bindings: perf: marvell: cn10k ddr performance monitor
perf/arm-cmn: Update watchpoint format
perf/arm-cmn: Hide XP PUB events for CMN-600
perf: replace bitmap_weight with bitmap_empty where appropriate
perf: Replace acpi_bus_get_device()
perf/marvell_cn10k: Fix unused variable warning when W=1 and CONFIG_OF=n
perf/arm-cmn: Make arm_cmn_debugfs static
perf: MARVELL_CN10K_TAD_PMU should depend on ARCH_THUNDER
perf/arm-ccn: Use platform_get_irq() to get the interrupt
irqchip/apple-aic: Move PMU-specific registers to their own include file
arm64: dts: apple: Add t8303 PMU nodes
arm64: dts: apple: Add t8103 PMU interrupt affinities
...
|
|
* for-next/mte:
docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
arm64/mte: Remove asymmetric mode from the prctl() interface
kasan: fix a missing header include of static_keys.h
arm64/mte: Add userspace interface for enabling asymmetric mode
arm64/mte: Add hwcap for asymmetric mode
arm64/mte: Add a little bit of documentation for mte_update_sctlr_user()
arm64/mte: Document ABI for asymmetric mode
arm64: mte: avoid clearing PSTATE.TCO on entry unless necessary
kasan: split kasan_*enabled() functions into a separate header
|
|
* for-next/linkage:
arm64: module: remove (NOLOAD) from linker script
linkage: remove SYM_FUNC_{START,END}_ALIAS()
x86: clean up symbol aliasing
arm64: clean up symbol aliasing
linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
|
|
Conflicts:
drivers/pinctrl/pinctrl-starfive.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
syzbot was kind enough to remind us that dev->{tx_dropped|rx_dropped}
could be increased in process context.
BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor413/3593
caller is netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298
CPU: 1 PID: 3593 Comm: syz-executor413 Not tainted 5.17.0-rc7-syzkaller-02426-g97aeb877de7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_preemption_disabled+0x16b/0x170 lib/smp_processor_id.c:49
netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298
dev_core_stats include/linux/netdevice.h:3855 [inline]
dev_core_stats_rx_dropped_inc include/linux/netdevice.h:3866 [inline]
tun_get_user+0x3455/0x3ab0 drivers/net/tun.c:1800
tun_chr_write_iter+0xe1/0x200 drivers/net/tun.c:2015
call_write_iter include/linux/fs.h:2074 [inline]
new_sync_write+0x431/0x660 fs/read_write.c:503
vfs_write+0x7cd/0xae0 fs/read_write.c:590
ksys_write+0x12d/0x250 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2cf4f887e3
Code: 5d 41 5c 41 5d 41 5e e9 9b fd ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
RSP: 002b:00007ffd50dd5fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ffd50dd6000 RCX: 00007f2cf4f887e3
RDX: 000000000000002a RSI: 0000000000000000 RDI: 00000000000000c8
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd50dd5ff0 R14: 00007ffd50dd5fe8 R15: 00007ffd50dd5fe4
</TASK>
Fixes: 625788b58445 ("net: add per-cpu storage and net->core_stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Cc: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220312214505.3294762-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
partid and partname debugfs files were used just by SPI NOR, but they were
replaced by sysfs entries. Since these debugfs files are no longer used in
mtd, remove dead code. The directory is kept as it is used by nandsim,
mtdswap and docg3.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Pratyush Yadav <p.yadav@ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220225144656.634682-1-tudor.ambarus@microchip.com
|
|
Encoded I/O in Btrfs needs to check a write with a given logical size
without an iov_iter that matches that size (because the iov_iter we have
is for the compressed data). So, factor out the parts of
generic_write_check() that don't need an iov_iter into a new
generic_write_checks_count() function and export that.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
I'm adding btrfs ioctls to read and write compressed data, and rather
than duplicating the checks in rw_verify_area(), let's just export it.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
____napi_schedule() needs to be invoked with disabled interrupts due to
__raise_softirq_irqoff (in order not to corrupt the per-CPU list).
____napi_schedule() needs also to be invoked from an interrupt context
so that the raised-softirq is processed while the interrupt context is
left.
Add lockdep asserts for both conditions.
While this is the second time the irq/softirq check is needed, provide a
generic lockdep_assert_softirq_will_run() which is used by both caller.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add net device refcount tracker to macvlan.
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- Add support for the STM32MP13 variant
- Move parent device away from struct irq_chip
- Remove all instances of non-const strings assigned to
struct irq_chip::name, enabling a nice cleanup for VIC and GIC)
- Simplify the Qualcomm PDC driver
- A bunch of SiFive PLIC cleanups
- Add support for a new variant of the Meson GPIO block
- Add support for the irqchip side of the Apple M1 PMU
- Add support for the Apple M1 Pro/Max AICv2 irqchip
- Add support for the Qualcomm MPM wakeup gadget
- Move the Xilinx driver over to the generic irqdomain handling
- Tiny speedup for IPIs on GICv3 systems
- The usual odd cleanups
Link: https://lore.kernel.org/all/20220313105142.704579-1-maz@kernel.org
|
|
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
possible deadlocks with "fs_reclaim". These deadlocks could, I believe,
eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and
buffered writes are forbidden anyway. There is no other need for
i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and
performs checks that are not needed for swap. So bypass it
for swap_rw().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If we are swapping over NFSv4, we may not be able to allocate memory to
start the state-manager thread at the time when we need it.
So keep it always running when swap is enabled, and just signal it to
start.
This requires updating and testing the cl_swapper count on the root
rpc_clnt after following all ->cl_parent links.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
NFS_RPC_SWAPFLAGS is only used for READ requests.
It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
requests. This is not needed for swap READ - though it is for writes
where it is set via a different mechanism.
RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
This is not needed as the root credential is saved when the swap file is
opened, and this is used for all IO.
So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
RPC_TASK_ROOTCREDS, that isn't needed either.
Remove both.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything. So it
must not block waiting for memory.
mempools are particularly a problem as memory can only be released back
to the mempool by an async rpc task running. If all available workqueue
threads are waiting on the mempool, no thread is available to return
anything.
lookup_cred() can block on a mempool or kmalloc - and this can cause
deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't
block on memory. If the -ENOMEM gets back to call_refreshresult(), wait
a short while and try again. HZ>>4 is chosen as it is used elsewhere
for -ENOMEM retries.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
During the system suspend path we must set all queues to operate in
polled mode as it is possible for any protocol built using this mailbox,
such as TISCI, to require communication during the no irq phase of suspend,
and we cannot rely on interrupts there.
Polled mode is implemented by allowing the mailbox user to define an
RX channel as part of the message that is sent which is what gets polled
for a response. If polled mode is enabled, this will immediately be
polled for a response at the end of the mailbox send_data op before
returning success for the data send or timing out if no response is
received.
Finally, to ensure polled mode is always enabled during system suspend,
iterate through all queues to set RX queues to polled mode during system
suspend and disable polled mode for all in the resume handler.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Drivers such as WireGuard need to learn when VMs fork in order to clear
sessions. This commit provides a simple notifier_block for that, with a
register and unregister function. When no VM fork detection is compiled
in, this turns into a no-op, similar to how the power notifier works.
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We previously rolled our own randomness readiness notifier, which only
has two users in the whole kernel. Replace this with a more standard
atomic notifier block that serves the same purpose with less code. Also
unexport the symbols, because no modules use it, only unconditional
builtins. The only drawback is that it's possible for a notification
handler returning the "stop" code to prevent further processing, but
given that there are only two users, and that we're unexporting this
anyway, that doesn't seem like a significant drawback for the
simplification we receive here.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since add_vmfork_randomness() is only called from vmgenid.o, we can
guard it in CONFIG_VMGENID, similarly to how we do with
add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things
calling into add_vmfork_randomness(), we can add another shared Kconfig
symbol for that, but for now, this is good enough. Even though
add_vmfork_randomess() is a pretty small function, removing it means
that there are only calls to crng_reseed(false) and none to
crng_reseed(true), which means the compiler can constant propagate the
false, removing branches from crng_reseed() and its descendants.
Additionally, we don't even need the symbol to be exported if
CONFIG_VMGENID is not a module, so conditionalize that too.
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We create a list of ACPI "PNP" IDs which contains _HID, _CID, and CLS
entries of the respective devices. However, when making structs for
matching, we squeeze those IDs into acpi_device_id, which only has 9
bytes space to store the identifier. The subsystem actually captures the
full length of the IDs, and the modalias has the full length, but this
struct we use for matching is limited. It originally had 16 bytes, but
was changed to only have 9 in 6543becf26ff ("mod/file2alias: make
modalias generation safe for cross compiling"), presumably on the theory
that it would match the ACPI spec so it didn't matter.
Unfortunately, while most people adhere to the ACPI specs, Microsoft
decided that its VM Generation Counter device [1] should only be
identifiable by _CID with a value of "VM_Gen_Counter", which is longer
than 9 characters.
To allow device drivers to match identifiers that exceed the 9 byte
limit, this simply ups the length to 16, just like it was before the
aforementioned commit. Empirical testing indicates that this
doesn't actually increase vmlinux size on 64-bit, because the ulong in
the same struct caused there to be 7 bytes of padding anyway, and when
doing a s/M/Y/g i386_defconfig build, the bzImage only increased by
0.0055%, so negligible.
This patch is a prerequisite to add support for VMGenID in Linux, the
subsequent patch in this series. It has been confirmed to also work on
the udev/modalias side in userspace.
[1] https://download.microsoft.com/download/3/1/C/31CFC307-98CA-4CA5-914C-D9772691E214/VirtualMachineGenerationID.docx
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
[Jason: reworked commit message a bit, went with len=16 approach.]
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API, add_vmfork_
randomness(), for that, by force reseeding the crng.
This has the added benefit of also draining the entropy pool and setting
its timer back, so that any old entropy that was there prior -- which
could have already been used by a different fork, or generally gone
stale -- does not contribute to the accounting of the next 256 bits.
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jann Horn <jannh@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This topic has come up countless times, and usually doesn't go anywhere.
This time I thought I'd bring it up with a slightly narrower focus,
updated for some developments over the last three years: we finally can
make /dev/urandom always secure, in light of the fact that our RNG is
now always seeded.
Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
rather than passively wait for it"), the RNG does a haveged-style jitter
dance around the scheduler, in order to produce entropy (and credit it)
for the case when we're stuck in wait_for_random_bytes(). How ever you
feel about the Linus Jitter Dance is beside the point: it's been there
for three years and usually gets the RNG initialized in a second or so.
As a matter of fact, this is what happens currently when people use
getrandom(). It's already there and working, and most people have been
using it for years without realizing.
So, given that the kernel has grown this mechanism for seeding itself
from nothing, and that this procedure happens pretty fast, maybe there's
no point any longer in having /dev/urandom give insecure bytes. In the
past we didn't want the boot process to deadlock, which was
understandable. But now, in the worst case, a second goes by, and the
problem is resolved. It seems like maybe we're finally at a point when
we can get rid of the infamous "urandom read hole".
The one slight drawback is that the Linus Jitter Dance relies on random_
get_entropy() being implemented. The first lines of try_to_generate_
entropy() are:
stack.now = random_get_entropy();
if (stack.now == random_get_entropy())
return;
On most platforms, random_get_entropy() is simply aliased to get_cycles().
The number of machines without a cycle counter or some other
implementation of random_get_entropy() in 2022, which can also run a
mainline kernel, and at the same time have a both broken and out of date
userspace that relies on /dev/urandom never blocking at boot is thought
to be exceedingly low. And to be clear: those museum pieces without
cycle counters will continue to run Linux just fine, and even
/dev/urandom will be operable just like before; the RNG just needs to be
seeded first through the usual means, which should already be the case
now.
On systems that really do want unseeded randomness, we already offer
getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
their hash tables at boot. Nothing in this commit would affect
GRND_INSECURE, and it remains the means of getting those types of random
numbers.
This patch goes a long way toward eliminating a long overdue userspace
crypto footgun. After several decades of endless user confusion, we will
finally be able to say, "use any single one of our random interfaces and
you'll be fine. They're all the same. It doesn't matter." And that, I
think, is really something. Finally all of those blog posts and
disagreeing forums and contradictory articles will all become correct
about whatever they happened to recommend, and along with it, a whole
class of vulnerabilities eliminated.
With very minimal downside, we're finally in a position where we can
make this change.
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|