Age | Commit message | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
|
|
Change the run_estimation flag to start/stop the kthread tasks.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Allow the kthreads for stats to be configured for
specific cpulist (isolation) and niceness (scheduling
priority).
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Estimating all entries in a single list in timer context
on a single CPU causes large latency with multiple IPVS rules,
as reported in [1], [2], [3].
Spread the estimator structures in multiple chains and
use kthread(s) for the estimation. The chains are processed
in multiple (50) timer ticks to ensure the 2-second interval
between estimations with some accuracy. Every chain is
processed under RCU lock.
Every kthread works over its own data structure and all
such contexts are attached to an array. The contexts can be
preserved while the kthread tasks are stopped or restarted.
When estimators are removed, unused kthread contexts are
released and their slots in the array are left empty.
The first kthread determines the parameters to use, e.g. the maximum
number of estimators to process per kthread based on
the chain's length (chain_max), allowing a sub-100us cond_resched
rate and estimation taking up to 1/8 of the CPU capacity
to avoid any problems if chain_max is not correctly
calculated.
chain_max is calculated taking into account factors
such as CPU speed and memory/cache speed where the
cache_factor (4) is selected from real tests with
current generation of CPU/NUMA configurations to
correct the difference in CPU usage between
cached (during calc phase) and non-cached (working) state
of the estimated per-cpu data.
First kthread also plays the role of distributor of
added estimators to all kthreads, keeping low the
time to add estimators. The optimization is based on
the fact that newly added estimator should be estimated
after 2 seconds, so we have the time to offload the
adding to chain from controlling process to kthread 0.
The allocated kthread context may grow from 1 to 50
allocated structures for timer ticks which saves memory for
setups with small number of estimators.
We also add delayed work est_reload_work that will
make sure the kthread tasks are properly started/stopped.
ip_vs_start_estimator() is changed to report errors,
which allows the estimators to be stored safely in
allocated structures.
Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.
[1] Report from Yunhong Jiang:
https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2]
https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust:
https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Tested-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
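The tick-based spreading described in the commit message above can be sketched in user space as follows. This is a minimal model, not the IPVS code: the names TickScheduler, add and run_tick are hypothetical, and the least-loaded placement is an assumption made for illustration. Only the 50 ticks and the 2-second interval come from the commit message.

```python
NTICKS = 50                        # ticks per estimation interval (from the commit message)
INTERVAL_MS = 2000                 # every estimator is visited once per 2 seconds
TICK_MS = INTERVAL_MS // NTICKS    # 40 ms between consecutive ticks


class TickScheduler:
    """Hypothetical model: estimators are spread over NTICKS rows."""

    def __init__(self):
        self.rows = [[] for _ in range(NTICKS)]
        self.tick = 0

    def add(self, est):
        # Assumption: place the new estimator into the least loaded row.
        row = min(range(NTICKS), key=lambda i: len(self.rows[i]))
        self.rows[row].append(est)
        return row

    def run_tick(self):
        # Process only the row that belongs to the current tick.
        processed = list(self.rows[self.tick])
        self.tick = (self.tick + 1) % NTICKS
        return processed


sched = TickScheduler()
for n in range(120):
    sched.add(f"est{n}")

seen = []
for _ in range(NTICKS):            # one full 2-second interval
    seen.extend(sched.run_tick())
```

After one full interval every estimator has been visited exactly once, and no single tick handles more than a small share of the total work, which is the latency property the commit is after.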
|
|
Use the provided u64_stats_t type to avoid
load/store tearing.
Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Tested-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Move alloc_percpu/free_percpu logic into new functions
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In preparation for using RCU locking for the list
with estimators, make sure the struct ip_vs_stats
are released after an RCU grace period by using RCU
callbacks. This affects ipvs->tot_stats where we
cannot use RCU callbacks for ipvs, so we use
allocated struct ip_vs_stats_rcu. For services
and dests we force RCU callbacks for all cases.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Eduard Zingerman says:
====================
This patch-set consists of a series of bug fixes for register ID
tracking in verifier.c:states_equal()/regsafe() functions:
- for registers of type PTR_TO_MAP_{KEY,VALUE}, PTR_TO_PACKET[_META]
the regsafe() should call check_ids() even if registers are
byte-to-byte equal;
- states_equal() must maintain idmap that covers all function frames
in the state because functions like mark_ptr_or_null_regs() operate
on all registers in the state;
- regsafe() must compare spin lock ids for PTR_TO_MAP_VALUE registers.
The last point covers issue reported by Kumar Kartikeya Dwivedi in [1],
I borrowed the test commit from there.
Note that there is also an issue with register id tracking for
scalars, described in [2]; it will be addressed separately.
[1] https://lore.kernel.org/bpf/20221111202719.982118-1-memxor@gmail.com/
[2] https://lore.kernel.org/bpf/20221128163442.280187-2-eddyz87@gmail.com/
Eduard Zingerman (6):
bpf: regsafe() must not skip check_ids()
selftests/bpf: test cases for regsafe() bug skipping check_id()
bpf: states_equal() must build idmap for all function frames
selftests/bpf: verify states_equal() maintains idmap across all frames
bpf: use check_ids() for active_lock comparison
selftests/bpf: test case for relaxed prunning of active_lock.id
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check that verifier.c:states_equal() uses check_ids() to match
consistent active_lock/map_value configurations. This allows
states with active spin locks to be pruned even if the numerical
values of active_lock ids do not match across compared states.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Test that when reg->id is not the same for the same register of type
PTR_TO_MAP_VALUE between the current and the old explored state, we
return false from regsafe() and continue exploring.
Without the fix in prior commit, the test case fails.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
An update for verifier.c:states_equal()/regsafe() to use check_ids()
for active spin lock comparisons. This fixes the issue reported by
Kumar Kartikeya Dwivedi in [1] using technique suggested by Edward Cree.
Without this commit the verifier might be tricked into accepting the
following program working with a map containing spin locks:
0: r9 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1.
1: r8 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2.
2: if r9 == 0 goto exit ; r9 -> PTR_TO_MAP_VALUE.
3: if r8 == 0 goto exit ; r8 -> PTR_TO_MAP_VALUE.
4: r7 = ktime_get_ns() ; Unbound SCALAR_VALUE.
5: r6 = ktime_get_ns() ; Unbound SCALAR_VALUE.
6: bpf_spin_lock(r8) ; active_lock.id == 2.
7: if r6 > r7 goto +1 ; No new information about the state
; is derived from this check, thus
; produced verifier states differ only
; in 'insn_idx'.
8: r9 = r8 ; Optionally make r9.id == r8.id.
--- checkpoint --- ; Assume is_state_visited() creates a
; checkpoint here.
9: bpf_spin_unlock(r9) ; (a,b) active_lock.id == 2.
; (a) r9.id == 2, (b) r9.id == 1.
10: exit(0)
Consider two verification paths:
(a) 0-10
(b) 0-7,9-10
Path (a) is verified first. If a checkpoint is created at (8),
path (b) would assume that (8) is safe because regsafe() does not
compare register ids for registers of type PTR_TO_MAP_VALUE.
[1] https://lore.kernel.org/bpf/20221111202719.982118-1-memxor@gmail.com/
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Suggested-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A test case that would erroneously pass verification if
verifier.c:states_equal() maintains separate register ID mappings for
call frames.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
verifier.c:states_equal() must maintain register ID mapping across all
function frames. Otherwise the following example might be erroneously
marked as safe:
main:
fp[-24] = map_lookup_elem(...) ; frame[0].fp[-24].id == 1
fp[-32] = map_lookup_elem(...) ; frame[0].fp[-32].id == 2
r1 = &fp[-24]
r2 = &fp[-32]
call foo()
r0 = 0
exit
foo:
0: r9 = r1
1: r8 = r2
2: r7 = ktime_get_ns()
3: r6 = ktime_get_ns()
4: if (r6 > r7) goto skip_assign
5: r9 = r8
skip_assign: ; <--- checkpoint
6: r9 = *r9 ; (a) frame[1].r9.id == 2
; (b) frame[1].r9.id == 1
7: if r9 == 0 goto exit: ; mark_ptr_or_null_regs() transfers != 0 info
; for all regs sharing ID:
; (a) r9 != 0 => &frame[0].fp[-32] != 0
; (b) r9 != 0 => &frame[0].fp[-24] != 0
8: r8 = *r8 ; (a) r8 == &frame[0].fp[-32]
; (b) r8 == &frame[0].fp[-32]
9: r0 = *r8 ; (a) safe
; (b) unsafe
exit:
10: exit
While processing call to foo() verifier considers the following
execution paths:
(a) 0-10
(b) 0-4,6-10
(There is also path 0-7,10 but it is not interesting for the issue at
hand. (a) is verified first.)
Suppose that checkpoint is created at (6) when path (a) is verified,
next path (b) is verified and (6) is reached.
If states_equal() maintains separate 'idmap' for each frame the
mapping at (6) for frame[1] would be empty and
regsafe(r9)::check_ids() would add a pair 2->1 and return true,
which is an error.
If states_equal() maintains single 'idmap' for all frames the mapping
at (6) would be { 1->1, 2->2 } and regsafe(r9)::check_ids() would
return false when trying to add a pair 2->1.
This issue was suggested in the following discussion:
https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@mail.gmail.com/
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
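The difference between a per-frame and a shared 'idmap' in the example above can be replayed with a small user-space model. This is only a sketch: check_ids() here is a simplified stand-in written from the behaviour described in the commit message, and the states_equal() function with its shared_idmap switch is a hypothetical construct for comparison, not the verifier's code.

```python
def check_ids(old_id, cur_id, idmap):
    """Simplified model: an old ID must always map to the same current ID."""
    if old_id not in idmap:
        idmap[old_id] = cur_id
        return True
    return idmap[old_id] == cur_id


# (old_id, cur_id) pairs seen at the checkpoint when path (b) arrives.
frame0 = [(1, 1), (2, 2)]   # fp[-24] and fp[-32] in main
frame1 = [(2, 1)]           # r9 in foo: id 2 on path (a), id 1 on path (b)


def states_equal(frames, shared_idmap):
    idmap = {}
    for frame in frames:
        if not shared_idmap:
            idmap = {}      # pre-fix behaviour: a fresh mapping per frame
        for old_id, cur_id in frame:
            if not check_ids(old_id, cur_id, idmap):
                return False
    return True


per_frame = states_equal([frame0, frame1], shared_idmap=False)
shared = states_equal([frame0, frame1], shared_idmap=True)
```

With separate mappings the pair 2->1 is accepted because frame[1] starts from an empty map; with one mapping for all frames it conflicts with the 2->2 entry recorded for frame[0] and the states are correctly reported as different.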
|
|
Under certain conditions it was possible for verifier.c:regsafe() to
skip the check_ids() call. This commit adds negative test cases that
were previously erroneously accepted as safe.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The verifier.c:regsafe() has the following shortcut:
equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, parent)) == 0;
...
if (equal)
return true;
It is executed regardless of the old register type. This is incorrect for
register types that might have an ID checked by check_ids(), namely:
- PTR_TO_MAP_KEY
- PTR_TO_MAP_VALUE
- PTR_TO_PACKET_META
- PTR_TO_PACKET
The following pattern could be used to exploit this:
0: r9 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1.
1: r8 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2.
2: r7 = ktime_get_ns() ; Unbound SCALAR_VALUE.
3: r6 = ktime_get_ns() ; Unbound SCALAR_VALUE.
4: if r6 > r7 goto +1 ; No new information about the state
; is derived from this check, thus
; produced verifier states differ only
; in 'insn_idx'.
5: r9 = r8 ; Optionally make r9.id == r8.id.
--- checkpoint --- ; Assume is_state_visited() creates a
; checkpoint here.
6: if r9 == 0 goto <exit> ; Nullness info is propagated to all
; registers with matching ID.
7: r1 = *(u64 *) r8 ; Not always safe.
Verifier first visits path 1-7 where r8 is verified to be not null
at (6). Later the jump from 4 to 6 is examined. The checkpoint for (6)
looks as follows:
R8_rD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
R9_rwD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
R10=fp0
The current state is:
R0=... R6=... R7=... fp-8=...
R8=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
R9=map_value_or_null(id=1,off=0,ks=4,vs=8,imm=0)
R10=fp0
Note that R8 states are byte-to-byte identical, so regsafe() would
exit early and skip call to check_ids(), thus ID mapping 2->2 will not
be added to 'idmap'. Next, states for R9 are compared: these are not
identical and check_ids() is executed, but 'idmap' is empty, so
check_ids() adds mapping 2->1 to 'idmap' and returns success.
This commit pushes the 'equal' down to register types that don't need
check_ids().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
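The effect of the early exit on the R8/R9 states quoted above can be reproduced with a short user-space model. It is a sketch under stated assumptions: registers are reduced to their 'id' field, check_ids() is a simplified stand-in, and the early_exit flag is a hypothetical switch that toggles between the old and the new behaviour.

```python
def check_ids(old_id, cur_id, idmap):
    """Simplified model: an old ID must always map to the same current ID."""
    if old_id not in idmap:
        idmap[old_id] = cur_id
        return True
    return idmap[old_id] == cur_id


def regsafe(old, cur, idmap, early_exit):
    if early_exit and old == cur:
        return True         # shortcut: identical registers never reach check_ids()
    return check_ids(old["id"], cur["id"], idmap)


# Checkpoint (old) and current state from the commit message, IDs only.
old_state = {"R8": {"id": 2}, "R9": {"id": 2}}
cur_state = {"R8": {"id": 2}, "R9": {"id": 1}}


def states_match(early_exit):
    idmap = {}
    return all(
        regsafe(old_state[r], cur_state[r], idmap, early_exit)
        for r in ("R8", "R9")
    )


with_shortcut = states_match(early_exit=True)
without_shortcut = states_match(early_exit=False)
```

With the shortcut, R8 never records 2->2, so R9 is free to add 2->1 and the states wrongly match. Without it, the 2->2 entry from R8 makes the R9 comparison fail.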
|
|
sysv_nblocks() returns 'blocks' rather than 'res'. 'blocks' only counts
the number of triple-indirect blocks, so sysv_getattr() gets a
wrong result.
[AV: this is actually a sysv counterpart of minixfs fix -
0fcd426de9d0 "[PATCH] minix block usage counting fix" in
historical tree; mea culpa, should've thought to check
fs/sysv back then...]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
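The kind of mistake described above can be illustrated with a small user-space model of block counting. This is a sketch, not the fs/sysv code: the constants (10 direct pointers, 3 levels of indirection, 1024-byte blocks, 256 pointers per indirect block) and the buggy switch are assumptions chosen for illustration.

```python
DIRECT = 10   # assumed number of direct block pointers in the inode
DEPTH = 4     # assumed: direct blocks plus three levels of indirection


def nblocks(size, block_size=1024, ptrs_bits=8, buggy=False):
    """Count data blocks plus the indirect blocks needed to address them."""
    blocks = (size + block_size - 1) // block_size   # data blocks
    res = blocks                                     # running total
    direct = DIRECT
    level = DEPTH
    while True:
        level -= 1
        if level == 0 or blocks <= direct:
            break
        # Number of indirect blocks needed at this level.
        blocks = ((blocks - direct - 1) >> ptrs_bits) + 1
        res += blocks
        direct = 1
    # The buggy variant returns the last per-level count, not the total.
    return blocks if buggy else res


correct = nblocks(20 * 1024)
wrong = nblocks(20 * 1024, buggy=True)
```

For a 20-block file the total is 20 data blocks plus 1 indirect block, while the buggy variant reports only the last per-level count.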
|
|
Now that we've worked out performance issues and have a server patch
addressing the failed xfstests, we can safely enable this feature by
default.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Pull ARM fix from Russell King:
"One further ARM fix for 6.1 from Wang Kefeng, fixing up the handling
for kfence faults"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9278/1: kfence: only handle translation faults
|
|
- Remove unnecessary <linux/of_irq.h> includes (Bjorn Helgaas)
* pci/kbuild:
PCI: Drop of_match_ptr() to avoid unused variables
PCI: Remove unnecessary <linux/of_irq.h> includes
PCI: xgene-msi: Include <linux/irqdomain.h> explicitly
PCI: mvebu: Include <linux/irqdomain.h> explicitly
PCI: microchip: Include <linux/irqdomain.h> explicitly
PCI: altera-msi: Include <linux/irqdomain.h> explicitly
# Conflicts:
# drivers/pci/controller/pci-mvebu.c
|
|
- Fix whitespace issues (Michal Simek)
* pci/ctrl/xilinx:
PCI: xilinx-nwl: Fix coding style violations
|
|
- Switch to the gpiod API so we can make of_get_named_gpio_flags() private
(Dmitry Torokhov)
* pci/ctrl/mvebu:
PCI: mvebu: Switch to using gpiod API
|
|
- Switch to using devm_gpiod_get_optional() so we can stop exporting
devm_gpiod_get_from_of_node() (Dmitry Torokhov)
* pci/ctrl/aardvark:
PCI: aardvark: Switch to using devm_gpiod_get_optional()
|
|
- Register notifier if core_init_notifier is enabled in pci-epf-test
(Kunihiko Hayashi)
- Fixup Kconfig indentation (Shunsuke Mie)
* remotes/lorenzo/pci/misc:
PCI: endpoint: Fix Kconfig indent style
PCI: pci-epf-test: Register notifier if only core_init_notifier is enabled
|
|
- Restore MSI remapping configuration during resume because the
configuration is cleared out by firmware when suspending (Nirmal Patel)
- Reset the hierarchy below VMD when probing the VMD; we attempted this
before, but with the wrong device, so it didn't work (Francisco Munoz)
* remotes/lorenzo/pci/vmd:
PCI: vmd: Fix secondary bus reset for Intel bridges
PCI: vmd: Disable MSI remapping after suspend
|
|
- Switch from devm_gpiod_get_from_of_node() to devm_fwnode_gpiod_get()
(Dmitry Torokhov)
* remotes/lorenzo/pci/tegra:
PCI: tegra: Switch to using devm_fwnode_gpiod_get
|
|
- Add DT and driver support for SC8280XP/SA8540P basic interconnects where
interconnect bandwidth must be requested before enabling interconnect
clocks (Johan Hovold)
- Add 'dma-coherent' property (Johan Hovold)
* remotes/lorenzo/pci/qcom:
dt-bindings: PCI: qcom: Allow 'dma-coherent' property
PCI: qcom: Add basic interconnect support
dt-bindings: PCI: qcom: Add SC8280XP/SA8540P interconnects
|
|
- Add sentinel to mt7621_pcie_quirks_match[] to prevent oops when parsing
the table (John Thomson)
* remotes/lorenzo/pci/mt7621:
PCI: mt7621: Add sentinel to quirks table
|
|
- Add a .release() callback for the Endpoint Controller library so an
Endpoint driver is removable (Yoshihiro Shimoda)
- Fix pci-epf-vntb kernel-doc and whitespace (Frank Li)
- Fix pci-epf-vntb error path usage of pci_epc_mem_free_addr() (Frank Li)
- Remove pci-epf-vntb unused epf_db_phy (Frank Li)
- Fix pci-epf-vntb sparse warnings (Frank Li)
* remotes/lorenzo/pci/endpoint:
PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
PCI: endpoint: pci-epf-vntb: Fix struct epf_ntb_ctrl indentation
PCI: endpoint: pci-epf-vntb: Clean up kernel_doc warning
PCI: endpoint: Fix WARN() when an endpoint driver is removed
|
|
- Fix n_fts[] array overrun (Vidya Sagar)
- Don't advertise PTM Responder role for Endpoints (Vidya Sagar)
- Fix qcom "reset assert" error message (Manivannan Sadhasivam)
- Downgrade "link didn't come up" message to dev_info (Vidya Sagar)
- Initialize PHY before deasserting core reset so the link comes up on
boards where the PHY provides the reference clock (this was a regression
in v6.0) (Sascha Hauer)
- Switch histb to the gpiod API (Dmitry Torokhov)
- Fix imx6sx and imx8mq clock names in DT binding (Serge Semin)
- Fix visconti MSI interrupt in DT binding (Serge Semin)
- Consolidate reset-gpio, cdm, windows info in common DT shared by both
Root Port and Endpoint bindings (Serge Semin)
- Remove bus node from DT examples (Serge Semin)
- Add common phys, phy-names to DT (Serge Semin)
- Add default max-link-speed of Gen5 to DT (Serge Semin)
- Apply generic schema for generic device (Serge Semin)
- Add default max-functions of 32 to DT (Serge Semin)
- Add common interrupts, interrupt-names to DT (Serge Semin)
- Add common regs, reg-names to DT (Serge Semin)
- Add common clocks, resets to DT (Serge Semin)
- Add dma-coherent to DT (Serge Semin)
- Apply common schema to Rockchip DT (Serge Semin)
- Add Baikal-T1 DT bindings (Serge Semin)
- Add dma-ranges support in DesignWare core (Serge Semin)
- Add dw_pcie_cap_is() for testing controller capabilities (Serge Semin)
- Add generic resources getter to DesignWare core (Serge Semin)
- Combine iATU detection procedures (Serge Semin)
- Add generic clock and reset names to DesignWare core (Serge Semin)
- Add Baikal-T1 PCIe controller driver (Serge Semin)
* remotes/lorenzo/pci/dwc:
PCI: dwc: Add Baikal-T1 PCIe controller support
PCI: dwc: Introduce generic platform clocks and resets
PCI: dwc: Combine iATU detection procedures
PCI: dwc: Introduce generic resources getter
PCI: dwc: Introduce generic controller capabilities interface
PCI: dwc: Introduce dma-ranges property support for RC-host
dt-bindings: PCI: dwc: Add Baikal-T1 PCIe Root Port bindings
dt-bindings: PCI: dwc: Apply common schema to Rockchip DW PCIe nodes
dt-bindings: PCI: dwc: Add dma-coherent property
dt-bindings: PCI: dwc: Add clocks/resets common properties
dt-bindings: PCI: dwc: Add reg/reg-names common properties
dt-bindings: PCI: dwc: Add interrupts/interrupt-names common properties
dt-bindings: PCI: dwc: Add max-functions EP property
dt-bindings: PCI: dwc: Apply generic schema for generic device only
dt-bindings: PCI: dwc: Add max-link-speed common property
dt-bindings: PCI: dwc: Add phys/phy-names common properties
dt-bindings: PCI: dwc: Remove bus node from the examples
dt-bindings: PCI: dwc: Detach common RP/EP DT bindings
dt-bindings: visconti-pcie: Fix interrupts array max constraints
dt-bindings: imx6q-pcie: Fix clock names for imx6sx and imx8mq
PCI: histb: Switch to using gpiod API
PCI: imx6: Initialize PHY before deasserting core reset
PCI: dwc: Use dev_info for PCIe link down event logging
PCI: qcom: Fix error message for reset_control_assert()
PCI: designware-ep: Disable PTM capabilities for EP mode
PCI: Add PCI_PTM_CAP_RES macro
PCI: dwc: Fix n_fts[] array overrun
|
|
- Enable Multi-MSI (Jim Quinlan)
- Wait for 100ms after PERST# deassert for power and clocks to stabilize
(Jim Quinlan)
- Use readl_poll_timeout_atomic() instead of hand-rolled timeout loop (Jim
Quinlan)
- Drop needless "inline" annotations (Jim Quinlan)
- Set RCB_MPS mode bit so data for reads up to MPS are returned in a single
completion (Jim Quinlan)
* remotes/lorenzo/pci/brcmstb:
PCI: brcmstb: Set RCB_{MPS,64B}_MODE bits
PCI: brcmstb: Drop needless 'inline' annotations
PCI: brcmstb: Replace status loops with read_poll_timeout_atomic()
PCI: brcmstb: Wait for 100ms following PERST# deassert
PCI: brcmstb: Enable Multi-MSI
|
|
- Add ti,j721e-pci-host interrupt controller definition (Matt Ranostay)
- Add ti,j721e-pci-host interrupt properties (Matt Ranostay)
- Add ti,j721s2 host mode device-id (Matt Ranostay)
- Add mediatek-gen3 iommu, power properties (Jianjun Wang)
- Add mediatek-gen3 SoC-based clock names (Frank Wunderlich)
- Add mediatek-gen3 mt7986 support (Frank Wunderlich)
* remotes/lorenzo/pci/dt:
dt-bindings: PCI: mediatek-gen3: add support for mt7986
dt-bindings: PCI: mediatek-gen3: add SoC based clock config
dt-bindings: PCI: Add host mode device-id for j721s2 platform
dt-bindings: PCI: mediatek-gen3: Support mt8195
dt-bindings: PCI: ti,j721e-pci-*: Add missing interrupt properties
dt-bindings: PCI: ti,j721e-pci-host: add interrupt controller definition
|
|
- Fix a double free in the error path of creating sysfs "resource%d"
attributes (Sascha Hauer)
* pci/sysfs:
PCI/sysfs: Fix double free in error path
|
|
- Remove EfiMemoryMappedIO regions from the E820 map to allow PCI core to
allocate BARs from them. The only purpose of EfiMemoryMappedIO is to
tell the OS to map things needed by EFI runtime services, so it's often
used for PCI host bridge apertures. If we can't allocate from those
apertures, we can't hot-add devices (Bjorn Helgaas)
* pci/resource:
x86/PCI: Use pr_info() when possible
x86/PCI: Fix log message typo
x86/PCI: Tidy E820 removal messages
PCI: Skip allocate_resource() if too little space available
efi/x86: Remove EfiMemoryMappedIO from E820 map
|
|
- Squash portdrv_core.c and portdrv_pci.c into portdrv.c to make it easier
to find things (Bjorn Helgaas)
- Allow AER service only for Root Ports & RCECs so portdrv can successfully
bind to other devices that have AER but lack MSI (which they don't need
for AER), which allows power management for those devices (Bjorn Helgaas)
* pci/portdrv:
PCI/portdrv: Allow AER service only for Root Ports & RCECs
PCI/portdrv: Unexport pcie_port_service_register(), pcie_port_service_unregister()
PCI/portdrv: Move private things to portdrv.c
PCI/portdrv: Squash into portdrv.c
|
|
- Convert AGP efficeon, intel, amd-k7, ati, nvidia to generic power
management (Bjorn Helgaas)
* pci/pm-agp:
agp/via: Update to DEFINE_SIMPLE_DEV_PM_OPS()
agp/sis: Update to DEFINE_SIMPLE_DEV_PM_OPS()
agp/amd64: Update to DEFINE_SIMPLE_DEV_PM_OPS()
agp/nvidia: Convert to generic power management
agp/ati: Convert to generic power management
agp/amd-k7: Convert to generic power management
agp/intel: Convert to generic power management
agp/efficeon: Convert to generic power management
|
|
- Remove unused 'state' parameter to pci_legacy_suspend_late() (Bjorn
Helgaas)
* pci/pm:
PCI/PM: Remove unused 'state' parameter to pci_legacy_suspend_late()
|
|
- Use METHOD_NAME__UID instead of plain string to make it easier to find
all uses (Yipeng Zou)
* pci/misc:
PCI/ACPI: Use METHOD_NAME__UID instead of plain string
|
|
- Enable pciehp by default if USB4 is enabled because USB4/Thunderbolt
tunneling depends on native PCIe hotplug (Albert Zhou)
- Make sure pciehp binds only to Downstream Ports, not Upstream Ports
(Rafael J. Wysocki)
- Remove unused get_mode1_ECC_cap callback in shpchp (Ian Cowan)
- Enable pciehp Command Completed Interrupt only if supported to reduce
confusion when looking at lspci output (Pali Rohár)
* pci/hotplug:
PCI: pciehp: Enable Command Completed Interrupt only if supported
PCI: shpchp: Remove unused get_mode1_ECC_cap callback
PCI: acpiphp: Avoid setting is_hotplug_bridge for PCIe Upstream Ports
PCI/portdrv: Set PCIE_PORT_SERVICE_HP for Root and Downstream Ports only
PCI: pciehp: Enable by default if USB4 enabled
|
|
- Only read/write PCIe Link 2 registers for devices with Links and PCIe
Capability version >= 2 (Maciej W. Rozycki)
- Revert a patch that cleared PCI_STATUS during enumeration because it
broke Linux guests on Apple's virtualization framework (Bjorn Helgaas)
- Assign PCI domain IDs using IDAs so IDs can be easily reused after
loading/unloading host bridge drivers (Pali Rohár)
- Fix pci_device_is_present(), which previously always returned "false" for
VFs because their vendor ID is always 0xffff (Michael S. Tsirkin)
- Check for alloc failure in pci_request_irq() (Zeng Heng)
* pci/enumeration:
PCI: Check for alloc failure in pci_request_irq()
PCI: Fix pci_device_is_present() for VFs by checking PF
PCI: Assign PCI domain IDs by ida_alloc()
Revert "PCI: Clear PCI_STATUS when setting up device"
PCI: Access Link 2 registers only for devices with Links
|
|
- Fix calculation of DOE length to account for the "0 means 2^18 DWORDs"
special case (Li Ming)
* pci/doe:
PCI/DOE: Fix maximum data object length miscalculation
|
|
Use pr_info() and similar when possible. No functional change intended.
Link: https://lore.kernel.org/r/20221209205131.GA1726524@bhelgaas
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
Add missing word in the log message:
- ... so future kernels can this automatically
+ ... so future kernels can do this automatically
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://lore.kernel.org/r/20221208190341.1560157-5-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
|
|
These messages:
clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit] for e820 entry [mem 0x0009f000-0x000fffff]
aren't as useful as they could be because (a) the resource is often
IORESOURCE_UNSET, so we print the size instead of the start/end and (b) we
print the available resource even if it is empty after removing the E820
entry.
Print the available space by hand to avoid the IORESOURCE_UNSET problem and
only if it's non-empty. No functional change intended.
Link: https://lore.kernel.org/r/20221208190341.1560157-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
|
|
pci_bus_alloc_from_region() allocates MMIO space by iterating through all
the resources available on the bus. The available resource might be
reduced if the caller requires 32-bit space or we're avoiding BIOS or E820
areas.
Don't bother calling allocate_resource() if we need more space than is
available in this resource. This prevents some pointless and annoying
messages about avoided areas.
Link: https://lore.kernel.org/r/20221208190341.1560157-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
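The size check described above amounts to comparing the requested size with what is left of the window after clipping. A minimal sketch, assuming inclusive [start, end] ranges; the helper names resource_size and worth_trying are hypothetical and only model the decision, not the allocator.

```python
def resource_size(start, end):
    """Size of an inclusive [start, end] address range."""
    return end - start + 1


def worth_trying(avail_start, avail_end, size):
    """Return True only if the clipped window can hold 'size' bytes."""
    if avail_end < avail_start:
        return False                      # window is empty after clipping
    return resource_size(avail_start, avail_end) >= size


# A 64 KiB window cannot satisfy a 1 MiB request, so allocation is skipped.
small_window = worth_trying(0xA0000, 0xAFFFF, 1 << 20)
# A 16 MiB window can satisfy it, so allocation would be attempted.
large_window = worth_trying(0xE0000000, 0xE0FFFFFF, 1 << 20)
```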
|
|
Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
by the OS so they can be accessed by EFI runtime services, but should have
no other significance to the OS (UEFI r2.10, sec 7.2). However, most
bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
them (see remove_e820_regions()).
Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
host bridge windows, which means Linux can't allocate BAR space for
hot-added devices.
Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
problem.
Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
platforms, these describe non-window space that's included in host bridge
_CRS. If we assign that space to PCI devices, they don't work. On the
Lenovo X1 Carbon, this leads to suspend/resume failures.
The previous solution to the problem of allocating BARs in these regions
was to add pci_crs_quirks[] entries to disable E820 checking for these
machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping
via quirks")):
Acer DMI_PRODUCT_NAME Spin SP513-54N
Clevo DMI_BOARD_NAME X170KM-G
Lenovo DMI_PRODUCT_VERSION *IIL*
Florent reported the BAR allocation issue on the Clevo NL4XLU. We could
add another quirk for the NL4XLU, but I hope this generic change can solve
it for many machines without having to add quirks.
This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
Slim 3 (Matt) and solves the problem even when overriding the existing
quirks by booting with "pci=use_e820".
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565 Clevo NL4XLU
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Ideapad Slim 3
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 X1 Carbon
Link: https://lore.kernel.org/r/20221208190341.1560157-2-helgaas@kernel.org
Reported-by: Florent DELAHAYE <kernelorg@undead.fr>
Tested-by: Konrad J Hambrick <kjhambrick@gmail.com>
Tested-by: Matt Hansen <2lprbe78@duck.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
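The size-based rule described above (drop large EfiMemoryMappedIO regions, keep those under 256KB) can be sketched as a filter. This is a simplified model: each map entry is a (start, size, efi_type) tuple, filter_e820 is a hypothetical name, and the addresses are made up. The real change operates on the kernel's E820 table rather than on a Python list.

```python
EFI_MMIO = "EfiMemoryMappedIO"
SMALL_LIMIT = 256 * 1024      # regions below this size are left alone


def filter_e820(entries):
    """Drop large EfiMemoryMappedIO regions; keep everything else."""
    kept = []
    for start, size, efi_type in entries:
        if efi_type == EFI_MMIO and size >= SMALL_LIMIT:
            continue          # large MMIO region: PCI may allocate BARs here
        kept.append((start, size, efi_type))
    return kept


# Made-up example map: one host bridge window, one small MMIO region,
# and one unrelated reserved range.
entries = [
    (0xE0000000, 0x10000000, EFI_MMIO),       # 256 MiB window, removed
    (0xFED10000, 0x8000, EFI_MMIO),           # 32 KiB, kept
    (0x000A0000, 0x60000, "EfiReservedMemoryType"),
]
result = filter_e820(entries)
```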
|
|
Previously portdrv allowed the AER service for any device with an AER
capability (assuming Linux had control of AER) even though the AER service
driver only attaches to Root Port and RCECs.
Because get_port_device_capability() included AER for non-RP, non-RCEC
devices, we tried to initialize the AER IRQ even though these devices
don't generate AER interrupts.
Intel DG1 and DG2 discrete graphics cards contain a switch leading to a
GPU. The switch supports AER but not MSI, so initializing an AER IRQ
failed, and portdrv failed to claim the switch port at all. The GPU itself
could be suspended, but the switch could not be put in a low-power state
because it had no driver.
Don't allow the AER service on non-Root Port, non-Root Complex Event
Collector devices. This means we won't enable Bus Mastering if the device
doesn't require MSI, the AER service will not appear in sysfs, and the AER
service driver will not bind to the device.
Link: https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com
Link: https://lore.kernel.org/r/20221210002922.1749403-1-helgaas@kernel.org
Based-on-patch-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
|
|
When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].
There were 97 warnings produced by NFS. For example:
fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict]
[OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The enc/dec callbacks were defined as passing "void *" as the second
argument, but were being implicitly cast to a new type. Replace the
argument with union nfsd4_op_u, and perform explicit member selection
in the function body. There are no resulting binary differences.
Changes were made mechanically using the following Coccinelle script,
with minor by-hand fixes for members that didn't already match their
existing argument name:
@find@
identifier func;
type T, opsT;
identifier ops, N;
@@
opsT ops[] = {
[N] = (T) func,
};
@already_void@
identifier find.func;
identifier name;
@@
func(...,
-void
+union nfsd4_op_u
*name)
{
...
}
@proto depends on !already_void@
identifier find.func;
type T;
identifier name;
position p;
@@
func@p(...,
T name
) {
...
}
@script:python get_member@
type_name << proto.T;
member;
@@
coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])
@convert@
identifier find.func;
type proto.T;
identifier proto.name;
position proto.p;
identifier get_member.member;
@@
func@p(...,
- T name
+ union nfsd4_op_u *u
) {
+ T name = &u->member;
...
}
@cast@
identifier find.func;
type T, opsT;
identifier ops, N;
@@
opsT ops[] = {
[N] =
- (T)
func,
};
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If a zero length is passed to kmalloc() it returns 0x10, which is
not a valid address. gss_verify_mic() subsequently crashes when it
attempts to dereference that pointer.
Instead of allocating this memory on every call based on an
untrusted size value, use a piece of dynamically-allocated scratch
memory that is always available.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
Clean up: Simplify the tracepoint's only call site.
Also, I noticed that when svc_authenticate() returns SVC_COMPLETE,
it leaves rq_auth_stat set to an error value. That doesn't need to
be recorded in the trace log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
Clean up: NFSv2 has the only two usages of rpc_drop_reply in the
NFSD code base. Since NFSv2 is going away at some point, replace
these in order to simplify the "drop this reply?" check in
nfsd_dispatch().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|