Age | Commit message (Collapse) | Author |
|
Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
BPF_LINK_CREATE command.
bpf_xdp_link is mutually exclusive with direct BPF program attachment,
previous BPF program should be detached prior to attempting to create a new
bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it
can't be replaced by other BPF program attachment or link attachment. It will
be detached only when the last BPF link FD is closed.
bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to
how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
will become defunct, but won't be destroyed until last FD is closed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
|
|
Instead of delegating to drivers, maintain information about which BPF
programs are attached in which XDP modes (generic/skb, driver, or hardware)
locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.
Such re-organization simplifies existing code already. But it also allows to
further add bpf_link-based XDP attachments without drivers having to know
about any of this at all, which seems like a good setup.
XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to
install/uninstall active BPF program. All the higher-level concerns about
prog/link interaction will be contained within generic driver-agnostic logic.
All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed.
It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags
when resetting installed programs. That seems unnecessary, plus most drivers
don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
should be enough of an indicator of what is required of driver to correctly
reset active BPF program. dev_xdp_uninstall() is also generalized as an
iteration over all three supported mode.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com
|
|
Similarly to bpf_prog, make bpf_link and related generic API available
unconditionally to make it easier to have bpf_link support in various parts of
the kernel. Stub out init/prime/settle/cleanup and inc/put APIs.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com
|
|
This change comes in several parts:
One, the restriction that the CGROUP_STORAGE map can only be used
by one program is removed. This results in the removal of the field
'aux' in struct bpf_cgroup_storage_map, and removal of relevant
code associated with the field, and removal of now-noop functions
bpf_free_cgroup_storage and bpf_cgroup_storage_release.
Second, we permit a key of type u64 as the key to the map.
Providing such a key type indicates that the map should ignore
attach type when comparing map keys. However, for simplicity newly
linked storage will still have the attach type at link time in
its key struct. cgroup_storage_check_btf is adapted to accept
u64 as the type of the key.
Third, because the storages are now shared, the storages cannot
be unconditionally freed on program detach. There could be two
ways to solve this issue:
* A. Reference count the usage of the storages, and free when the
last program is detached.
* B. Free only when the storage is impossible to be referred to
again, i.e. when either the cgroup_bpf it is attached to, or
the map itself, is freed.
Option A has the side effect that, when the user detach and
reattach a program, whether the program gets a fresh storage
depends on whether there is another program attached using that
storage. This could trigger races if the user is multi-threaded,
and since nondeterminism in data races is evil, go with option B.
The both the map and the cgroup_bpf now tracks their associated
storages, and the storage unlink and free are removed from
cgroup_bpf_detach and added to cgroup_bpf_release and
cgroup_storage_map_free. The latter also new holds the cgroup_mutex
to prevent any races with the former.
Fourth, on attach, we reuse the old storage if the key already
exists in the map, via cgroup_storage_lookup. If the storage
does not exist yet, we create a new one, and publish it at the
last step in the attach process. This does not create a race
condition because for the whole attach the cgroup_mutex is held.
We keep track of an array of new storages that was allocated
and if the process fails only the new storages would get freed.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com
|
|
bpf_get_[stack|stackid] on perf_events with precise_ip uses callchain
attached to perf_sample_data. If this callchain is not presented, do not
allow attaching BPF program that calls bpf_get_[stack|stackid] to this
event.
In the error case, -EPROTO is returned so that libbpf can identify this
error and print proper hint message.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-3-songliubraving@fb.com
|
|
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, add separate version of these
two helpers, bpf_get_[stack|stackid]_pe. These two hepers use callchain in
bpf_perf_event_data_kern->data->callchain.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
|
|
The bpf iterator for map elements are implemented.
The bpf program will receive four parameters:
bpf_iter_meta *meta: the meta data
bpf_map *map: the bpf_map whose elements are traversed
void *key: the key of one element
void *value: the value of the same element
Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.
A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.
In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.
Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
|
|
Readonly and readwrite buffer register states
are introduced. Totally four states,
PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL]
are supported. As suggested by their respective
names, PTR_TO_RDONLY_BUF[_OR_NULL] are for
readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL]
for read/write buffers.
These new register states will be used
by later bpf map element iterator.
New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.
Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or
PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registeration time.
The verifier is also enhanced such that PTR_TO_RDONLY_BUF
can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and
PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or
ARG_PTR_TO_UNINIT_MEM.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
|
|
This patch refactored target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets since a particular
map should be passed to traverse elements for that particular
map. In the future, other information may be passed to target
as well, e.g., pid, cgroup id, etc. to customize the iterator.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
|
|
There is no functionality change for this patch.
Struct bpf_iter_reg is used to register a bpf_iter target,
which includes information for both prog_load, link_create
and seq_file creation.
This patch puts fields related seq_file creation into
a different structure. This will be useful for map
elements iterator where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data function and
private_data size.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com
|
|
It's mostly a copy paste of commit 6086d29def80 ("bpf: Add bpf_map iterator")
that is use to implement bpf_seq_file opreations to traverse all bpf programs.
v1->v2: Tweak to use build time btf_id
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
A common pattern for using plain DEVICE_ATTR() instead of
DEVICE_ATTR_RO() and DEVICE_ATTR_RW() is for attributes that want to
limit read to only root. I.e. many users of DEVICE_ATTR() are
specifying 0400 or 0600 for permissions.
Given the expectation that CAP_SYS_ADMIN is needed to access these
sensitive attributes add an explicit helper with the _ADMIN_ identifier
for DEVICE_ATTR_ADMIN_{RO,RW}.
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
DSMs are strictly an ACPI mechanism, evict the bus_dsm_mask concept from
the generic 'struct nvdimm_bus_descriptor' object.
As a side effect the test facility ->bus_nfit_cmd_force_en is no longer
necessary. The test infrastructure can communicate that information
directly in ->bus_dsm_mask.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
The ND_CMD_CALL format allows for a general passthrough of passlisted
commands targeting a given command set. However there is no validation
of the family index relative to what the bus supports.
- Update the NFIT bus implementation (the only one that supports
ND_CMD_CALL passthrough) to also passlist the valid set of command
family indices.
- Update the generic __nd_ioctl() path to validate that field on behalf
of all implementations.
Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism")
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull EFI fixes from Ingo Molnar:
"Various EFI fixes:
- Fix the layering violation in the use of the EFI runtime services
availability mask in users of the 'efivars' abstraction
- Revert build fix for GCC v4.8 which is no longer supported
- Clean up some x86 EFI stub details, some of which are borderline
bugs that copy around garbage into padding fields - let's fix these
out of caution.
- Fix build issues while working on RISC-V support
- Avoid --whole-archive when linking the stub on arm64"
* tag 'efi-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Revert "efi/x86: Fix build with gcc 4"
efi/efivars: Expose RT service availability via efivars abstraction
efi/libstub: Move the function prototypes to header file
efi/libstub: Fix gcc error around __umoddi3 for 32 bit builds
efi/libstub/arm64: link stub lib.a conditionally
efi/x86: Only copy upto the end of setup_header
efi/x86: Remove unused variables
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull networking fixes from David Miller:
1) Fix RCU locaking in iwlwifi, from Johannes Berg.
2) mt76 can access uninitialized NAPI struct, from Felix Fietkau.
3) Fix race in updating pause settings in bnxt_en, from Vasundhara
Volam.
4) Propagate error return properly during unbind failures in ax88172a,
from George Kennedy.
5) Fix memleak in adf7242_probe, from Liu Jian.
6) smc_drv_probe() can leak, from Wang Hai.
7) Don't muck with the carrier state if register_netdevice() fails in
the bonding driver, from Taehee Yoo.
8) Fix memleak in dpaa_eth_probe, from Liu Jian.
9) Need to check skb_put_padto() return value in hsr_fill_tag(), from
Murali Karicheri.
10) Don't lose ionic RSS hash settings across FW update, from Shannon
Nelson.
11) Fix clobbered SKB control block in act_ct, from Wen Xu.
12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang.
13) IS_UDPLITE cleanup a long time ago, incorrectly handled
transformations involving UDPLITE_RECV_CC. From Miaohe Lin.
14) Unbalanced locking in netdevsim, from Taehee Yoo.
15) Suppress false-positive error messages in qed driver, from Alexander
Lobakin.
16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye.
17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost.
18) Uninitialized value in geneve_changelink(), from Cong Wang.
19) Fix deadlock in xen-netfront, from Andera Righi.
19) flush_backlog() frees skbs with IRQs disabled, so should use
dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov
Kasiviswanathan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
drivers/net/wan: lapb: Corrected the usage of skb_cow
dev: Defer free of skbs in flush_backlog
qrtr: orphan socket in qrtr_release()
xen-netfront: fix potential deadlock in xennet_remove()
flow_offload: Move rhashtable inclusion to the source file
geneve: fix an uninitialized value in geneve_changelink()
bonding: check return value of register_netdevice() in bond_newlink()
tcp: allow at most one TLP probe per flight
AX.25: Prevent integer overflows in connect and sendmsg
cxgb4: add missing release on skb in uld_send()
net: atlantic: fix PTP on AQC10X
AX.25: Prevent out-of-bounds read in ax25_sendmsg()
sctp: shrink stream outq when fails to do addstream reconf
sctp: shrink stream outq only when new outcnt < old outcnt
AX.25: Fix out-of-bounds read in ax25_connect()
enetc: Remove the mdio bus on PF probe bailout
net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload
net: ethernet: ave: Fix error returns in ave_init
drivers/net/wan/x25_asy: Fix to make it work
ipvs: fix the connection sync failed in some cases
...
|
|
On non-EFI systems, it wasn't possible to test the platform firmware
loader because it will have never set "checked_fw" during __init.
Instead, allow the test code to override this check. Additionally split
the declarations into a private header file so it there is greater
enforcement of the symbol visibility.
Fixes: 548193cba2a7 ("test_firmware: add support for firmware_request_platform")
Cc: stable@vger.kernel.org
Acked-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200724213640.389191-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for the Inline Crypto Engine (ICE) key programming interface
that's needed for the ufs-qcom driver to use inline encryption on
Snapdragon SoCs. This interface consists of two SCM calls: one to program
a key into a keyslot, and one to invalidate a keyslot.
Although the UFS specification defines a standard way to do this, on these
SoCs the Linux kernel isn't permitted to access the needed crypto
configuration registers directly; these SCM calls must be used instead.
Link: https://lore.kernel.org/r/20200710072013.177481-2-ebiggers@kernel.org
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Extend the rfc 4884 read interface introduced for ipv4 in
commit eba75c587e81 ("icmp: support rfc 4884") to ipv6.
Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.
Changes v1->v2:
- make ipv6_icmp_error_rfc4884 static (file scope)
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RFC 4884 spec is largely the same between IPv4 and IPv6.
Factor out the IPv4 specific parts in preparation for IPv6 support:
- icmp types supported
- icmp header size, and thus offset to original datagram start
- datagram length field offset in icmp(6)hdr.
- datagram length field word size: 4B for IPv4, 8B for IPv6.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For architectures like x86 and arm64 we don't need the separate bit to
indicate that a pointer is a kernel pointer as the address spaces are
unified. That way the sockptr_t can be reduced to a union of two
pointers, which leads to nicer calling conventions.
The only caveat is that we need to check that users don't pass in kernel
address and thus gain access to kernel memory. Thus the USER_SOCKPTR
helper is replaced with a init_user_sockptr function that does this check
and returns an error if it fails.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is mostly to prepare for cleaning up the callers, as bpfilter by
design can't handle kernel pointers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a uptr_t type that can hold a pointer to either a user or kernel
memory region, and simply helpers to copy to and from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Retreive a hash value from the SKB and store it
in the dissector key for future matching.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
|
|
Drop the repeated word "an" in a comment.
Insert "and" between "source" and "destination" as is done a few
lines earlier.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
|
|
Merge misc fixes from Andrew Morton:
"Subsystems affected by this patch series: mm/pagemap, mm/shmem,
mm/hotfixes, mm/memcg, mm/hugetlb, mailmap, squashfs, scripts,
io-mapping, MAINTAINERS, and gdb"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
MAINTAINERS: add KCOV section
io-mapping: indicate mapping failure
scripts/decode_stacktrace: strip basepath from all paths
squashfs: fix length field overlap check in metadata reading
mailmap: add entry for Mike Rapoport
khugepaged: fix null-pointer dereference due to race
mm/hugetlb: avoid hardcoding while checking if cma is enabled
mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
mm/memcg: fix refcount error while moving and swapping
mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()
mm: initialize return of vm_insert_pages
vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way
mm/mmap.c: close race between munmap() and expand_upwards()/downwards()
|
|
Add new API k3_ringacc_request_rings_pair() to request pair of rings at
once, as in the most cases Rings are used with DMA channels, which need to
request pair of rings - one to feed DMA with descriptors (TX/RX FDQ) and
one to receive completions (RX/TX CQ). This will allow to simplify Ringacc
API users.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm into master
Pull device mapper fix from Mike Snitzer:
"A stable fix for DM integrity target's integrity recalculation that
gets skipped when resuming a device. This is a fix for a previous
stable@ fix"
* tag 'for-5.8/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix integrity recalculation that is improperly skipped
|
|
This manually reverts commit d1d84bb95364ed604015c2b788caaf3dbca0262f.
The only user has gone two years ago with commit 589edb56b424 ("ACPI /
scan: Create platform device for INT33FE ACPI nodes") and no new user
has showed up. Remove and hope we will never need it again.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The !ATOMIC_IOMAP version of io_maping_init_wc will always return
success, even when the ioremap fails.
Since the ATOMIC_IOMAP version returns NULL when the init fails, and
callers check for a NULL return on error this is unexpected.
During a device probe, where the ioremap failed, a crash can look like
this:
BUG: unable to handle page fault for address: 0000000000210000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
Oops: 0002 [#1] PREEMPT SMP
CPU: 0 PID: 177 Comm:
RIP: 0010:fill_page_dma [i915]
gen8_ppgtt_create [i915]
i915_ppgtt_create [i915]
intel_gt_init [i915]
i915_gem_init [i915]
i915_driver_probe [i915]
pci_device_probe
really_probe
driver_probe_device
The remap failure occurred much earlier in the probe. If it had been
propagated, the driver would have exited with an error.
Return NULL on ioremap failure.
[akpm@linux-foundation.org: detect ioremap_wc() errors earlier]
Fixes: cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping")
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit fdc85222d58e ("kernfs: kvmalloc xattr value instead of
kmalloc"), simple xattr entry is allocated with kvmalloc() instead of
kmalloc(), so we should release it with kvfree() instead of kfree().
Fixes: fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc")
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Chris Down <chris@chrisdown.name>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> [5.7]
Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Up to ConnectX-7 setting mkey relaxed ordering read/write attributes
by UMR is not supported. ConnectX-7 supports this option, which is
indicated by two new HCA capabilities - relaxed_ordering_write_umr
and relaxed_ordering_read_umr.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
put_task_struct_many() is as put_task_struct() but puts several
references at once. Useful to batching it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
into char-misc-next
Georgi writes:
interconnect changes for 5.9
Here are the interconnect changes for the 5.9-rc1 merge window
consisting mostly of changes that give the core more flexibility
in order to support some new provider drivers.
Core changes:
- Export of_icc_get_from_provider()
- Relax requirement in of_icc_get_from_provider()
- Allow inter-provider pairs to be configured
- Mark all dummy functions as static inline
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
* tag 'icc-5.9-rc1' of https://git.linaro.org/people/georgi.djakov/linux:
interconnect: Mark all dummy functions as static inline
interconnect: Allow inter-provider pairs to be configured
interconnect: Relax requirement in of_icc_get_from_provider()
interconnect: Export of_icc_get_from_provider()
|
|
dyndbg populates its callsite info into __verbose section, change that
to a more specific and descriptive name, __dyndbg.
Also, per checkpatch:
simplify __attribute(..) to __section(__dyndbg) declaration.
and 1 spelling fix, decriptor
Acked-by: <jbaron@akamai.com>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20200719231058.1586423-6-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
For SR-IOV, the PF PRI is shared between the PF and any associated VFs, and
the PRI Capability is allowed for PFs but not for VFs. Searching for the
PRI Capability on a VF always fails, even if its associated PF supports
PRI.
Add pci_pri_supported() to check whether device or its associated PF
supports PRI.
[bhelgaas: commit log, avoid "!!"]
Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
Link: https://lore.kernel.org/r/1595543849-19692-1-git-send-email-ashok.raj@intel.com
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org # v4.4+
|
|
Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.
Provide generic infrastructure to avoid duplication of the same handling
code all over the place.
The transfer to guest mode handling is different from the exit to usermode
handling, e.g. vs. rseq and live patching, so a separate function is used.
The initial list of work items handled is:
TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME
Architecture specific TIF flags can be added via defines in the
architecture specific include files.
The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before transitioning to guest mode. The lockdep, RCU and tracing state
handling is also done directly around the switch to and from guest mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220519.833296398@linutronix.de
|
|
Like the syscall entry/exit code interrupt/exception entry after the real
low level ASM bits should not be different accross architectures.
Provide a generic version based on the x86 code.
irqentry_enter() is called after the low level entry code and
irqentry_exit() must be invoked right before returning to the low level
code which just contains the actual return logic. The code before
irqentry_enter() and irqentry_exit() must not be instrumented. Code after
irqentry_enter() and before irqentry_exit() can be instrumented.
irqentry_enter() invokes irqentry_enter_from_user_mode() if the
interrupt/exception came from user mode. If if entered from kernel mode it
handles the kernel mode variant of establishing state for lockdep, RCU and
tracing depending on the kernel context it interrupted (idle, non-idle).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220519.723703209@linutronix.de
|
|
Like syscall entry all architectures have similar and pointlessly different
code to handle pending work before returning from a syscall to user space.
1) One-time syscall exit work:
- rseq syscall exit
- audit
- syscall tracing
- tracehook (single stepping)
2) Preparatory work
- Exit to user mode loop (common TIF handling).
- Architecture specific one time work arch_exit_to_user_mode_prepare()
- Address limit and lockdep checks
3) Final transition (lockdep, tracing, context tracking, RCU). Invokes
arch_exit_to_user_mode() to handle e.g. speculation mitigations
Provide a generic version based on the x86 code which has all the RCU and
instrumentation protections right.
Provide a variant for interrupt return to user mode as well which shares
the above #2 and #3 work items.
After syscall_exit_to_user_mode() and irqentry_exit_to_user_mode() the
architecture code just has to return to user space. The code after
returning from these functions must not be instrumented.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220519.613977173@linutronix.de
|
|
On syscall entry certain work needs to be done:
- Establish state (lockdep, context tracking, tracing)
- Conditional work (ptrace, seccomp, audit...)
This code is needlessly duplicated and different in all
architectures.
Provide a generic version based on the x86 implementation which has all the
RCU and instrumentation bits right.
As interrupt/exception entry from user space needs parts of the same
functionality, provide a function for this as well.
syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be
called right after the low level ASM entry. The calling code must be
non-instrumentable. After the functions returns state is correct and the
subsequent functions can be instrumented.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220519.513463269@linutronix.de
|
|
To avoid #ifdeffery in the upcoming generic syscall entry work code provide
a stub for __secure_computing() as this is preferred over
secure_computing() because the TIF flag is already evaluated.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220519.404974280@linutronix.de
|