summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-25CIFS: Fix a potencially linear read overflowLen Baker
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated. Also, the strnlen() call does not avoid the read overflow in the strlcpy function when a not NUL-terminated string is passed. So, replace this block by a call to kstrndup() that avoids this type of overflow and does the same. Fixes: 066ce6899484d ("cifs: rename cifs_strlcpy_to_host and make it use new functions") Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-25nbd: remove nbd->destroy_completeChristoph Hellwig
The nbd->destroy_complete pointer is not really needed. For creating a device without a specific index we now simplify skip devices marked NBD_DESTROY_ON_DISCONNECT as there is not much point to reuse them. For device creation with a specific index there is no real need to treat the case of a requested but not finished disconnect different than any other device that is being shutdown, i.e. we can just return an error, as a slightly different race window would anyway. Fixes: 6e4df4c64881 ("nbd: reduce the nbd_index_mutex scope") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+2c98885bcd769f56b6d6@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-7-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25nbd: only return usable devices from nbd_find_unusedChristoph Hellwig
Device marked as NBD_DESTROY_ON_DISCONNECT can and should be skipped given that they won't survive the disconnect. So skip them and try to grab a reference directly and just continue if the the devices is being torn down or created and thus has a zero refcount. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-6-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25nbd: set nbd->index before releasing nbd_index_mutexTetsuo Handa
Set nbd->index before releasing nbd_index_mutex, as populate_nbd_status() might access nbd->index as soon as nbd_index_mutex is released. Fixes: 6e4df4c64881 ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-5-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25nbd: prevent IDR lookups from finding partially initialized devicesTetsuo Handa
Previously nbd_index_mutex was held during whole add/remove/lookup operations in order to guarantee that partially initialized devices are not reachable via idr_find() or idr_for_each(). But now that partially initialized devices become reachable as soon as idr_alloc() succeeds, we need to skip partially initialized devices. Since it seems that all functions use refcount_inc_not_zero(&nbd->refs) in order to skip destroying devices, update nbd->refs from zero to non-zero as the last step of device initialization in order to also skip partially initialized devices. Fixes: 6e4df4c64881 ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch, added comments] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-4-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25nbd: reset NBD to NULL when restarting in nbd_genl_connectChristoph Hellwig
When nbd_genl_connect restarts to wait for a disconnecting device, nbd needs to be reset to NULL. Do that by facoring out a helper to find an unused device. Fixes: 6177b56c96ff ("nbd: refactor device search and allocation in nbd_genl_connect") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Reported-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-3-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25nbd: add missing locking to the nbd_dev_add error pathTetsuo Handa
idr_remove needs external synchronization. Fixes: 6e4df4c64881 ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-2-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "2 patches. Subsystems affected by this patch series: mm/memory-hotplug and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: exfat: update my email address mm/memory_hotplug: fix potential permanent lru cache disable
2021-08-25MAINTAINERS: exfat: update my email addressNamjae Jeon
My email address in exfat entry will be not available in a few days. Update it to my own kernel.org address. Link: https://lkml.kernel.org/r/20210825044833.16806-1-namjae.jeon@samsung.com Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25mm/memory_hotplug: fix potential permanent lru cache disableMiaohe Lin
If offline_pages failed after lru_cache_disable(), it forgot to do lru_cache_enable() in error path. So we would have lru cache disabled permanently in this case. Link: https://lkml.kernel.org/r/20210821094246.10149-3-linmiaohe@huawei.com Fixes: d479960e44f2 ("mm: disable LRU pagevec during the migration temporarily") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25Merge branch 'selftests: xsk: various simplifications'Alexei Starovoitov
Magnus Karlsson says: ==================== This patch set mainly contains various simplifications to the xsk selftests. The only exception is the introduction of packet streams that describes what the Tx process should send and what the Rx process should receive. If it receives anything else, the test fails. This mechanism can be used to produce tests were all packets are not received by the Rx thread or modified in some way. An example of this is if an XDP program does XDP_PASS on some of the packets. This patch set will be followed by another patch set that implements a new structure that will facilitate adding new tests. A couple of new tests will also be included in that patch set. v2 -> v3: * Reworked patch 12 so that it now has functions for creating and destroying ifobjects. Simplifies the code. [Maciej] * The packet stream now allocates the supplied buffer array length, instead of the default one. [Maciej] * pkt_stream_get_pkt() now returns NULL when indexing a non-existing packet. [Maciej] * pkt_validate() is now is_pkt_valid(). [Maciej] * Slowed down packet sending speed even more in patch 11 so that slow systems do not silenty drop packets in skb mode. v1 -> v2: * Dropped the patch with per process limit changes as it is not needed [Yonghong] * Improved the commit message of patch 1 [Yonghong] * Fixed a spelling error in patch 9 Thanks: Magnus ==================== Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-08-25selftests: xsk: Preface options with optMagnus Karlsson
Preface all options with opt_ and make them booleans. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-17-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Make enums lower caseMagnus Karlsson
Make enums lower case as that is the standard. Also drop the unnecessary TEST_MODE_UNCONFIGURED mode. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-16-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Generate packets from specificationMagnus Karlsson
Generate packets from a specification instead of something hard coded. The idea is that a test generates one or more packet specifications and provides it/them to both Tx and Rx. The Tx thread will generate from this specification and Rx will validate that it receives what is in the specification. The specification can be the same on both ends, meaning that everything that was sent should be received, or different which means that Rx will only receive part of the sent packets. Currently, the packet specification is the same for both Rx and Tx and the same for each test. This will change in later work as features and tests are added. The data path functions are also renamed to better reflect what actions they are performing after introducing this feature. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-15-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Generate packet directly in umemMagnus Karlsson
Generate the packet directly in the umem instead of in a temporary buffer that is copied out. Simplifies the code and improves performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-14-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Simplify cleanup of ifobjectsMagnus Karlsson
Simpify the cleanup of ifobjects right before the program exits by introducing functions for creating and destroying these objects. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-13-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Decrease sending speedMagnus Karlsson
Decrease sending speed to avoid potentially overflowing some buffers in the skb case that leads to dropped packets we cannot control (and thus the tests may generate false negatives). Decrease batch size and introduce a usleep in the transmit thread to not overflow the receiver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-12-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Validate tx stats on tx threadMagnus Karlsson
Validate the tx stats on the Tx thread instead of the Rx thread. Depending on your settings, you might not be allowed to query the statistics of a socket you do not own, so better to do this on the correct thread to start with. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-11-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Simplify packet validation in xsk testsMagnus Karlsson
Simplify packet validation in the xsk selftests by performing it at once for every packet. The current code performed this per batch and did this on copied packet data. Make it simpler and faster by validating it at once and on the umem packet data thus skipping the copy and the memory allocation for the temprary buffer. The optional packet dump feature is also simplified in the same manner. Memory allocation and copying is removed and the dump is performed directly on the umem data. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-10-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Rename worker_* functions that are not thread entry pointsMagnus Karlsson
Rename worker_* functions that are not thread entry points to something else. This was confusing. Now only thread entry points are worker_something. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-9-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Disassociate umem size with packets sentMagnus Karlsson
Disassociate the number of packets sent with the number of buffers in the umem. This so we can loop over the umem to test more things. Set the size of the umem to be a multiple of 2M. A requirement for huge pages that are needed in unaligned mode. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-8-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Remove end-of-test packetMagnus Karlsson
Get rid of the end-of-test packet and just count the number of packets received and quit when the expected number as been received. Simplifies the code. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-7-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Simplify the retry codeMagnus Karlsson
Simplify the retry code and make it more efficient by waiting first, instead of trying immediately which always fails due to the asynchronous nature of xsk socket close. Also decrease the wait time to significantly lower the run-time of the test suite. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-6-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Return correct error codesMagnus Karlsson
Return the correct error codes so they can be printed correctly. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-5-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Remove unused variablesMagnus Karlsson
Remove unused variables and typedefs. The *_npkts variables are incremented but never used. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-4-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Remove the num_tx_packets optionMagnus Karlsson
Remove the number of tx packet option as this should be decided by the test itself. Also change the number of packets to be sent to 4096 speeding up the execution. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-3-magnus.karlsson@gmail.com
2021-08-25selftests: xsk: Remove color modeMagnus Karlsson
Remove color mode since it does not add any value and having less code means less maintenance which is a good thing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-2-magnus.karlsson@gmail.com
2021-08-25io_uring: don't free request to slabHao Xu
It's not necessary to free the request back to slab when we fail to get sqe, just move it to state->free_list. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210825175856.194299-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25hv_utils: Set the maximum packet size for VSS driver to the length of the ↵Vitaly Kuznetsov
receive buffer Commit adae1e931acd ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer") introduced a notion of maximum packet size and for KVM and FCOPY drivers set it to the length of the receive buffer. VSS driver wasn't updated, this means that the maximum packet size is now VMBUS_DEFAULT_MAX_PKT_SIZE (4k). Apparently, this is not enough. I'm observing a packet of 6304 bytes which is being truncated to 4096. When VSS driver tries to read next packet from ring buffer it starts from the wrong offset and receives garbage. Set the maximum packet size to 'HV_HYP_PAGE_SIZE * 2' in VSS driver. This matches the length of the receive buffer and is in line with other utils drivers. Fixes: adae1e931acd ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210825133857.847866-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-08-25PM: domains: Improve runtime PM performance state handlingDmitry Osipenko
GENPD core doesn't support handling performance state changes while consumer device is runtime-suspended or when runtime PM is disabled. GENPD core may override performance state that was configured by device driver while RPM of the device was disabled or device was RPM-suspended. Let's close that gap by allowing drivers to control performance state while RPM of a consumer device is disabled and to set up performance state of RPM-suspended device that will be applied by GENPD core on RPM-resume of the device. Fixes: 5937c3ce2122 ("PM: domains: Drop/restore performance state votes for devices at runtime PM") Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25dt-bindings: Add vendor prefix for Topic Embedded SystemsMichal Simek
Add vendor prefix for Topic Embedded Systems (http://topic.nl). Signed-off-by: Michal Simek <michal.simek@xilinx.com> Link: https://lore.kernel.org/r/b6e42012977876c421672a84bdb7636be819d664.1629877585.git.michal.simek@xilinx.com Signed-off-by: Rob Herring <robh@kernel.org>
2021-08-25of: fdt: Rename reserve_elfcorehdr() to fdt_reserve_elfcorehdr()Geert Uytterhoeven
On ia64/allmodconfig: drivers/of/fdt.c:609:20: error: conflicting types for 'reserve_elfcorehdr'; have 'void(void)' 609 | static void __init reserve_elfcorehdr(void) | ^~~~~~~~~~~~~~~~~~ arch/ia64/include/asm/meminit.h:43:12: note: previous declaration of 'reserve_elfcorehdr' with type 'int(u64 *, u64 *)' {aka 'int(long long unsigned int *, long long unsigned int *)'} 43 | extern int reserve_elfcorehdr(u64 *start, u64 *end); | ^~~~~~~~~~~~~~~~~~ Fix this by prefixing the FDT function name with "fdt_". Fixes: f7e7ce93aac13118 ("of: fdt: Add generic support for handling elf core headers property") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/f6eabbbce0fba6da3da0264c1e1cf23c01173999.1629884393.git.geert+renesas@glider.be Signed-off-by: Rob Herring <robh@kernel.org>
2021-08-25powercap: Add Power Limit4 support for Alder Lake SoCSumeet Pawnikar
Add Power Limit4 support for Alder Lake SoC. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25cpufreq: intel_pstate: Process HWP Guaranteed change notificationSrinivas Pandruvada
It is possible that HWP guaranteed ratio is changed in response to change in power and thermal limits. For example when Intel Speed Select performance profile is changed or there is change in TDP, hardware can send notifications. It is possible that the guaranteed ratio is increased. This creates an issue when turbo is disabled, as the old limits set in MSR_HWP_REQUEST are still lower and hardware will clip to older limits. This change enables HWP interrupt and process HWP interrupts. When guaranteed is changed, calls cpufreq_update_policy() so that driver callbacks are called to update to new HWP limits. This callback is called from a delayed workqueue of 10ms to avoid frequent updates. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25thermal: intel: Allow processing of HWP interruptSrinivas Pandruvada
Add a weak function to process HWP (Hardware P-states) notifications and move updating HWP_STATUS MSR to this function. This allows HWP interrupts to be processed by the intel_pstate driver in HWP mode by overriding the implementation. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25ACPI: tables: FPDT: Do not print FW_BUG message if record types are reservedAdrian Huang
In ACPI 6.4 spec, record types "0x0002-0xffff" of FPDT Performance Record Types [1] and record types "0x0003-0xffff" of Runtime Performance Record Types [2] are reserved. Users might be confused with the FW_BUG message, and they think this is the FW issue. Here is the example in a Lenovo box: ACPI: FPDT 0x00000000A820A000 000044 (v01 LENOVO THINKSYS 00000100 01000013) ACPI: Reserving FPDT table memory at [mem 0xa820a000-0xa820a043] ACPI FPDT: [Firmware Bug]: Invalid record 4113 found So, remove the FW_BUG message to avoid confusion since those types are reserved in ACPI 6.4 spec. [1] https://uefi.org/specs/ACPI/6.4/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fpdt-performance-record-types-table [2] https://uefi.org/specs/ACPI/6.4/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#runtime-performance-record-types-table Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25ACPI: button: Add DMI quirk for Lenovo Yoga 9 (14INTL5)Ulrich Huber
The Lenovo Yoga 9 (14INTL5)'s ACPI _LID is bugged: After hibernation the lid is initially reported as closed. Once closing and then reopening the lid reports the lid as open again. This leads to the conclusion that the initial notification of the lid is missing but subsequent notifications are correct. In order fo the Linux LID code to handle this device properly the lid_init_state must be set to ACPI_BUTTON_LID_INIT_OPEN. Signed-off-by: Ulrich Huber <ulrich@huberulrich.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25ACPI: Add memory semantics to acpi_os_map_memory()Lorenzo Pieralisi
The memory attributes attached to memory regions depend on architecture specific mappings. For some memory regions, the attributes specified by firmware (eg uncached) are not sufficient to determine how a memory region should be mapped by an OS (for instance a region that is define as uncached in firmware can be mapped as Normal or Device memory on arm64) and therefore the OS must be given control on how to map the region to match the expected mapping behaviour (eg if a mapping is requested with memory semantics, it must allow unaligned accesses). Rework acpi_os_map_memory() and acpi_os_ioremap() back-end to split them into two separate code paths: acpi_os_memmap() -> memory semantics acpi_os_ioremap() -> MMIO semantics The split allows the architectural implementation back-ends to detect the default memory attributes required by the mapping in question (ie the mapping API defines the semantics memory vs MMIO) and map the memory accordingly. Link: https://lore.kernel.org/linux-arm-kernel/31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25Merge branch 'bpf: Add bpf_task_pt_regs() helper'Alexei Starovoitov
Daniel Xu says: ==================== The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Changes from v1: - Conwolidate BTF_ID decls for task_struct - Enable bpf_get_current_task_btf() for all prog types - Enable bpf_task_pt_regs() for all prog types - Use ASSERT_* macros instead of CHECK ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-08-25bpf: selftests: Add bpf_task_pt_regs() selftestDaniel Xu
This test retrieves the uprobe's pt_regs in two different ways and compares the contents in an arch-agnostic way. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/5581eb8800f6625ec8813fe21e9dce1fbdef4937.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Add bpf_task_pt_regs() helperDaniel Xu
The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Extend bpf_base_func_proto helpers with bpf_get_current_task_btf()Daniel Xu
bpf_get_current_task() is already supported so it's natural to also include the _btf() variant for btf-powered helpers. This is required for non-tracing progs to use bpf_task_pt_regs() in the next commit. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Consolidate task_struct BTF_ID declarationsDaniel Xu
No need to have it defined 5 times. Once is enough. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Add BTF_ID_LIST_GLOBAL_SINGLE macroDaniel Xu
Same as BTF_ID_LIST_SINGLE macro except defines a global ID. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/a867a97517df42fd3953eeb5454402b57e74538f.1629772842.git.dxu@dxuuu.xyz
2021-08-25pipe: do FASYNC notifications for every pipe IO, not just state changesLinus Torvalds
It turns out that the SIGIO/FASYNC situation is almost exactly the same as the EPOLLET case was: user space really wants to be notified after every operation. Now, in a perfect world it should be sufficient to only notify user space on "state transitions" when the IO state changes (ie when a pipe goes from unreadable to readable, or from unwritable to writable). User space should then do as much as possible - fully emptying the buffer or what not - and we'll notify it again the next time the state changes. But as with EPOLLET, we have at least one case (stress-ng) where the kernel sent SIGIO due to the pipe being marked for asynchronous notification, but the user space signal handler then didn't actually necessarily read it all before returning (it read more than what was written, but since there could be multiple writes, it could leave data pending). The user space code then expected to get another SIGIO for subsequent writes - even though the pipe had been readable the whole time - and would only then read more. This is arguably a user space bug - and Colin King already fixed the stress-ng code in question - but the kernel regression rules are clear: it doesn't matter if kernel people think that user space did something silly and wrong. What matters is that it used to work. So if user space depends on specific historical kernel behavior, it's a regression when that behavior changes. It's on us: we were silly to have that non-optimal historical behavior, and our old kernel behavior was what user space was tested against. Because of how the FASYNC notification was tied to wakeup behavior, this was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix and clarify pipe read/write wakeup logic"), but at the time it seems nobody noticed. Probably because the stress-ng problem case ends up being timing-dependent too. It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") only to be broken again when by commit 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads"). And at that point the kernel test robot noticed the performance refression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag below is somewhat ad hoc, but it matches when the issue was noticed. Fix it for good (knock wood) by simply making the kill_fasync() case separate from the wakeup case. FASYNC is quite rare, and we clearly shouldn't even try to use the "avoid unnecessary wakeups" logic for it. Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/ Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25arm64: mm: fix comment typo of pud_offset_phys()Xujun Leng
Fix a typo in the comment of macro pud_offset_phys(). Signed-off-by: Xujun Leng <lengxujun2007@126.com> Link: https://lore.kernel.org/r/20210825150526.12582-1-lengxujun2007@126.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-25Merge branch 'for-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucount fixes from Eric Biederman: "This branch fixes a regression that made it impossible to increase rlimits that had been converted to the ucount infrastructure, and also fixes a reference counting bug where the reference was not incremented soon enough. The fixes are trivial and the bugs have been encountered in the wild, and the fixes have been tested" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Increase ucounts reference counter before the security hook ucounts: Fix regression preventing increasing of rlimits in init_user_ns
2021-08-25cgroup/cpuset: Avoid memory migration when nodemasks matchNicolas Saenz Julienne
With the introduction of ee9707e8593d ("cgroup/cpuset: Enable memory migration for cpuset v2") attaching a process to a different cgroup will trigger a memory migration regardless of whether it's really needed. Memory migration is an expensive operation, so bypass it if the nodemasks passed to cpuset_migrate_mm() are equal. Note that we're not only avoiding the migration work itself, but also a call to lru_cache_disable(), which triggers and flushes an LRU drain work on every online CPU. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-25arm64: signal32: Drop pointless call to sigdelsetmask()Will Deacon
Commit 77097ae503b1 ("most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set") extended set_current_blocked() to remove SIGKILL and SIGSTOP from the new signal set and updated all callers accordingly. Unfortunately, this collided with the merge of the arm64 architecture, which duly removes these signals when restoring the compat sigframe, as this was what was previously done by arch/arm/. Remove the redundant call to sigdelsetmask() from compat_restore_sigframe(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210825093911.24493-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-25Merge remote-tracking branch 'regulator/for-5.15' into regulator-nextMark Brown