Age | Commit message (Collapse) | Author |
|
Commit 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port")
changed ieee80211_tx_control_port() to aways call
__ieee80211_select_queue() without checking local->hw.queues.
__ieee80211_select_queue() returns a queue-id between 0 and 3, which means
that now ieee80211_tx_control_port() may end up setting the queue-mapping
for a skb to a value higher then local->hw.queues if local->hw.queues
is less then 4.
Specifically this is a problem for ralink rt2500-pci cards where
local->hw.queues is 2. There this causes rt2x00queue_get_tx_queue() to
return NULL and the following error to be logged: "ieee80211 phy0:
rt2x00mac_tx: Error - Attempt to send packet over invalid queue 2",
after which association with the AP fails.
Other callers of __ieee80211_select_queue() skip calling it when
local->hw.queues < IEEE80211_NUM_ACS, add the same check to
ieee80211_tx_control_port(). This fixes ralink rt2500-pci and
similar cards when less then 4 tx-queues no longer working.
Fixes: 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port")
Cc: Markus Theil <markus.theil@tu-ilmenau.de>
Suggested-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220918192052.443529-1-hdegoede@redhat.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Make sure local->queue_stop_reasons and vif.txqs_stopped stay in sync.
When a new vif is created the queues may end up in an inconsistent state
and be inoperable:
Communication not using iTXQ will work, allowing to e.g. complete the
association. But the 4-way handshake will time out. The sta will not
send out any skbs queued in iTXQs.
All normal attempts to start the queues will fail when reaching this
state.
local->queue_stop_reasons will have marked all queues as operational but
vif.txqs_stopped will still be set, creating an inconsistent internal
state.
In reality this seems to be race between the mac80211 function
ieee80211_do_open() setting SDATA_STATE_RUNNING and the wake_txqs_tasklet:
Depending on the driver and the timing the queues may end up to be
operational or not.
Cc: stable@vger.kernel.org
Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Acked-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20220915130946.302803-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
ieee80211_txq_purge() calls fq_tin_reset() and
ieee80211_purge_tx_queue(); Both are then calling
ieee80211_free_txskb(). Which can decide to TX the skb again.
There are at least two ways to get a deadlock:
1) When we have a TDLS teardown packet queued in either tin or frags
ieee80211_tdls_td_tx_handle() will call ieee80211_subif_start_xmit()
while we still hold fq->lock. ieee80211_txq_enqueue() will thus
deadlock.
2) A variant of the above happens if aggregation is up and running:
In that case ieee80211_iface_work() will deadlock with the original
task: The original tasks already holds fq->lock and tries to get
sta->lock after kicking off ieee80211_iface_work(). But the worker
can get sta->lock prior to the original task and will then spin for
fq->lock.
Avoid these deadlocks by not sending out any skbs when called via
ieee80211_free_txskb().
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20220915124120.301918-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The Bitrate for HE/EHT MCS6 is calculated wrongly due to the
incorrect MCS divisor value for mcs6. Fix it with the proper
value.
previous mcs_divisor value = (11769/6144) = 1.915527
fixed mcs_divisor value = (11377/6144) = 1.851725
Fixes: 9c97c88d2f4b ("cfg80211: Add support to calculate and report 4096-QAM HE rates")
Signed-off-by: Tamizh Chelvam Raja <quic_tamizhr@quicinc.com>
Link: https://lore.kernel.org/r/20220908181034.9936-1-quic_tamizhr@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Quite often, the HW get stuck in error condition if a stream error
was detected. As documented, the HW should stop immediately and self
reset. There is likely a problem or a miss-understanding of the self
reset mechanism, as unless we make a long pause, the next command
will then report an error even if there is no error in it.
Disabling error detection fixes the issue, and let the decoder continue
after an error. This patch is safe for backport into older kernels.
Fixes: cd33c830448b ("media: rkvdec: Add the rkvdec driver")
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
Commit a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource
from DT core") removed support for calling platform_get_resource(...,
IORESOURCE_IRQ, ...) on DT-based drivers, but the probe() function of
mtk-vcodec's encoder was still making use of it. This caused the encoder
driver to fail probe.
Since the platform_get_resource() call was only being used to check for
the presence of the interrupt (its returned resource wasn't even used)
and platform_get_irq() was already being used to get the IRQ, simply
drop the use of platform_get_resource(IORESOURCE_IRQ) and handle the
failure of platform_get_irq(), to get the driver probing again.
[hverkuil: drop unused struct resource *res]
Fixes: a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT core")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
vb2_core_qbuf and vb2_core_querybuf don't check the range of b->index
controlled by the user.
Fix this by adding range checking code before using them.
Fixes: 57868acc369a ("media: videobuf2: Add new uAPI for DVB streaming I/O")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
If allocating array_buf fails, or copying data from userspace into that
buffer fails, then just free memory and return the error. Don't attempt
to call video_put_user() since there is no point, and it would copy back
data on error even if INFO_FL_ALWAYS_COPY wasn't set.
So if writing the array back to userspace fails, then don't go to
out_array_args, instead just continue with the regular code that just
returns the error unless 'always_copy' is set.
Update the VIDIOC_G/S/TRY_EXT_CTRLS ioctls to set the ALWAYS_COPY flag
since they now need it. Before this worked due to this buggy code, but
now that that is fixed these ioctls need to set this flag explicitly.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
The v4l2_compat_get_array_args() function can leave uninitialized memory in the
buffer it is passed. So zero it before copying array elements from userspace
into the buffer.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reported-by: syzbot+ff18193ff05f3f87f226@syzkaller.appspotmail.com
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
E3C/E4C SSDs do support the Write Zeroes command in theory, but have very
bad performance when using it. As the firmware has been frozen for these
products we can not expect firmware improvements for it, so disable
Write Zeroes.
Signed-off-by: Tina Hsu <tina_hsu@phison.corp-partner.google.com>
[hch: update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
non-functional on NVMe devices because the nvme_pr_clear()
and nvme_pr_release() functions set the IEKEY field incorrectly.
The IEKEY field should be set only when the key is zero (i.e,
not specified). The current code does it backwards.
Furthermore, the NVMe spec describes the persistent
reservation "clear" function as an option on the reservation
release command. The current implementation of nvme_pr_clear()
erroneously uses the reservation register command.
Fix these errors. Note that NVMe version 1.3 and later specify
that setting the IEKEY field will return an error of Invalid
Field in Command. The fix will set IEKEY when the key is zero,
which is appropriate as these ioctls consider a zero key to
be "unspecified", and the intention of the spec change is
to require a valid key.
Tested on a version 1.4 PCI NVMe device in an Azure VM.
Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Fixes: 1d277a637a71 ("NVMe: Add persistent reservation ops")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The name of this function looks not very accurate compared to it's
implementation and it's only a wrapper to erofs_read_metabuf(). So,
let's fold it directly instead.
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220927032518.25266-1-zbestahu@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as
board_ahci_mobile") added an explicit entry for AMD Green Sardine
AHCI controller using the board_ahci_mobile configuration (this
configuration has later been renamed to board_ahci_low_power).
The board_ahci_low_power configuration enables support for low power
modes.
This explicit entry takes precedence over the generic AHCI controller
entry, which does not enable support for low power modes.
Therefore, when commit 1527f69204fe ("ata: ahci: Add Green Sardine
vendor ID as board_ahci_mobile") was backported to stable kernels,
it make some Pioneer optical drives, which was working perfectly fine
before the commit was backported, stop working.
The real problem is that the Pioneer optical drives do not handle low
power modes correctly. If these optical drives would have been tested
on another AHCI controller using the board_ahci_low_power configuration,
this issue would have been detected earlier.
Unfortunately, the board_ahci_low_power configuration is only used in
less than 15% of the total AHCI controller entries, so many devices
have never been tested with an AHCI controller with low power modes.
Fixes: 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile")
Cc: stable@vger.kernel.org
Reported-by: Jaap Berkhout <j.j.berkhout@staalenberk.nl>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
- A performance fix for recent large AMD systems that avoids an ancient
cpu idle hardware workaround
- A new Intel model number. Folks like these upstream as soon as
possible so that each developer doing feature development doesn't
need to carry their own #define
- SGX fixes for a userspace crash and a rare kernel warning
* tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems
x86/sgx: Handle VA page allocation failure for EAUG on PF.
x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd
x86/cpu: Add CPU model numbers for Meteor Lake
|
|
A recent change affecting the behaviour of phys_to_dma() to
actually require the device tree ranges to work unmasked a
bug in the Integrator DMA ranges.
The PL110 uses the CMA allocator to obtain coherent allocations
from a dedicated 1MB video memory, leading to the following
call chain:
drm_gem_cma_create()
dma_alloc_attrs()
dma_alloc_from_dev_coherent()
__dma_alloc_from_coherent()
dma_get_device_base()
phys_to_dma()
translate_phys_to_dma()
phys_to_dma() by way of translate_phys_to_dma() will nowadays not
provide 1:1 mappings unless the ranges are properly defined in
the device tree and reflected into the dev->dma_range_map.
There is a bug in the device trees because the DMA ranges are
incorrectly specified, and the patch uncovers this bug.
Solution:
- Fix the LB (logic bus) ranges to be 1-to-1 like they should
have always been.
- Provide a 1:1 dma-ranges attribute to the PL110.
- Mark the PL110 display controller as DMA coherent.
This makes the DMA ranges work right and makes the PL110
framebuffer work again.
Fixes: af6f23b88e95 ("ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220926073311.1610568-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull last (?) hotfixes from Andrew Morton:
"26 hotfixes.
8 are for issues which were introduced during this -rc cycle, 18 are
for earlier issues, and are cc:stable"
* tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
x86/uaccess: avoid check_object_size() in copy_from_user_nmi()
mm/page_isolation: fix isolate_single_pageblock() isolation behavior
mm,hwpoison: check mm when killing accessing process
mm/hugetlb: correct demote page offset logic
mm: prevent page_frag_alloc() from corrupting the memory
mm: bring back update_mmu_cache() to finish_fault()
frontswap: don't call ->init if no ops are registered
mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()
mm: fix madivse_pageout mishandling on non-LRU page
powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
mm: gup: fix the fast GUP race against THP collapse
mm: fix dereferencing possible ERR_PTR
vmscan: check folio_test_private(), not folio_get_private()
mm: fix VM_BUG_ON in __delete_from_swap_cache()
tools: fix compilation after gfp_types.h split
mm/damon/dbgfs: fix memory leak when using debugfs_lookup()
mm/migrate_device.c: copy pte dirty bit to page
mm/migrate_device.c: add missing flush_cache_page()
mm/migrate_device.c: flush TLB while holding PTL
x86/mm: disable instrumentations of mm/pgprot.c
...
|
|
Add missing pci_disable_device() if rr_init_one() fails
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20220923094320.3109154-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The devm_ioremap() function returns NULL on error, it doesn't return
error pointers.
Fixes: 3a1a274e933f ("mlxbf_gige: compute MDIO period based on i1clk")
Signed-off-by: Peng Wu <wupeng58@huawei.com>
Link: https://lore.kernel.org/r/20220923023640.116057-1-wupeng58@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The label passed to the QDESC_GET for the ETHOFLD TXQ, RXQ, and FLQ, is the
'out' one, which skips the 'out_unlock' label, and thus doesn't unlock the
'uld_mutex' before returning. Additionally, since commit 5148e5950c67
("cxgb4: add EOTID tracking and software context dump"), the access to
these ETHOFLD hardware queues should be protected by the 'mqprio_mutex'
instead.
Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support")
Fixes: 5148e5950c67 ("cxgb4: add EOTID tracking and software context dump")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Reviewed-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Link: https://lore.kernel.org/r/20220922175109.764898-1-rafaelmendsr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull missed ext4 fix from Ted Ts'o:
"Fix an potential unitialzied variable bug; this was a fixup that I had
forgotten to apply before the last pull request for ext4. My bad"
* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fixup possible uninitialized variable access in ext4_mb_choose_next_group_cr1()
|
|
nf_ct_put need to be called to put the refcount got by tcf_ct_fill_params
to avoid possible refcount leak when tcf_ct_flow_table_get fails.
Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220923020046.8021-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The check_object_size() helper under CONFIG_HARDENED_USERCOPY is designed
to skip any checks where the length is known at compile time as a
reasonable heuristic to avoid "likely known-good" cases. However, it can
only do this when the copy_*_user() helpers are, themselves, inline too.
Using find_vmap_area() requires taking a spinlock. The
check_object_size() helper can call find_vmap_area() when the destination
is in vmap memory. If show_regs() is called in interrupt context, it will
attempt a call to copy_from_user_nmi(), which may call check_object_size()
and then find_vmap_area(). If something in normal context happens to be
in the middle of calling find_vmap_area() (with the spinlock held), the
interrupt handler will hang forever.
The copy_from_user_nmi() call is actually being called with a fixed-size
length, so check_object_size() should never have been called in the first
place. Given the narrow constraints, just replace the
__copy_from_user_inatomic() call with an open-coded version that calls
only into the sanitizers and not check_object_size(), followed by a call
to raw_copy_from_user().
[akpm@linux-foundation.org: no instrument_copy_from_user() in my tree...]
Link: https://lkml.kernel.org/r/20220919201648.2250764-1-keescook@chromium.org
Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com
Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reported-by: Florian Lehner <dev@der-flo.net>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Florian Lehner <dev@der-flo.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
set_migratetype_isolate() does not allow isolating MIGRATE_CMA pageblocks
unless it is used for CMA allocation. isolate_single_pageblock() did not
have the same behavior when it is used together with
set_migratetype_isolate() in start_isolate_page_range(). This allows
alloc_contig_range() with migratetype other than MIGRATE_CMA, like
MIGRATE_MOVABLE (used by alloc_contig_pages()), to isolate first and last
pageblock but fail the rest. The failure leads to changing migratetype of
the first and last pageblock to MIGRATE_MOVABLE from MIGRATE_CMA,
corrupting the CMA region. This can happen during gigantic page
allocations.
Like Doug said here:
https://lore.kernel.org/linux-mm/a3363a52-883b-dcd1-b77f-f2bb378d6f2d@gmail.com/T/#u,
for gigantic page allocations, the user would notice no difference,
since the allocation on CMA region will fail as well as it did before.
But it might hurt the performance of device drivers that use CMA, since
CMA region size decreases.
Fix it by passing migratetype into isolate_single_pageblock(), so that
set_migratetype_isolate() used by isolate_single_pageblock() will prevent
the isolation happening.
Link: https://lkml.kernel.org/r/20220914023913.1855924-1-zi.yan@sent.com
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Doug Berger <opendmb@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The GHES code calls memory_failure_queue() from IRQ context to queue work
into workqueue and schedule it on the current CPU. Then the work is
processed in memory_failure_work_func() by kworker and calls
memory_failure().
When a page is already poisoned, commit a3f5d80ea401 ("mm,hwpoison: send
SIGBUS with error virutal address") make memory_failure() call
kill_accessing_process() that:
- holds mmap locking of current->mm
- does pagetable walk to find the error virtual address
- and sends SIGBUS to the current process with error info.
However, the mm of kworker is not valid, resulting in a null-pointer
dereference. So check mm when killing the accessing process.
[akpm@linux-foundation.org: remove unrelated whitespace alteration]
Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Bixuan Cui <cuibixuan@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With gigantic pages it may not be true that struct page structures are
contiguous across the entire gigantic page. The nth_page macro is used
here in place of direct pointer arithmetic to correct for this.
Mike said:
: This error could cause addressing exceptions. However, this is only
: possible in configurations where CONFIG_SPARSEMEM &&
: !CONFIG_SPARSEMEM_VMEMMAP. Such a configuration option is rare and
: unknown to be the default anywhere.
Link: https://lkml.kernel.org/r/20220914190917.3517663-1-opendmb@gmail.com
Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A number of drivers call page_frag_alloc() with a fragment's size >
PAGE_SIZE.
In low memory conditions, __page_frag_cache_refill() may fail the order
3 cache allocation and fall back to order 0; In this case, the cache
will be smaller than the fragment, causing memory corruptions.
Prevent this from happening by checking if the newly allocated cache is
large enough for the fragment; if not, the allocation will fail and
page_frag_alloc() will return NULL.
Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com
Fixes: b63ae8ca096d ("mm/net: Rename and move page fragment handling from net/ to mm/")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Cc: Chen Lin <chen45464546@163.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Running this test program on ARMv4 a few times (sometimes just once)
reproduces the bug.
int main()
{
unsigned i;
char paragon[SIZE];
void* ptr;
memset(paragon, 0xAA, SIZE);
ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_ANON | MAP_SHARED, -1, 0);
if (ptr == MAP_FAILED) return 1;
printf("ptr = %p\n", ptr);
for (i=0;i<10000;i++){
memset(ptr, 0xAA, SIZE);
if (memcmp(ptr, paragon, SIZE)) {
printf("Unexpected bytes on iteration %u!!!\n", i);
break;
}
}
munmap(ptr, SIZE);
}
In the "ptr" buffer there appear runs of zero bytes which are aligned
by 16 and their lengths are multiple of 16.
Linux v5.11 does not have the bug, "git bisect" finds the first bad commit:
f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Before the commit update_mmu_cache() was called during a call to
filemap_map_pages() as well as finish_fault(). After the commit
finish_fault() lacks it.
Bring back update_mmu_cache() to finish_fault() to fix the bug.
Also call update_mmu_tlb() only when returning VM_FAULT_NOPAGE to more
closely reproduce the code of alloc_set_pte() function that existed before
the commit.
On many platforms update_mmu_cache() is nop:
x86, see arch/x86/include/asm/pgtable
ARMv6+, see arch/arm/include/asm/tlbflush.h
So, it seems, few users ran into this bug.
Link: https://lkml.kernel.org/r/20220908204809.2012451-1-saproj@gmail.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If no frontswap module (i.e. zswap) was registered, frontswap_ops will be
NULL. In such situation, swapon crashes with the following stack trace:
Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4fab000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: zram fsl_dpaa2_eth pcs_lynx phylink ahci_qoriq crct10dif_ce ghash_ce sbsa_gwdt fsl_mc_dpio nvme lm90 nvme_core at803x xhci_plat_hcd rtc_fsl_ftm_alarm xgmac_mdio ahci_platform i2c_imx ip6_tables ip_tables fuse
Unloaded tainted modules: cppc_cpufreq():1
CPU: 10 PID: 761 Comm: swapon Not tainted 6.0.0-rc2-00454-g22100432cf14 #1
Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : frontswap_init+0x38/0x60
lr : __do_sys_swapon+0x8a8/0x9f4
sp : ffff80000969bcf0
x29: ffff80000969bcf0 x28: ffff37bee0d8fc00 x27: ffff80000a7f5000
x26: fffffcdefb971e80 x25: ffffaba797453b90 x24: 0000000000000064
x23: ffff37c1f209d1a8 x22: ffff37bee880e000 x21: ffffaba797748560
x20: ffff37bee0d8fce4 x19: ffffaba797748488 x18: 0000000000000014
x17: 0000000030ec029a x16: ffffaba795a479b0 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000001
x11: ffff37c63c0aba18 x10: 0000000000000000 x9 : ffffaba7956b8c88
x8 : ffff80000969bcd0 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffffaba79730f000
x2 : ffff37bee0d8fc00 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
frontswap_init+0x38/0x60
__do_sys_swapon+0x8a8/0x9f4
__arm64_sys_swapon+0x28/0x3c
invoke_syscall+0x78/0x100
el0_svc_common.constprop.0+0xd4/0xf4
do_el0_svc+0x38/0x4c
el0_svc+0x34/0x10c
el0t_64_sync_handler+0x11c/0x150
el0t_64_sync+0x190/0x194
Code: d000e283 910003fd f9006c41 f946d461 (f9400021)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20220909130829.3262926-1-hch@lst.de
Fixes: 1da0d94a3ec8 ("frontswap: remove support for multiple ops")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
NULL pointer dereference is triggered when calling thp split via debugfs
on the system with offlined memory blocks. With debug option enabled, the
following kernel messages are printed out:
page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: unmovable page
page:000000007d7ab72e is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1248!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 16 PID: 20964 Comm: bash Tainted: G I 6.0.0-rc3-foll-numa+ #41
...
RIP: 0010:split_huge_pages_write+0xcf4/0xe30
This shows that page_to_nid() in page_zone() is unexpectedly called for an
offlined memmap.
Use pfn_to_online_page() to get struct page in PFN walker.
Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from
isolate_lru_page below.
Fix it by checking PageLRU in advance.
------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140
Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: 韩天ç` <hantianshuo@iie.ac.cn>
Suggested-by: Yang Shi <shy828301@gmail.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will
move to use RCU instead of disabling local interrupts in fast-GUP. Using
an IPI is the old-styled way of serializing against fast-GUP although it
still works as expected now.
And fast-GUP now fixed the potential race with THP collapse by checking
whether PMD is changed or not. So IPI broadcast in radix pmd collapse
flush is not necessary anymore. But it is still needed for hash TLB.
Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases, it only handles
traditional IPI-based GUP-fast correctly. On architectures that send an
IPI broadcast on TLB flush, it works as expected. But on the
architectures that do not use IPI to broadcast TLB flush, it may have the
below race:
CPU A CPU B
THP collapse fast GUP
gup_pmd_range() <-- see valid pmd
gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
check page pinned <-- before GUP bump refcount
pin the page
check PTE <-- no change
__collapse_huge_page_copy()
copy data to huge page
ptep_clear()
install huge pmd for the huge page
return the stale page
discard the stale page
The race can be fixed by checking whether PMD is changed or not after
taking the page pin in fast GUP, just like what it does for PTE. If the
PMD is changed it means there may be parallel THP collapse, so GUP should
back off.
Also update the stale comment about serializing against fast GUP in
khugepaged.
Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We've had some reports of problems in the refcounting for delegation
stateids that we've yet to track down. Add some extra checks to ensure
that we've removed the object from various lists before freeing it.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2127067
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
queue_work can return false and not queue anything, if the work is
already queued. If that happens in the case of a CB_RECALL, we'll have
taken an extra reference to the stid that will never be put. Ensure we
throw a warning in that case.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The ZERO_CALL_USED_REGS feature may zero out caller-saved registers
before returning.
In spurious_kernel_fault(), the "pte_offset_kernel()" call results in
this assembly code:
.Ltmp151:
#APP
# ALT: oldnstr
.Ltmp152:
.Ltmp153:
.Ltmp154:
.section .discard.retpoline_safe,"",@progbits
.quad .Ltmp154
.text
callq *pv_ops+536(%rip)
.Ltmp155:
.section .parainstructions,"a",@progbits
.p2align 3, 0x0
.quad .Ltmp153
.byte 67
.byte .Ltmp155-.Ltmp153
.short 1
.text
.Ltmp156:
# ALT: padding
.zero (-(((.Ltmp157-.Ltmp158)-(.Ltmp156-.Ltmp152))>0))*((.Ltmp157-.Ltmp158)-(.Ltmp156-.Ltmp152)),144
.Ltmp159:
.section .altinstructions,"a",@progbits
.Ltmp160:
.long .Ltmp152-.Ltmp160
.Ltmp161:
.long .Ltmp158-.Ltmp161
.short 33040
.byte .Ltmp159-.Ltmp152
.byte .Ltmp157-.Ltmp158
.text
.section .altinstr_replacement,"ax",@progbits
# ALT: replacement 1
.Ltmp158:
movq %rdi, %rax
.Ltmp157:
.text
#NO_APP
.Ltmp162:
testb $-128, %dil
The "testb" here is using %dil, but the %rdi register was cleared before
returning from "callq *pv_ops+536(%rip)". Adding the proper constraints
results in the use of a different register:
movq %r11, %rdi
# Similar to above.
testb $-128, %r11b
Link: https://github.com/KSPP/linux/issues/192
Signed-off-by: Bill Wendling <morbo@google.com>
Reported-and-tested-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 035f7f87b729 ("randstruct: Enable Clang support")
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/lkml/fa6df43b-8a1a-8ad1-0236-94d2a0b588fa@suse.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220902213750.1124421-3-morbo@google.com
|
|
Drive-by clean up of the comment.
[ Impact: cleanup]
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220902213750.1124421-2-morbo@google.com
|
|
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In the case of a revoked delegation, we still fill out the pointer even
when returning an error, which is bad form. Only overwrite the pointer
on success.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Currently usbnet_disconnect() unanchors and frees all deferred URBs
using usb_scuttle_anchored_urbs(), which does not free urb->context,
causing a memory leak as reported by syzbot.
Use a usb_get_from_anchor() while loop instead, similar to what we did
in commit 19cfe912c37b ("Bluetooth: btusb: Fix memory leak in
play_deferred"). Also free urb->sg.
Reported-and-tested-by: syzbot+dcd3e13cf4472f2e0ba1@syzkaller.appspotmail.com
Fixes: 69ee472f2706 ("usbnet & cdc-ether: Autosuspend for online devices")
Fixes: 638c5115a794 ("USBNET: support DMA SG")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Link: https://lore.kernel.org/r/20220923042551.2745-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use-after-free occurred when the laundromat tried to free expired
cpntf_state entry on the s2s_cp_stateids list after inter-server
copy completed. The sc_cp_list that the expired copy state was
inserted on was already freed.
When COPY completes, the Linux client normally sends LOCKU(lock_state x),
FREE_STATEID(lock_state x) and CLOSE(open_state y) to the source server.
The nfs4_put_stid call from nfsd4_free_stateid cleans up the copy state
from the s2s_cp_stateids list before freeing the lock state's stid.
However, sometimes the CLOSE was sent before the FREE_STATEID request.
When this happens, the nfsd4_close_open_stateid call from nfsd4_close
frees all lock states on its st_locks list without cleaning up the copy
state on the sc_cp_list list. When the time the FREE_STATEID arrives the
server returns BAD_STATEID since the lock state was freed. This causes
the use-after-free error to occur when the laundromat tries to free
the expired cpntf_state.
This patch adds a call to nfs4_free_cpntf_statelist in
nfsd4_close_open_stateid to clean up the copy state before calling
free_ol_stateid_reaplist to free the lock state's stid on the reaplist.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.
Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.
Add an NFSv4 helper that computes the size of the send buffer. It
replaces svc_max_payload() in spots where svc_max_payload() returns
a value that might be larger than the remaining send buffer space.
Callers who need to know the transport's actual maximum payload size
will continue to use svc_max_payload().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Code maintenance: The name of the copy_stateid_t::sc_count field
collides with the sc_count field in struct nfs4_stid, making the
latter difficult to grep for when auditing stateid reference
counting.
No behavior change expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
nfsd_net is converted from seq_file->file instead of seq_file->private in
nfsd_reply_cache_stats_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
inode is converted from seq_file->file instead of seq_file->private in
client_info_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
supported_enctypes_fops
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Remove a couple of 4-byte holes on platforms with 64-bit pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This field was added by commit 1091006c5eb1 ("nfsd: turn on reply
cache for NFSv4") but was never put to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
These helpers are always invoked indirectly, so the compiler can't
inline these anyway. While we're updating the synopses of these
helpers, defensively convert their parameters to const pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|