|
This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
page_ext_init").
When booting a system with "page_owner=on",
  start_kernel
    page_ext_init
      invoke_init_callbacks
        init_section_page_ext
          init_page_owner
            init_early_allocated_pages
              init_zones_in_node
                init_pages_in_zone
                  lookup_page_ext
                    page_to_nid
The issue here is that page_to_nid() will not work since some page flags
have no node information until later in page_alloc_init_late() due to
DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
access with an invalid nid.
UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
index 7 is out of range for type 'zone [5]'
Also, kernel will panic since flags were poisoned earlier with,
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n
  start_kernel
    setup_arch
      pagetable_init
        paging_init
          sparse_init
            sparse_init_nid
              memblock_alloc_try_nid_raw
It did not handle it well in init_pages_in_zone() which ends up calling
page_to_nid().
page:ffffea0004200000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
page_owner info is not active (free page?)
kernel BUG at include/linux/mm.h:990!
RIP: 0010:init_page_owner+0x486/0x520
This means that assumptions behind commit fe53ca54270a ("mm: use
early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
the commit for now. A proper way to move the page_owner initialization
earlier is to hook into memmap initialization.
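The failure mode can be illustrated with a small user-space sketch. It is not kernel code: the bit layout (node id in the top 3 bits of a 64-bit flags word), the node count of 5, and all names ending in _SKETCH are assumptions chosen for illustration. It only shows why a node id decoded from poisoned (all-ones) flags must not be used as an array index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the node id occupies the top 3 bits of the flags. */
#define NODES_WIDTH_SKETCH   3
#define NODES_PGSHIFT_SKETCH (64 - NODES_WIDTH_SKETCH)
#define NODES_MASK_SKETCH    ((UINT64_C(1) << NODES_WIDTH_SKETCH) - 1)
#define MAX_NODES_SKETCH     5          /* valid node ids are 0..4 here */
#define PAGE_POISON_SKETCH   UINT64_MAX /* flags filled with 0xff bytes */

/* Decode the node id stored in a flags word. */
static unsigned int flags_to_nid(uint64_t flags)
{
    return (unsigned int)((flags >> NODES_PGSHIFT_SKETCH) & NODES_MASK_SKETCH);
}

/* A decoded id is only usable as an index if it is in range. */
static bool nid_is_valid(unsigned int nid)
{
    return nid < MAX_NODES_SKETCH;
}
```

With these assumed values, poisoned flags decode to a node id that nid_is_valid() rejects, while flags written by a completed initialization decode to the stored id.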
Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
on x86. So the function works as long as hugetlb is configured.
However, dax doesn't depend on hugetlb.
Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 172b06c32b9497 ("mm: slowly shrink slabs with a
relatively small number of objects").
This change alters the aggressiveness of shrinker reclaim, causing small
cache and low priority reclaim to greatly increase scanning pressure on
small caches. As a result, light memory pressure has a disproportionate
effect on small caches, and causes large caches to be reclaimed much
faster than previously.
As a result, it greatly perturbs the delicate balance of the VFS caches
(dentry/inode vs file page cache) such that the inode/dentry caches are
reclaimed much, much faster than the page cache and this drives us into
several other caching imbalance related problems.
As such, this is a bad change and needs to be reverted.
[ Needs some massaging to retain the later seekless shrinker
modifications.]
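A simplified user-space model of the effect, for illustration only. The formulas below are assumptions about the general shape of the heuristic (a scan target proportional to cache size and priority, and a floor of min(freeable, batch) in the reverted change); they are not quoted from the kernel source, and the batch value used in the example is hypothetical.

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Assumed baseline: scan target scales with cache size and priority.
 * A higher priority number means lighter memory pressure.
 * seeks is assumed to be non-zero. */
static uint64_t scan_delta(uint64_t freeable, unsigned int priority,
                           unsigned int seeks)
{
    uint64_t delta = freeable >> priority;

    delta *= 4;
    return delta / seeks;
}

/* Assumed shape of the reverted change: never scan fewer objects than
 * min(freeable, batch), even under light pressure. */
static uint64_t scan_delta_with_floor(uint64_t freeable, unsigned int priority,
                                      unsigned int seeks, uint64_t batch)
{
    return max_u64(scan_delta(freeable, priority, seeks),
                   min_u64(freeable, batch));
}
```

Under this model, a cache of 100 objects at priority 12 has a baseline scan target of 0, but the floor raises it to all 100 objects, while a cache of 1048576 objects is unaffected by the floor. That asymmetry is the disproportionate pressure on small caches described above.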
Link: http://lkml.kernel.org/r/20190130041707.27750-3-david@fromorbit.com
Fixes: 172b06c32b9497 ("mm: slowly shrink slabs with a relatively small number of objects")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit a76cf1a474d7d ("mm: don't reclaim inodes with many
attached pages").
This change causes serious changes to page cache and inode cache
behaviour and balance, resulting in major performance regressions when
combining workloads such as large file copies and kernel compiles.
https://bugzilla.kernel.org/show_bug.cgi?id=202441
This change is a hack to work around the problems introduced by changing
how aggressive shrinkers are on small caches in commit 172b06c32b94 ("mm:
slowly shrink slabs with a relatively small number of objects"). It
creates more problems than it solves and wasn't adequately reviewed or
tested, so it needs to be reverted.
Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix fan detection for NCT6793D"
* tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775) Fix fan6 detection for NCT6793D
|
|
Pull MD fix from Song
* 'md-fixes' of https://github.com/liu-song-6/linux:
md/raid1: don't clear bitmap bits on interrupted recovery.
|
|
sync_request_write no longer submits writes to a Faulty device. This has
the unfortunate side effect that bitmap bits can be incorrectly cleared
if a recovery is interrupted (previously, end_sync_write would have
prevented this). This means the next recovery may not copy everything
it should, potentially corrupting data.
Add a function for doing the proper md_bitmap_end_sync, called from
end_sync_write and the Faulty case in sync_request_write.
backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync
Cc: stable@vger.kernel.org 4.14+
Fixes: 0c9d5b127f69 ("md/raid1: avoid reusing a resync bio after error handling.")
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
After 610d2b601bba ("rocker: Remove getting PORT_BRIDGE_FLAGS") we no
longer have a port_attr_bridge_flags_get member in the rocker_world_ops
structure. Fix that.
Fixes: 610d2b601bba ("rocker: Remove getting PORT_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a pair of bug fixes that I'd like to include in 5.0:
- A fix to disambiguate swap from invalid PTEs, which fixes an error
when trying to unmap PROT_NONE pages.
- A revert to an optimization of the size of flat binaries. This is
really a workaround to prevent breaking existing boot flows, but
since the change was introduced as part of the 5.0 merge window I'd
like to have the fix in before 5.0 so we can avoid a regression for
any proper releases.
With these I hope we're out of patches for 5.0 in RISC-V land"
* tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
riscv: Add pte bit to distinguish swap from invalid
|
|
The current opt_inst_list operations inside team_nl_cmd_options_set()
are too complex to track:
    LIST_HEAD(opt_inst_list);
    nla_for_each_nested(...) {
        list_for_each_entry(opt_inst, &team->option_inst_list, list) {
            if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
                continue;
            list_add(&opt_inst->tmp_list, &opt_inst_list);
        }
    }
    team_nl_send_event_options_get(team, &opt_inst_list);
as while we retrieve 'opt_inst' from team->option_inst_list, it could
be added to the local 'opt_inst_list' multiple times. The
__team_option_inst_tmp_find() doesn't work, as the setter
team_mode_option_set() still calls team->ops.exit() which uses
->tmp_list too in __team_options_change_check().
Simplify the list operations by moving the 'opt_inst_list' and
team_nl_send_event_options_get() into the nla_for_each_nested() loop so
that it can be guaranteed that we won't insert the same list entry
multiple times. Therefore, __team_option_inst_tmp_find() can be removed
too.
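The simplified pattern can be sketched in user space as follows. This is not the team driver code: the list helpers are a minimal stand-in for the kernel's list API, and struct opt_inst and set_options() are hypothetical names. The point is that a list head created fresh for each request can receive each entry at most once, so no duplicate check is needed.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for the kernel list API. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

/* Insert n right after head h. */
static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Count the entries on the list headed by h. */
static size_t list_len(const struct list_head *h)
{
    size_t n = 0;

    for (const struct list_head *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

/* Hypothetical option instance with a temporary list linkage. */
struct opt_inst { int id; struct list_head tmp_list; };

/* For every requested id, build a fresh local list, then "send" it.
 * Sending is modelled by counting the entries on the list. */
static size_t set_options(struct opt_inst *opts, size_t n_opts,
                          const int *req, size_t n_req)
{
    size_t sent = 0;

    for (size_t i = 0; i < n_req; i++) {
        struct list_head local; /* new list head per request */

        list_init(&local);
        for (size_t j = 0; j < n_opts; j++)
            if (opts[j].id == req[i])
                list_add(&opts[j].tmp_list, &local);
        sent += list_len(&local);
    }
    return sent;
}
```

Requesting the same id twice yields two separate one-entry lists rather than one list with the same node linked twice.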
Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list")
Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cong Wang says:
====================
net_sched: some fixes for cls_tcindex
This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.
v2: fix a compile error in patch 2
drop netns refcnt in patch 1
====================
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
|
|
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.
For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.
For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.
This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.
As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead; until then, just directly test
CONFIG_NET_CLS_ACT=y.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.
Fix this by migrating this RCU callback to tcf RCU work too.
As that workqueue is ordered, we will not have use-after-free.
Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.
Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Arthur Kiyanovski says:
====================
net: ena: race condition bug fix and version update
This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update driver version due to bug fix.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix race condition between ena_update_on_link_change() and
ena_restore_device().
This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.
Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.
Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vlad Buslov says:
====================
Refactor classifier API to work with chain/classifiers without rtnl lock
Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.
Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently, and completes refactoring of cls API rules update handlers
by removing rtnl lock dependency from code that handles chains and
classifiers. Rules update handlers are registered with
RTNL_FLAG_DOIT_UNLOCKED flag.
This patch set substitutes global rtnl lock dependency on rules update
path in cls API by extending its data structures with following locks:
- tcf_block with 'lock' mutex. It is used to protect block state and
life-time management fields of chains on the block (chain->refcnt,
chain->action_refcnt, chain->explicitly_created, etc.).
- tcf_chain with 'filter_chain_lock' mutex, that is used to protect list
of classifier instances attached to chain. chain0->filter_chain_lock
serializes calls to head change callbacks and allows them to rely on
filter_chain_lock for serialization instead of rtnl lock.
- tcf_proto with 'lock' spinlock that is intended to be used to
synchronize access to classifiers that support unlocked execution.
Classifiers are extended with reference counting to accommodate parallel
access by unlocked cls API. Classifier ops structure is extended with
additional 'put' function to allow reference counting of filters and
intended to be used by classifiers that implement rtnl-unlocked API.
Users of classifiers and individual filter instances are modified to
always hold reference while working with them.
Classifiers that support unlocked execution still need to know the
status of rtnl lock, so their API is extended with additional
'rtnl_held' argument that is used to indicate that caller holds rtnl
lock. Cls API propagates rtnl lock status across its helper functions
and passes it to classifier.
Changes from V3 to V4:
- Patch 1:
- Extract code that manages chain 'explicitly_created' flag into
standalone patch.
- Patch 2 - new.
Changes from V2 to V3:
- Change block->lock and chain->filter_chain_lock type to mutex. This
removes the need for async miniqp refactoring and allows calling
sleeping functions while holding the block->lock and
chain->filter_chain_lock locks.
- Previous patch 1 - async miniqp is no longer needed, remove the patch.
- Patch 1:
- Change block->lock type to mutex.
- Implement tcf_block_destroy() helper function that destroys
block->lock mutex before deallocating the block.
- Revert GFP_KERNEL->GFP_ATOMIC memory allocation flags of tcf_chain
which is no longer needed after block->lock type change.
- Patch 6:
- Change chain->filter_chain_lock type to mutex.
- Assume chain0->filter_chain_lock synchronizations instead of rtnl
lock in mini_qdisc_pair_swap() function that is called from head
change callback of ingress Qdisc. With filter_chain_lock type
changed to mutex it is now possible to call sleeping function while
holding it, so it is now used instead of async implementation from
previous versions of this patch set.
- Patch 7:
- Add local tp_next var to tcf_chain_flush() and use it to store
tp->next pointer dereferenced with rcu_dereference_protected() to
satisfy kbuild test robot.
- Reset tp pointer to NULL at the beginning of tc_new_tfilter() to
prevent its uninitialized usage in error handling code. This code
was already implemented in patch 10, but must be in patch 8 to
preserve code bisectability.
- Put parent chain in tcf_proto_destroy(). In previous version this
code was implemented in patch 1 which was removed in V3.
Changes from V1 to V2:
- Patch 1:
- Use smp_store_release() instead of xchg() for setting
miniqp->tp_head.
- Move Qdisc deallocation to tc_proto_wq ordered workqueue that is
used to destroy tcf proto instances. This is necessary to ensure
that Qdisc is destroyed after all instances of chain/proto that it
contains in order to prevent use-after-free error in
tc_chain_notify_delete().
- Cache parent net device ifindex in block to prevent use-after-free
of dev queue in tc_chain_notify_delete().
- Patch 2:
- Use lockdep_assert_held() instead of spin_is_locked() for assertion.
- Use 'free_block' argument in tcf_chain_destroy() instead of checking
block's reference count and chain_list for second time.
- Patch 7:
- Refactor tcf_chain0_head_change_cb_add() to not take block->lock and
chain0->filter_chain_lock in correct order.
- Patch 10:
- Always set 'tp_created' flag when creating tp to prevent releasing
the chain twice when tp with same priority was inserted
concurrently.
- Patch 11:
- Add additional check to prevent creation of new proto instance when
parent chain is being flushed to reduce CPU usage.
- Don't call tcf_chain_delete_empty() if tp insertion failed.
- Patch 16 - new.
- Patch 17:
- Refactor to only take rtnl lock once (at the beginning of rule
update handlers).
- Always release rtnl mutex in the same function that locked it.
Remove unlock code from tcf_block_release().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
tracks rtnl mutex state to be false by default.
Introduce tcf_proto_is_unlocked() helper that is used to check
tcf_proto_ops->flag to determine if ops can be called without taking rtnl
lock. Manually lookup Qdisc, class and block in rule update handlers.
Verify that both Qdisc ops and proto ops are unlocked before using any of
their callbacks, and obtain rtnl lock otherwise.
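The decision described above can be sketched as a small predicate. The flag names and values below are hypothetical stand-ins for the ops flags; only the rule itself (rtnl may be skipped only when both the Qdisc ops and the proto ops are marked unlocked) is taken from the text.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the ops "unlocked" flags. */
enum { QDISC_OPS_UNLOCKED_SKETCH = 1u << 0 };
enum { PROTO_OPS_UNLOCKED_SKETCH = 1u << 0 };

static bool flag_set(unsigned int flags, unsigned int bit)
{
    return (flags & bit) != 0;
}

/* rtnl must be taken unless both sets of ops support unlocked execution. */
static bool need_rtnl(unsigned int qdisc_flags, unsigned int proto_flags)
{
    return !(flag_set(qdisc_flags, QDISC_OPS_UNLOCKED_SKETCH) &&
             flag_set(proto_flags, PROTO_OPS_UNLOCKED_SKETCH));
}
```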
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Refactor tcf_block_find() code into three standalone functions:
- __tcf_qdisc_find() to lookup Qdisc and increment its reference counter.
- __tcf_qdisc_cl_find() to lookup class.
- __tcf_block_find() to lookup block and increment its reference counter.
This change is necessary to allow netlink tc rule update handlers to call
these functions directly in order to conditionally take rtnl lock
according to Qdisc class ops flags before calling any of class ops
functions.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend Qdisc_class_ops with flags. Create enum to hold possible class ops
flag values. Add first class ops flags value QDISC_CLASS_OPS_DOIT_UNLOCKED
to indicate that class ops functions can be called without taking rtnl
lock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
functions to track rtnl lock status. Extend users of these functions in cls
API to propagate rtnl lock status to them. This allows classifiers to
obtain rtnl lock when necessary and to pass rtnl lock status to extensions
and driver offload callbacks.
Add flags field to tcf proto ops. Add flag value to indicate that
classifier doesn't require rtnl lock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add optional tp->ops->put() API to be implemented for filter reference
counting. This new function is called by cls API to release filter
reference for filters returned by tp->ops->change() or tp->ops->get()
functions. Implement tfilter_put() helper to call tp->ops->put() only for
classifiers that implement it.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Actions API is already updated to not rely on rtnl lock for
synchronization. However, it needs to be provided with rtnl status when
called from classifiers API in order to be able to correctly release the
lock when loading kernel module.
Extend extension validation function with 'rtnl_held' flag which is passed
to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
API. No classifier is currently updated to support unlocked execution, so
pass hardcoded 'true' flag parameter value.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend tcf_chain with 'flushing' flag. Use the flag to prevent insertion of
new classifier instances when chain flushing is in progress in order to
prevent resource leak when tcf_proto is created by unlocked users
concurrently.
Return EAGAIN error from tcf_chain_tp_insert_unique() to restart
tc_new_tfilter() and lookup the chain/proto again.
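A minimal model of the 'flushing' check, for illustration. struct flush_chain and chain_tp_insert() are hypothetical names, and locking is omitted to keep the sketch short. In the real code the flag would be read under the chain lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical chain state: a flushing flag and a count of attached protos. */
struct flush_chain {
    bool flushing;
    int nprotos;
};

/* Refuse to attach a new classifier instance while a flush is in progress.
 * The caller is expected to restart and look up the chain/proto again. */
static int chain_tp_insert(struct flush_chain *c)
{
    if (c->flushing)
        return -EAGAIN;
    c->nprotos++;
    return 0;
}
```

Because nothing is attached while the flag is set, the flush cannot miss an instance that was created concurrently.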
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement unique insertion function to atomically attach tcf_proto to chain
after verifying that no other tcf proto with specified priority exists.
Implement delete function that verifies that tp is actually empty before
deleting it. Use these functions to refactor cls API to account for
concurrent tp and rule update instead of relying on rtnl lock. Add new
'deleting' flag to tcf proto. Use it to restart search when iterating over
tp's on chain to prevent accessing potentially invalid tp->next pointer.
Extend tcf proto with spinlock that is intended to be used to protect its
data from concurrent modification instead of relying on rtnl mutex. Use it
to protect 'deleting' flag. Add lockdep macros to validate that lock is
held when accessing protected fields.
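The unique-insertion idea can be sketched in user space with a mutex-protected, priority-sorted singly linked list. All names (struct prio_proto, struct prio_chain, insert_unique()) are hypothetical, and a pthread mutex stands in for the chain lock. The lookup and the insertion happen in one critical section, so two concurrent callers cannot both attach an instance with the same priority.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical classifier instance, ordered by priority on its chain. */
struct prio_proto {
    unsigned int prio;
    struct prio_proto *next;
};

/* Hypothetical chain: a lock plus a priority-sorted list. */
struct prio_chain {
    pthread_mutex_t lock;
    struct prio_proto *head;
};

/* Attach tp unless an instance with the same priority already exists.
 * Returns the instance that is on the chain afterwards: tp itself, or
 * the one that was already there. */
static struct prio_proto *insert_unique(struct prio_chain *c,
                                        struct prio_proto *tp)
{
    struct prio_proto **pp, *ret = tp;

    pthread_mutex_lock(&c->lock);
    /* Find the first entry whose priority is not lower than tp's. */
    for (pp = &c->head; *pp && (*pp)->prio < tp->prio; pp = &(*pp)->next)
        ;
    if (*pp && (*pp)->prio == tp->prio) {
        ret = *pp;      /* already present: reuse the existing one */
    } else {
        tp->next = *pp; /* link in, keeping the list sorted */
        *pp = tp;
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}
```

A caller that gets back a pointer other than the one it passed in knows it lost the race and can drop its own instance.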
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All users of chain->filter_chain rely on rtnl lock and assume that no new
classifier instances are added when traversing the list. Use
tcf_get_next_proto() to traverse filters list without relying on rtnl
mutex. This function iterates over classifiers by taking reference to
current iterator classifier only and doesn't assume external
synchronization of filters list.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock and allow concurrent tcf_proto
modification, extend tcf_proto with reference counter. Implement helper
get/put functions for tcf proto and use them to modify cls API to always
take reference to tcf_proto while using it. Only release reference to
parent chain after releasing last reference to tp.
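The lifetime rule can be sketched with C11 atomics. This is a user-space analogue with hypothetical names (struct chain_ref, struct proto_ref), not the kernel's reference counting API. The instance holds one reference on its parent chain and drops it only when its own last reference goes away.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct chain_ref {
    atomic_int refcnt;
};

struct proto_ref {
    atomic_int refcnt;
    struct chain_ref *chain;
    bool destroyed; /* stands in for actually freeing the object */
};

static void chain_hold(struct chain_ref *c) { atomic_fetch_add(&c->refcnt, 1); }
static void chain_put(struct chain_ref *c)  { atomic_fetch_sub(&c->refcnt, 1); }

/* A new instance starts with one reference and pins its parent chain. */
static void proto_init(struct proto_ref *tp, struct chain_ref *c)
{
    atomic_init(&tp->refcnt, 1);
    tp->chain = c;
    tp->destroyed = false;
    chain_hold(c);
}

static void proto_get(struct proto_ref *tp) { atomic_fetch_add(&tp->refcnt, 1); }

/* Drop one reference; the last put destroys the instance and only then
 * releases the reference on the parent chain. */
static void proto_put(struct proto_ref *tp)
{
    if (atomic_fetch_sub(&tp->refcnt, 1) == 1) {
        tp->destroyed = true;
        chain_put(tp->chain);
    }
}
```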
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
when accessing filter_chain list, instead of relying on rtnl lock.
Dereference filter_chain with tcf_chain_dereference() lockdep macro to
verify that all users of chain_list have the lock taken.
Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
all necessary code while holding chain lock in order to prevent
invalidation of chain_info structure by potential concurrent change. This
also serializes calls to tcf_chain0_head_change(), which allows head change
callbacks to rely on filter_chain_lock for synchronization instead of rtnl
mutex.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When cls API is called without protection of rtnl lock, parallel
modification of chain is possible, which means that chain template can be
changed concurrently in certain circumstances. For example, when chain is
'deleted' by new user-space chain API, the chain might continue to be used
if it is referenced by actions, and can be 're-created' again by user. In
such case same chain structure is reused and its template is changed. To
protect from described scenario, cache chain template while holding block
lock. Introduce standalone tc_chain_notify_delete() function that works
with cached template values, instead of chains themselves.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All users of block->chain_list rely on rtnl lock and assume that no new
chains are added when traversing the list. Use tcf_get_next_chain() to
traverse chain list without relying on rtnl mutex. This function iterates
over chains by taking reference to current iterator chain only and doesn't
assume external synchronization of chain list.
Don't take reference to all chains in block when flushing and use
tcf_get_next_chain() to safely iterate over chain list instead. Remove
tcf_block_put_all_chains() that is no longer used.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock, use block->lock to protect
chain0 struct from concurrent modification. Rearrange code in chain0
callback add and del functions to only access chain0 when block->lock is
held.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock, modify chain API to use
block->lock to protect chain from concurrent modification. Rearrange
tc_ctl_chain() code to call tcf_chain_hold() while holding block->lock to
prevent concurrent chain removal.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to remove dependency on rtnl lock, protect
tcf_chain->explicitly_created flag with block->lock. Consolidate code that
checks and resets 'explicitly_created' flag into __tcf_chain_put() to
execute it atomically with rest of code that puts chain reference.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, tcf_block doesn't use any synchronization mechanisms to protect
critical sections that manage lifetime of its chains. block->chain_list and
multiple variables in tcf_chain that control its lifetime assume external
synchronization provided by global rtnl lock. Converting chain reference
counting to atomic reference counters is not possible because cls API uses
multiple counters and flags to control chain lifetime, so all of them must
be synchronized in chain get/put code.
Use single per-block lock to protect block data and manage lifetime of all
chains on the block. Always take block->lock when accessing chain_list.
Chain get and put modify chain lifetime-management data and parent block's
chain_list, so take the lock in these functions. Verify block->lock state
with assertions in functions that expect to be called with the lock taken
and are called from multiple places. Take block->lock when accessing
filter_chain_list.
In order to allow parallel update of rules on single block, move all calls
to classifiers outside of critical sections protected by new block->lock.
Rearrange chain get and put functions code to only access protected chain
data while holding block lock:
- Rearrange code to only access chain reference counter and chain action
reference counter while holding block lock.
- Extract code that requires block->lock from tcf_chain_destroy() into
standalone tcf_chain_destroy() function that is called by
__tcf_chain_put() in same critical section that changes chain reference
counters.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.
This change fixes support for packet ring buffers >= UINT_MAX.
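One common way to test an unsigned multiplication for overflow without performing it is to compare one factor against the maximum divided by the other. The sketch below shows that technique with 32-bit operands. The function name is hypothetical and this is not claimed to be the exact check in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if frames_per_block * block_nr does not fit in 32 bits.
 * The product is never computed, so the check itself cannot overflow. */
static bool frame_count_overflows(uint32_t frames_per_block, uint32_t block_nr)
{
    if (block_nr == 0)
        return false; /* a zero factor cannot overflow */
    return frames_per_block > UINT32_MAX / block_nr;
}
```

The check constrains only the frame count, not the total buffer size in bytes.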
Fixes: 8f8d28e4d6d8 ("net/packet: fix overflow in check for tp_frame_nr")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Field idiag_ext in struct inet_diag_req_v2, used as a bitmap of requested
extensions, has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them are included in the response
unconditionally or hook into some of the lower 8 bits.
Extension INET_DIAG_CLASS_ID has had no way to be requested from the beginning.
This patch bundles it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.
Also this patch adds a fallback to reporting socket priority. This field
is more widely used for traffic classification because ipv4 sockets
automatically map TOS to priority and the default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY, so
INET_DIAG_TOS isn't enough for predicting class.
Also cgroup2 obsoletes net_cls classid (it is always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
So, after this patch INET_DIAG_CLASS_ID will report socket priority
for the most common setup, when net_cls isn't set and/or cgroup2 is in use.
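A small sketch of the two points above: an 8-bit request bitmap can only express extensions 1 through 8, and the reported class id falls back to the socket priority when the classid is zero. The numeric extension values are recalled from the uapi header and should be treated as assumptions to verify. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extension numbers as recalled from the uapi header (assumed values). */
enum {
    DIAG_TCLASS_SKETCH    = 6,
    DIAG_DCTCPINFO_SKETCH = 9,
    DIAG_CLASS_ID_SKETCH  = 17,
};

/* Extension ext is requested via bit (ext - 1), which must fit in 8 bits. */
static bool ext_requestable(unsigned int ext)
{
    return ext >= 1 && ext <= 8;
}

/* True if the 8-bit request bitmap asks for extension ext. */
static bool ext_requested(uint8_t idiag_ext, unsigned int ext)
{
    return ext_requestable(ext) && (idiag_ext & (1u << (ext - 1))) != 0;
}

/* Fallback described above: report priority when classid is unset. */
static uint32_t class_id_to_report(uint32_t classid, uint32_t priority)
{
    return classid ? classid : priority;
}
```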
Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KMSAN reported batadv_interface_tx() was possibly using a
garbage value [1]
batadv_get_vid() does have a pskb_may_pull() call
but batadv_interface_tx() does not actually make sure
this did not fail.
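The class of bug can be sketched in user space: a header field must only be read after a length check has succeeded, and the caller must act on the result of that check. The function below is a hypothetical analogue operating on a plain byte buffer, not the batman-adv code. It assumes an 802.1Q frame with the TCI at byte offsets 14 and 15.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH  14 /* Ethernet header length */
#define VLAN_HLEN_SKETCH 4  /* 802.1Q tag length */

/* Extract the VLAN id from a tagged frame.
 * Returns false, leaving *vid untouched, if the frame is too short;
 * the caller must check the return value before using *vid. */
static bool read_vid(const uint8_t *frame, size_t len, uint16_t *vid)
{
    if (len < ETH_HLEN_SKETCH + VLAN_HLEN_SKETCH)
        return false;
    /* TCI is big-endian at offsets 14..15; the VID is its low 12 bits. */
    *vid = (uint16_t)(((frame[14] << 8) | frame[15]) & 0x0fff);
    return true;
}
```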
[1]
BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
__msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
__netdev_start_xmit include/linux/netdevice.h:4356 [inline]
netdev_start_xmit include/linux/netdevice.h:4365 [inline]
xmit_one net/core/dev.c:3257 [inline]
dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
__dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
packet_snd net/packet/af_packet.c:2928 [inline]
packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
__sys_sendto+0x8c4/0xac0 net/socket.c:1788
__do_sys_sendto net/socket.c:1800 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1796
__x64_sys_sendto+0x6e/0x90 net/socket.c:1796
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441889
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2759 [inline]
__kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
__kmalloc_reserve net/core/skbuff.c:137 [inline]
__alloc_skb+0x309/0xa20 net/core/skbuff.c:205
alloc_skb include/linux/skbuff.h:998 [inline]
alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
packet_alloc_skb net/packet/af_packet.c:2781 [inline]
packet_snd net/packet/af_packet.c:2872 [inline]
packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
__sys_sendto+0x8c4/0xac0 net/socket.c:1788
__do_sys_sendto net/socket.c:1800 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1796
__x64_sys_sendto+0x6e/0x90 net/socket.c:1796
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"It's a bit of surprising that we've got more changes than hoped at
this late stage, but they all don't look too scary but small fixes.
One change in ALSA core side is again the PCM regression fix that was
partially addressed for OSS, but now the all relevant change is
reverted instead. Also, a few ASoC core fixes for UAF and OOB are
included, while the rest are usual random device-specific fixes"
* tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Revert capture stream behavior change in blocking mode
ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
ALSA: hda - Add quirk for HP EliteBook 840 G5
ASoC: samsung: Prevent clk_get_rate() calls in atomic context
ASoC: rsnd: ssiu: correct shift bit for ssiu9
ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
ASoC: topology: fix oops/use-after-free case with dai driver
ASoC: rsnd: fixup MIX kctrl registration
ASoC: core: Allow soc_find_component lookups to match parent of_node
ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
ASoC: MAINTAINERS: fsl: Change Fabio's email address
ASoC: hdmi-codec: fix oops on re-probe
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/isdn/i4l/isdn_v110.c: In function ‘EncodeMatrix’:
drivers/isdn/i4l/isdn_v110.c:353:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (line >= mlen) {
^
drivers/isdn/i4l/isdn_v110.c:358:3: note: here
case 128:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
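As a minimal illustration of the annotation style (not taken from the isdn driver), the sketch below marks each intentional fall-through with a comment placed directly before the next case label, in a form that GCC recognizes at -Wimplicit-fallthrough=3. The function name and its logic are made up for the example.

```c
/* Counts the setup steps for a level; higher levels include the lower ones. */
static int steps_for_level(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 0:
		steps++;
		break;
	default:
		steps = -1;
		break;
	}
	return steps;
}
```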
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/isdn/i4l/isdn_tty.c: In function ‘isdn_tty_edit_at’:
drivers/isdn/i4l/isdn_tty.c:3644:18: warning: this statement may fall through [-Wimplicit-fallthrough=]
m->mdmcmdl = 0;
~~~~~~~~~~~^~~
drivers/isdn/i4l/isdn_tty.c:3646:5: note: here
case 0:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/isdn/gigaset/ser-gigaset.c: In function ‘gigaset_tty_ioctl’:
drivers/isdn/gigaset/ser-gigaset.c:627:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
switch (arg) {
^~~~~~
drivers/isdn/gigaset/ser-gigaset.c:638:2: note: here
default:
^~~~~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Julian Wiedmann says:
====================
s390/qeth: updates 2019-02-12
please apply one more round of qeth patches to net-next.
This series targets the driver's control paths. It primarily brings improvements
to the error handling for sent cmds and received responses, along with the
usual cleanup and consolidation efforts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This calls the existing errno translation helpers from the callbacks,
adding trivial wrappers where necessary. For cmds that have no
sophisticated errno translation, default to -EIO.
For IPA cmds with no callback, fall back to a minimal default. This is
currently being used by qeth_l3_send_setrouting().
Having thus converted all callbacks, remove the legacy path in
qeth_send_control_data_cb().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
By letting the callbacks deal with error translation, we no longer need
to pass the raw error codes back to the originator. This allows us to
slim down the callback's private data, and nicely simplifies the code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Error propagation from cmd callbacks currently works in a way where
qeth_send_control_data_cb() picks the raw HW code from the response,
and the cmd's originator later translates this into an errno.
The callback itself only returns 0 ("done") or 1 ("expect more data").
This is
1. limiting, as the only means for the callback to report an internal
error is to invent pseudo HW codes (such as IPA_RC_ENOMEM), that
the originator then needs to understand. For non-IPA callbacks, we
even provide a separate field in the IO buffer metadata (iob->rc) so
the callback can pass back a return value.
2. fragile, as the originator must take care to not translate any errno
that is returned by qeth's own IO code paths (eg -ENOMEM). Also, any
originator that forgets to translate the HW codes potentially passes
garbage back to its caller. For instance, see
commit 2aa4867198c2 ("s390/qeth: translate SETVLAN/DELVLAN errors").
Introduce a new model where all HW error translation is done within the
callback, and the callback returns
> 0, if it expects more data (as before)
== 0, on success
< 0, with an errno
Start off with converting all callbacks to the new model that either
a) pass back pseudo HW codes, or b) have a dependency on a specific
HW error code. Also convert c) the one callback that uses iob->rc, and
d) qeth_setadpparms_change_macaddr_cb() so that it can pass an
error back to qeth_l2_request_initial_mac() even when the cmd itself
was successful.
The old model remains supported: if the callback returns 0, we still
propagate the response's HW error code back to the originator.
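The return convention of the new model can be sketched as below. The HW codes, `struct reply_sketch` and `sketch_cb()` are hypothetical and only illustrate the ">0 / 0 / <0" contract; they are not qeth definitions.

```c
#include <errno.h>

/* Hypothetical HW return codes, for illustration only. */
enum { HW_RC_SUCCESS = 0, HW_RC_NOT_SUPPORTED = 1, HW_RC_BUSY = 2 };

struct reply_sketch {
	int hw_rc;	/* raw code taken from the response */
	int parts_left;	/* response fragments still expected after this one */
};

/* Callback in the new model: >0 expects more data, 0 success, <0 errno. */
static int sketch_cb(const struct reply_sketch *r)
{
	switch (r->hw_rc) {
	case HW_RC_SUCCESS:
		break;
	case HW_RC_NOT_SUPPORTED:
		return -EOPNOTSUPP;
	case HW_RC_BUSY:
		return -EBUSY;
	default:
		return -EIO;	/* no specific translation available */
	}

	return r->parts_left > 0 ? 1 : 0;
}
```

Because the translation happens inside the callback, the originator in this sketch only ever sees 0 or a negative errno and never has to interpret a raw HW code.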
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sending cmds via qeth_send_control_data(), qeth puts the request
on the IO channel and then blocks on the reply object until the response
has been received.
If the IO completes with error, there will never be a response and we
block until the reply-wait hits its timeout. For this case, connect the
request buffer to its reply object, so that we can immediately cancel
the wait.
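A single-threaded sketch of the idea, assuming a plain flag in place of a real wait primitive: the request buffer carries a pointer to its reply object, so an IO error can complete the wait at once. All names here (`reply_obj`, `io_buf`, `io_complete`) are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct reply_obj {
	bool done;	/* set once the waiter may stop waiting */
	int rc;		/* result handed to the waiter */
};

struct io_buf {
	struct reply_obj *reply;	/* link from request buffer to its reply */
};

/* Wakes the waiter; a flag stands in for a real wait primitive here. */
static void reply_notify(struct reply_obj *r, int rc)
{
	r->rc = rc;
	r->done = true;
}

/* IO completion: on error no response will arrive, so cancel the wait now. */
static void io_complete(struct io_buf *iob, int io_rc)
{
	if (io_rc && iob->reply)
		reply_notify(iob->reply, io_rc);
}
```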
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code enqueues & dequeues a reply object from the waiter list
in various places. In particular, the dequeue & enqueue in
qeth_send_control_data_cb() looks fragile - this can cause
qeth_clear_ipacmd_list() to skip the active object.
Add some helpers, and boil the logic down by giving
qeth_send_control_data() the sole responsibility to add and remove
objects.
qeth_send_control_data_cb() and qeth_clear_ipacmd_list() will now only
notify the reply object to interrupt its wait cycle. This can cause
a slight delay in the removal, but that's no concern.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'len' specifies how much data we send to the HW; don't dump beyond this
boundary.
As of today this is no big concern - commands are built in full, zeroed
pages.
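A small sketch of a length-bounded dump, assuming a hypothetical trace cap `DUMP_CAP` and helper names that are not part of qeth: the number of bytes dumped is the smaller of the command length and the cap.

```c
#include <stddef.h>
#include <stdio.h>

#define DUMP_CAP 32	/* hypothetical upper bound on bytes traced per command */

/* Number of bytes to trace: never more than was actually sent. */
static size_t dump_len(size_t len)
{
	return len < DUMP_CAP ? len : DUMP_CAP;
}

/*
 * Writes up to dump_len(len) bytes of 'data' as hex into 'out'.
 * Returns the number of bytes dumped, or 0 if 'out' is too small.
 */
static size_t hex_dump(char *out, size_t out_size,
		       const unsigned char *data, size_t len)
{
	size_t n = dump_len(len);
	size_t i;

	if (out_size < 2 * n + 1)
		return 0;
	for (i = 0; i < n; i++)
		snprintf(out + 2 * i, 3, "%02x", data[i]);
	out[2 * n] = '\0';
	return n;
}
```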
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
csum offload and TSO have similar programming requirements. The TSO code
was reworked with commit "s390/qeth: enhance TSO control sequence";
adjust the csum control flow accordingly. Primarily this means replacing
custom helpers with more generic infrastructure.
Also, change the LP2LP check so that it warns on TX offload (not RX).
This is where reduced csum capability actually matters.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|