summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-04-14watchdog: introduce proc_watchdog_common()Ulrich Obergfell
Three of four handlers for the watchdog parameters in /proc/sys/kernel essentially have to do the same thing. if the parameter is being read { return the state of the corresponding bit(s) in 'watchdog_enabled' } else { set/clear the state of the corresponding bit(s) in 'watchdog_enabled' update the run state of the lockup detector(s) } Hence, introduce a common function that can be called by those handlers. The callers pass a 'bit mask' to this function to indicate which bit(s) should be set/cleared in 'watchdog_enabled'. This function handles an uncommon race with watchdog_nmi_enable() where a concurrent update of 'watchdog_enabled' is possible. We use 'cmpxchg' to detect the concurrency. [This avoids introducing a new spinlock or a mutex to synchronize updates of 'watchdog_enabled'. Using the same lock or mutex in watchdog thread context and in system call context needs to be considered carefully because it can make the code prone to deadlock situations in connection with parking/unparking the watchdog threads.] Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14watchdog: move definition of 'watchdog_proc_mutex' outside of proc_dowatchdog()Ulrich Obergfell
This series removes proc_dowatchdog(). Since multiple new functions need the 'watchdog_proc_mutex' to serialize access to the watchdog parameters in /proc/sys/kernel, move the mutex outside of any function. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14watchdog: introduce the proc_watchdog_update() functionUlrich Obergfell
This series introduces a separate handler for each watchdog parameter in /proc/sys/kernel. The separate handlers need a common function that they can call to update the run state of the lockup detectors, or to have the lockup detectors use a new sample period. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14watchdog: new definitions and variables, initializationUlrich Obergfell
The hardlockup and softockup had always been tied together. Due to the request of KVM folks, they had a need to have one enabled but not the other. Internally rework the code to split things apart more cleanly. There is a bunch of churn here, but the end result should be code that should be easier to maintain and fix without knowing the internals of what is going on. This patch (of 9): Introduce new definitions and variables to separate the user interface in /proc/sys/kernel from the internal run state of the lockup detectors. The internal run state is represented by two bits in a new variable that is named 'watchdog_enabled'. This helps simplify the code, for example: - In order to check if any of the two lockup detectors is enabled, it is sufficient to check if 'watchdog_enabled' is not zero. - In order to enable/disable one or both lockup detectors, it is sufficient to set/clear one or both bits in 'watchdog_enabled'. - Concurrent updates of 'watchdog_enabled' need not be synchronized via a spinlock or a mutex. Updates can either be atomic or concurrency can be detected by using 'cmpxchg'. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: make mlog_errno return the errnoAndrew Morton
ocfs2 does mlog_errno(v); return v; in many places. Change mlog_errno() so we can do return mlog_errno(v); For some weird reason this patch reduces the size of ocfs2 by 6k: akpm3:/usr/src/25> size fs/ocfs2/ocfs2.ko text data bss dec hex filename 1146613 82767 832192 2061572 1f7504 fs/ocfs2/ocfs2.ko-before 1140857 82767 832192 2055816 1f5e88 fs/ocfs2/ocfs2.ko-after [dan.carpenter@oracle.com: double evaluation concerns in mlog_errno()] Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: check if the ocfs2 lock resource has been initialized before calling ↵alex chen
ocfs2_dlm_lock If ocfs2 lockres has not been initialized before calling ocfs2_dlm_lock, the lock won't be dropped and then will lead umount hung. The case is described below: ocfs2_mknod ocfs2_mknod_locked __ocfs2_mknod_locked ocfs2_journal_access_di Failed because of -ENOMEM or other reasons, the inode lockres has not been initialized yet. iput(inode) ocfs2_evict_inode ocfs2_delete_inode ocfs2_inode_lock ocfs2_inode_lock_full_nested __ocfs2_cluster_lock Succeeds and allocates a new dlm lockres. ocfs2_clear_inode ocfs2_open_unlock ocfs2_drop_inode_locks ocfs2_drop_lock Since lockres has not been initialized, the lock can't be dropped and the lockres can't be migrated, thus umount will hang forever. Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: logging: remove static buffer, use vsprintf extension %pVJoe Perches
Use the vsprintf %pV extension to avoid using a static buffer and remove the now unnecessary buffer. Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: incorrect check for debugfs returnsChengyu Song
debugfs_create_dir and debugfs_create_file may return -ENODEV when debugfs is not configured, so the return value should be checked against ERROR_VALUE as well, otherwise the later dereference of the dentry pointer would crash the kernel. This patch tries to solve this problem by fixing certain checks. However, I have that found other call sites are protected by #ifdef CONFIG_DEBUG_FS. In current implementation, if CONFIG_DEBUG_FS is defined, then the above two functions will never return any ERROR_VALUE. So another possibility to fix this is to surround all the buggy checks/functions with the same #ifdef CONFIG_DEBUG_FS. But I'm not sure if this would break any functionality, as only OCFS2_FS_STATS declares dependency on DEBUG_FS. Signed-off-by: Chengyu Song <csong84@gatech.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: fix a typo in the copyright statementJakub Wilk
Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Reviewed-by: Eric Ren <zren@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: fix possible uninitialized variable accessJoseph Qi
In ocfs2_local_alloc_find_clear_bits and ocfs2_get_dentry, variable numfound and set may be uninitialized and then used in tracepoint. In ocfs2_xattr_block_get and ocfs2_delete_xattr_in_bucket, variable block_off and xv may be uninitialized and then used in the following logic due to unchecked return value. This patch fixes these possible issues. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: remove goto statement in ocfs2_check_dir_for_entry()Daeseok Youn
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: rollback the cleared bits if error occurs after ↵Joseph Qi
ocfs2_block_group_clear_bits ocfs2_block_group_clear_bits will clear bits in block group bitmap. Once it succeeds but fails in the following step, it will cause block group bitmap mismatch the corresponding count recorded in dinode. So rollback the cleared bits if error occurs. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: use ENOENT instead of EEXIST when get system file failsJoseph Qi
When ocfs2_get_system_file_inode fails, it is obscure to set the return value to -EEXIST. So change it to -ENOENT. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: use actual name length when find entry in ocfs2_orphan_del()Joseph Qi
If the namelen is 20 and name only has actual length 16, it will fail in ocfs2_find_entry because of mismatch. So use actual name length when find entry. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: dereferencing freed pointers in ocfs2_reflink()Dan Carpenter
The code at the "out" label assumes that "default_acl" and "acl" are NULL, but actually the pointers can be NULL, unitialized, or freed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: fix typo in ocfs2_reserve_local_alloc_bitsJoseph Qi
In ocfs2_reserve_local_alloc_bits, it calls ocfs2_error if local alloc inode bitmap used bits mismatch, but the log mistakes it as free bits. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: do not use ocfs2_zero_extend during direct IOJoseph Qi
In ocfs2_direct_IO_write, we use ocfs2_zero_extend to zero allocated clusters in case of cluster not aligned. But ocfs2_zero_extend uses page cache, this may happen that it clears the data which blockdev_direct_IO has already written. We should use blkdev_issue_zeroout instead of ocfs2_zero_extend during direct IO. So fix this issue by introducing ocfs2_direct_IO_zero_extend and ocfs2_direct_IO_extend_no_holes. Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Tested-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: take inode lock when get clustersJoseph Qi
We need take inode lock when calling ocfs2_get_clusters. And use GFP_NOFS instead of GFP_KERNEL. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: no need get dinode bh when zeroing extendJoseph Qi
Since di_bh won't be used when zeroing extend, set it to NULL. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: fix a typing error in ocfs2_direct_IO_writeJoseph Qi
Only when direct IO succeeds we need consider zeroing out in case of cluster not aligned. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: avoid a pointless delay in o2cb_cluster_check()Daeseok Youn
Fix an off-by-one when attempting to avoid an msleep() on the final loop iteration. Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: one function call less in user_cluster_connect() after error detectionMarkus Elfring
kfree() was called by user_cluster_connect() even if a previous call of the kzalloc() function failed. Return from this implementation directly after failure detection. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: one function call less in ocfs2_init_slot_info() after error detectionMarkus Elfring
__ocfs2_free_slot_info() was called by ocfs2_init_slot_info() even if a call of the kzalloc() function failed. Return from this implementation directly after corresponding exception handling. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: one function call less in ocfs2_merge_rec_right() after error detectionMarkus Elfring
ocfs2_free_path() was called by ocfs2_merge_rec_right() even if a call of the ocfs2_get_right_path() function failed. Return from this implementation directly after corresponding exception handling. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: one function call less in ocfs2_merge_rec_left() after error detectionMarkus Elfring
ocfs2_free_path() was called by ocfs2_merge_rec_left() even if a call of the ocfs2_get_left_path() function failed. Return from this implementation directly after corresponding exception handling. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: less function calls in ocfs2_figure_merge_contig_type() after error ↵Markus Elfring
detection ocfs2_free_path() was called in some cases by ocfs2_figure_merge_contig_type() during error handling even if the passed variables "left_path" and "right_path" contained still a null pointer. Corresponding implementation details could be improved by adjustments for jump labels according to the current Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: less function calls in ocfs2_convert_inline_data_to_extents() after ↵Markus Elfring
error detection kfree() was called in a few cases by ocfs2_convert_inline_data_to_extents() during error handling even if the passed variable "pages" contained a null pointer. * Return from this implementation directly after failure detection for the function call "kcalloc". * Corresponding details could be improved by the introduction of another jump label. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: delete unnecessary checks before three function callsMarkus Elfring
kfree(), ocfs2_free_path() and __ocfs2_free_slot_info() test whether their argument is NULL and then return immediately. Thus the test around their calls is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14arch/sh/kernel/dwarf.c: use mempool_create_slab_pool()David Rientjes
Mempools created for slab caches should use mempool_create_slab_pool(). Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14arch/sh/kernel/dwarf.c: destroy mempools on cleanupDavid Rientjes
dwarf_reg_pool and dwarf_frame_pool are not properly destroyed when cleaning up the dwarf unwinder. Destroy them with mempool_destroy(). Also mark dwarf_unwinder_cleanup() as __init. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14scripts/coccinelle/misc/bugon.cocci: update bug_on conversion warningFabian Frederick
if()/BUG conversion to BUG_ON must be avoided when there's side effect in condition. The reason being BUG_ON won't execute the condition when CONFIG_BUG is not defined. With inspiration from Bruce Fields. Signed-off-by: Fabian Frederick <fabf@skynet.be> Suggested-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14mm/hugetlb: use pmd_page() in follow_huge_pmd()Gerald Schaefer
Commit 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around follow_huge_*") broke follow_huge_pmd() on s390, where pmd and pte layout differ and using pte_page() on a huge pmd will return wrong results. Using pmd_page() instead fixes this. All architectures that were touched by that commit have pmd_page() defined, so this should not break anything on other architectures. Fixes: 61f77eda "mm/hugetlb: reduce arch dependent code around follow_huge_*" Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.cz>, Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14Merge tag 'gfs2-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "Here is a list of patches we've accumulated for GFS2 for the current upstream merge window. Most of the patches fix GFS2 quotas, which were not properly enforced. There's another that adds me as a GFS2 co-maintainer, and a couple patches that fix a kernel panic doing splice_write on GFS2 as well as a few correctness patches" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: fix quota refresh race in do_glock() gfs2: incorrect check for debugfs returns gfs2: allow fallocate to max out quotas/fs efficiently gfs2: allow quota_check and inplace_reserve to return available blocks gfs2: perform quota checks against allocation parameters GFS2: Move gfs2_file_splice_write outside of #ifdef GFS2: Allocate reservation during splice_write GFS2: gfs2_set_acl(): Cache "no acl" as well Add myself (Bob Peterson) as a maintainer of GFS2
2015-04-14Merge tag 'jfs-4.1' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs update from David Kleikamp: "Not much this time. Just a one-liner format fix" * tag 'jfs-4.1' of git://github.com/kleikamp/linux-shaggy: jfs: %pf is only for function pointers
2015-04-14Merge tag 'please-pull-pstore' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore fix from Tony Luck: "Just one pstore fix this release" * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore: Fix the ramoops module parameters update
2015-04-14fm10k: Bump driver version to 0.15.2Jeff Kirsher
With the recent driver changes, bump the version. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
2015-04-14fm10k: corrected VF multicast updateJeff Kirsher
VFs were being improperly added to the switch's multicast group. The error stems from the fact that incorrect arguments were passed to the "update_mc_addr" function. It would seem to be a copy paste error since the parameters are similar to the "update_uc_addr" function. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: mbx_update_max_size does not drop all oversized messagesJeff Kirsher
When we call update_max_size it does not drop all oversized messages. This is due to the difficulty in performing this operation, since it is a FIFO which makes updating anything other than head or tail very difficult. To fix this, modify validate_msg_size to ensure that we error out later when trying to transmit the message that could be oversized. This will generally be a rare condition, as it requires the FIFO to include a message larger than the max_size negotiated during mailbox connect. Note that max_size is always smaller than rx.size so it should be safe to use here. Also, update the update_max_size function header comment to clearly indicate that it does not drop all oversized messages, but only those at the head of the FIFO. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: reset head instead of calling update_max_sizeJeff Kirsher
When we forcefully shutdown the mailbox, we then go about resetting max size to 0, and clearing all messages in the FIFO. Instead, we should just reset the head pointer so that the FIFO becomes empty, rather than changing the max size to 0. This helps prevent increment in tx_dropped counter during mailbox negotiation, which is confusing to viewers of Linux ethtool statistics output. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: renamed mbx_tx_dropped to mbx_tx_oversizedJeff Kirsher
The use of dropped doesn't really mean dropped mailbox messages, but rather specifically messages which were too large to fit in the remote Rx FIFO. Rename the stat to more clearly indicate what it means. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next A final pull request, I know it's very late but this time I think it's worth a bit of rush. The following patchset contains Netfilter/nf_tables updates for net-next, more specifically concatenation support and dynamic stateful expression instantiation. This also comes with a couple of small patches. One to fix the ebtables.h userspace header and another to get rid of an obsolete example file in tree that describes a nf_tables expression. This time, I decided to paste the original descriptions. This will result in a rather large commit description, but I think these bytes to keep. Patrick McHardy says: ==================== netfilter: nf_tables: concatenation support The following patches add support for concatenations, which allow multi dimensional exact matches in O(1). The basic idea is to split the data registers, currently consisting of 4 registers of 16 bytes each, into smaller units, 16 registers of 4 bytes each, and making sure each register store always leaves the full 32 bit in a well defined state, meaning smaller stores will zero the remaining bits. Based on that, we can load multiple adjacent registers with different values, thereby building a concatenated bigger value, and use that value for set lookups. Sets are changed to use variable sized extensions for their key and data values, removing the fixed limit of 16 bytes while saving memory if less space is needed. As a side effect, these patches will allow some nice optimizations in the future, like using jhash2 in nft_hash, removing the masking in nft_cmp_fast, optimized data comparison using 32 bit word size etc. These are not done so far however. The patches are split up as follows: * the first five patches add length validation to register loads and stores to make sure we stay within bounds and prepare the validation functions for the new addressing mode * the next patches prepare for changing to 32 bit addressing by introducing a struct nft_regs, which holds the verdict register as well as the data registers. The verdict members are moved to a new struct nft_verdict to allow to pull struct nft_data out of the stack. * the next patches contain preparatory conversions of expressions and sets to use 32 bit addressing * the next patch introduces so far unused register conversion helpers for parsing and dumping register numbers over netlink * following is the real conversion to 32 bit addressing, consisting of replacing struct nft_data in struct nft_regs by an array of u32s and actually translating and validating the new register numbers. * the final two patches add support for variable sized data items and variable sized keys / data in set elements The patches have been verified to work correctly with nft binaries using both old and new addressing. ==================== Patrick McHardy says: ==================== netfilter: nf_tables: dynamic stateful expression instantiation The following patches are the grand finale of my nf_tables set work, using all the building blocks put in place by the previous patches to support something like iptables hashlimit, but a lot more powerful. Sets are extended to allow attaching expressions to set elements. The dynset expression dynamically instantiates these expressions based on a template when creating new set elements and evaluates them for all new or updated set members. In combination with concatenations this effectively creates state tables for arbitrary combinations of keys, using the existing expression types to maintain that state. Regular set GC takes care of purging expired states. We currently support two different stateful expressions, counter and limit. Using limit as a template we can express the functionality of hashlimit, but completely unrestricted in the combination of keys. Using counter we can perform accounting for arbitrary flows. The following examples from patch 5/5 show some possibilities. Userspace syntax is still WIP, especially the listing of state tables will most likely be seperated from normal set listings and use a more structured format: 1. Limit the rate of new SSH connections per host, similar to iptables hashlimit: flow ip saddr timeout 60s \ limit 10/second \ accept 2. Account network traffic between each set of /24 networks: flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \ counter 3. Account traffic to each host per user: flow skuid . ip daddr \ counter 4. Account traffic for each combination of source address and TCP flags: flow ip saddr . tcp flags \ counter The resulting set content after a Xmas-scan look like this: { 192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040, 192.168.122.1 . ack : counter packets 74 bytes 3848, 192.168.122.1 . psh | ack : counter packets 35 bytes 3144 } In the future the "expressions attached to elements" will be extended to also support user created non-stateful expressions to allow to efficiently select beween a set of parameter sets, f.i. a set of log statements with different prefixes based on the interface, which currently require one rule each. This will most likely have to wait until the next kernel version though. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14fm10k: update xcast mode before synchronizing multicast addressesJeff Kirsher
When the PF receives a request to update a multicast address for the VF, it checks the enabled multicast mode first. Fix a bug where the VF tried to set a multicast address before requesting the required xcast mode. This ensures the multicast addresses are honored as long as the xcast mode was allowed. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: start service timer on probeJeff Kirsher
Since the service task handles varying work that doesn't all require the interface to be up, launch the service timer immediately. This ensures that we continually check the mailbox, as well as handle other tasks while the device is down. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: fix function header commentJeff Kirsher
The header comment included a miscopy of a C-code line, and also mis-used Rx FIFO when it clearly meant Tx FIFO Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: comment next_vf_mbx flowJeff Kirsher
Add a header comment explaining why we have the somewhat crazy mailbox flow. This flow is necessary as it prevents the PF<->SM mailbox from being flooded by the VF messages, which normally trigger a message to the PF. This helps prevent the case where we see a PF mailbox timeout. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: don't handle mailbox events in iov_event path and always process mailboxJeff Kirsher
Since we already schedule the service task, we can just wait for this task to handle the mailbox events from the VF. This reduces some complex code flow, and makes it so we have a single path for handling the VF messages. There is a possibility that we have a slight delay in handling VF messages, but it should be minimal. The result of tx_complete and !rx_ready is insufficient to determine whether we need to process the mailbox. There is a possible race condition whereby the VF fills up the mbmem for us, but we have already recently processed the mailboxes in the interrupt. During this time, the interrupt is disabled. Thus, our Rx FIFO is empty, but the mbmem now has data in it. Since we continually check whether Rx FIFO is empty, we then never call process. This results in the possibility to prevent PF from handling the VF mailbox messages. Instead, just call process every time, despite the fact that we may or may not have anything to process for the VF. There should be minimal overhead for doing this, and it resolves an issue where the VF never comes up due to never getting response for its SET_LPORT_STATE message. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: use separate workqueue for fm10k driverJeff Kirsher
Since we run the watchdog periodically, which might take a while and potentially monopolize the system default workqueue, create our own separate work queue. This also helps reduce and stabilize latency between scheduling the work in our interrupt and actually performing the work. Still use a timer for the regular scheduled interval but queue the work onto its own work queue. It seemed overkill to create a single workqueue per interface, so we just spawn a single work queue for all interfaces upon driver load. For this reason, use a multi-threaded workqueue with one thread per processor, rather than single threaded queue. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: Set PF queues to unlimited bandwidth during virtualizationJeff Kirsher
When returning virtualization queues from the VF back to the PF, do not retain the VF rate limiter. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Todd Russell <todd.a.russell@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: expose tx_timeout_count as an ethtool statJeff Kirsher
Named it tx_hang_count to differentiate it from tx_hwtstamp_timeout. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: only increment tx_timeout_count in Tx hang pathJeff Kirsher
We were incrementing the tx_timeout_count for both the Tx hang and then for all reset flows. Instead, we should only increment tx_timeout_count in the Tx hang path, so that our Tx hang counter does not increment when it was not caused by a Tx hang. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>