summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-12hfs: switch to ->iterate_shared()Al Viro
exact parallel of hfsplus analogue Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12hfsplus: switch to ->iterate_shared()Al Viro
We need to protect the list of hfsplus_readdir_data against parallel insertions (in readdir) and removals (in release). Add a spinlock for that. Note that it has nothing to do with protection of hfsplus_readdir_data->key - we have an exclusion between hfsplus_readdir() and hfsplus_delete_cat() on directory lock and between several hfsplus_readdir() for the same struct file on ->f_pos_lock. The spinlock is strictly for list changes. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12hostfs: switch to ->iterate_shared()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12hpfs: switch to ->iterate_shared()Al Viro
NOTE: the only reason we can do that without ->i_rdir_offs races is that hpfs_lock() serializes everything in there anyway. It's not that hard to get rid of, but not as part of this series... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12hpfs: handle allocation failures in hpfs_add_pos()Al Viro
pr_err() is nice, but we'd better propagate the error to caller and not proceed to violate the invariants (namely, "every file with f_pos tied to directory block should have its address visible in per-inode array"). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12mm: thp: calculate the mapcount correctly for THP pages during WP faultsAndrea Arcangeli
This will provide fully accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false positive copy-on-writes. total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the full accurate return value for page_mapcount() if dealing with Transparent Hugepages, however we only use the page_trans_huge_mapcount() during COW faults where it strictly needed, due to its higher runtime cost. This also provide at practical zero cost the total_mapcount information which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter if it's a pte or a pmd_trans_huge triggering the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range. Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch and another problem in the way page_move_anon_rmap() was called. Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change at least it's a less dangerous usage than it was before, because "page" is used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Dean Luick noticed an uninitialized variable that could result in a rmap inefficiency for the non-THP case in a previous version. Mike Marciniszyn said: : Our RDMA tests are seeing an issue with memory locking that bisects to : commit 61f5d698cc97 ("mm: re-enable THP") : : The test program registers two rather large MRs (512M) and RDMA : writes data to a passive peer using the first and RDMA reads it back : into the second MR and compares that data. The sizes are chosen randomly : between 0 and 1024 bytes. : : The test will get through a few (<= 4 iterations) and then gets a : compare error. : : Tracing indicates the kernel logical addresses associated with the individual : pages at registration ARE correct , the data in the "RDMA read response only" : packets ARE correct. : : The "corruption" occurs when the packet crosse two pages that are not physically : contiguous. The second page reads back as zero in the program. : : It looks like the user VA at the point of the compare error no longer points to : the same physical address as was registered. : : This patch totally resolves the issue! Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reviewed-by: Dean Luick <dean.luick@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Josh Collier <josh.d.collier@intel.com> Cc: Marc Haber <mh+linux-kernel@zugschlus.de> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12ksm: fix conflict between mmput and scan_get_next_rmap_itemZhou Chengming
A concurrency issue about KSM in the function scan_get_next_rmap_item. task A (ksmd): |task B (the mm's task): | mm = slot->mm; | down_read(&mm->mmap_sem); | | ... | | spin_lock(&ksm_mmlist_lock); | | ksm_scan.mm_slot go to the next slot; | | spin_unlock(&ksm_mmlist_lock); | |mmput() -> | ksm_exit(): | |spin_lock(&ksm_mmlist_lock); |if (mm_slot && ksm_scan.mm_slot != mm_slot) { | if (!mm_slot->rmap_list) { | easy_to_free = 1; | ... | |if (easy_to_free) { | mmdrop(mm); | ... | |So this mm_struct may be freed in the mmput(). | up_read(&mm->mmap_sem); | As we can see above, the ksmd thread may access a mm_struct that already been freed to the kmem_cache. Suppose a fork will get this mm_struct from the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will cause mmap_sem.count to become -1. As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has the same SMP race condition, so fix it too. My prev fix in function scan_get_next_rmap_item will introduce a different SMP race condition, so just invert the up_read/spin_unlock order as Andrea Arcangeli said. Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Geliang Tang <geliangtang@163.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12ocfs2: fix posix_acl_create deadlockJunxiao Bi
Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster wide inode lock making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And same deadlock happened in ocfs2_reflink. This fix is to revert back using ocfs2_init_acl. Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hangJunxiao Bi
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") introduced this issue. ocfs2_setattr called by chmod command holds cluster wide inode lock when calling posix_acl_chmod. This latter function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl. These two are also called directly from vfs layer for getfacl/setfacl commands and therefore acquire the cluster wide inode lock. If a remote conversion request comes after the first inode lock in ocfs2_setattr, OCFS2_LOCK_BLOCKED will be set. And this will cause the second call to inode lock from the ocfs2_iop_get_acl() to block indefinetly. The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which does not call back into the filesystem. Therefore, we restore ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that instead. Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12tipc: eliminate risk of double link_up eventsJon Paul Maloy
When an ACTIVATE or data packet is received in a link in state ESTABLISHING, the link does not immediately change state to ESTABLISHED, but does instead return a LINK_UP event to the caller, which will execute the state change in a different lock context. This non-atomic approach incurs a low risk that we may have two LINK_UP events pending simultaneously for the same link, resulting in the final part of the setup procedure being executed twice. The only potential harm caused by this it that we may see two LINK_UP events issued to subsribers of the topology server, something that may cause confusion. This commit eliminates this risk by checking if the link is already up before proceeding with the second half of the setup. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12gfs2: switch to ->iterate_shared()Al Viro
protected by glock and already used without locking the directory by gfs2_get_name() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12net: mvneta: bm: fix dependencies againArnd Bergmann
I tried to fix this before, but my previous fix was incomplete and we can still get the same link error in randconfig builds because of the way that Kconfig treats the default y if MVNETA=y && MVNETA_BM_ENABLE line that does not actually trigger when MVNETA_BM_ENABLE=m, unlike I intended. Changing the line to use MVNETA_BM_ENABLE!=n however has the desired effect and hopefully makes all configurations work as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 019ded3aa7c9 ("net: mvneta: bm: clarify dependencies") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12coredump: only charge written data against RLIMIT_COREOmar Sandoval
Commit 9b56d54380ad ("dump_skip(): dump_seek() replacement taking coredump_params") introduced a regression with regard to RLIMIT_CORE. Previously, when a core dump was sparse, only the data that was actually written out would count against the limit. Now, the sparse ranges are also included, which leads to truncated core dumps when the actual disk usage is still well below the limit. Restore the old behavior by only counting what gets emitted and ignoring what gets skipped. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12coredump: get rid of coredump_params->writtenOmar Sandoval
cprm->written is redundant with cprm->file->f_pos, so use that instead. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12phy: micrel: Use MICREL_PHY_ID_MASK definitionFabio Estevam
Replace the hardcoded mask 0x00fffff0 with MICREL_PHY_ID_MASK for better readability. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12gre: Fix wrong tpi->proto in WCCPHaishuang Yan
When dealing with WCCP in gre6 tunnel, it sets the wrong tpi->protocol, that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12ip6_gre: Fix get_size calculation for gre6 tunnelHaishuang Yan
Do not include attribute IFLA_GRE_TOS. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12Merge tag 'keys-fixes-20160512' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring fix from David Howells: "Fix ASN.1 indefinite length object parsing" * tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: KEYS: Fix ASN.1 indefinite length object parsing
2016-05-12Merge tag 'sound-4.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This is a pretty boring pull request as you wish: including a few small and trivial HD-audio and USB-audio quirks and a couple of small regression fixes in HD-audio" * tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Yet another Phoneix Audio device quirk ALSA: hda - Fix regression on ATI HDMI audio ALSA: hda - Fix subwoofer pin on ASUS N751 and N551 ALSA: hda - Fix broken reconfig ALSA: hda - Fix white noise on Asus UX501VW headset ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
2016-05-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem fixes from Dmitry Torokhov. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: twl6040-vibra - fix DT node memory management Input: max8997-haptic - fix NULL pointer dereference Input: byd - update copyright header
2016-05-12perf stat: Fallback to user only counters when perf_event_paranoid > 1Arnaldo Carvalho de Melo
After 0161028b7c8a ("perf/core: Change the default paranoia level to 2") 'perf stat' fails for users without CAP_SYS_ADMIN, so just use 'perf_evsel__fallback()' to have the same behaviour as 'perf record', i.e. set perf_event_attr.exclude_kernel to 1. Now: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.352536 task-clock:u (msec) # 0.423 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 49 page-faults:u # 0.139 M/sec 309,407 cycles:u # 0.878 GHz 243,791 instructions:u # 0.79 insn per cycle 49,622 branches:u # 140.757 M/sec 3,884 branch-misses:u # 7.83% of all branches 0.000834174 seconds time elapsed [acme@jouet linux]$ Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()Arnaldo Carvalho de Melo
Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1] we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel to 1. Before: [acme@jouet linux]$ perf record usleep 1 Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ After: [acme@jouet linux]$ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ perf evlist cycles:u [acme@jouet linux]$ perf evlist -v cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [acme@jouet linux]$ And if the user turns on verbose mode, an explanation will appear: [acme@jouet linux]$ perf record -v usleep 1 Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples mmap size 528384B [ perf record: Woken up 1 times to write data ] Looking at the vmlinux_path (8 entries long) Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12drm/amdgpu: fix DP mode validationAlex Deucher
Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-12drm/radeon: fix DP mode validationAlex Deucher
Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-12perf evsel: Improve EPERM error handling in open_strerror()Arnaldo Carvalho de Melo
We were showing a hardcoded default value for the kernel.perf_event_paranoid sysctl, now that it became more paranoid (1 -> 2 [1]), this would need to be updated, instead show the current value: [acme@jouet linux]$ perf record ls Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12Merge tag 'pinctrl-v4.6-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl fix from Linus Walleij: "A single last pin control fix for v4.6. t's tagged for stable and only hits a single driver with two added lines so should be safe. Tested in linux-next. - The pull up/down logic for the AT91 PIO4 controller was tilted: we need to mask the reverse pull when unmasking a pull direction. Setting both pull up & pull down is illegal and makes no sense" * tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: at91-pio4: fix pull-up/down logic
2016-05-12gtp: put back reference to netns when not required anymorePablo Neira
This patch fixes a netns leak. Fixes: 93edb8c7f94f ("gtp: reload GTPv1 header after pskb_may_pull()") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12workqueue: fix rebind bound workers warningWanpeng Li
------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0 Modules linked in: CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31 Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046 Call Trace: dump_stack+0x89/0xd4 __warn+0xfd/0x120 warn_slowpath_null+0x1d/0x20 rebind_workers+0x1c0/0x1d0 workqueue_cpu_up_callback+0xf5/0x1d0 notifier_call_chain+0x64/0x90 ? trace_hardirqs_on_caller+0xf2/0x220 ? notify_prepare+0x80/0x80 __raw_notifier_call_chain+0xe/0x10 __cpu_notify+0x35/0x50 notify_down_prepare+0x5e/0x80 ? notify_prepare+0x80/0x80 cpuhp_invoke_callback+0x73/0x330 ? __schedule+0x33e/0x8a0 cpuhp_down_callbacks+0x51/0xc0 cpuhp_thread_fun+0xc1/0xf0 smpboot_thread_fn+0x159/0x2a0 ? smpboot_create_threads+0x80/0x80 kthread+0xef/0x110 ? wait_for_completion+0xf0/0x120 ? schedule_tail+0x35/0xf0 ret_from_fork+0x22/0x50 ? __init_kthread_worker+0x70/0x70 ---[ end trace eb12ae47d2382d8f ]--- notify_down_prepare: attempt to take down CPU 0 failed This bug can be reproduced by below config w/ nohz_full= all cpus: CONFIG_BOOTPARAM_HOTPLUG_CPU0=y CONFIG_DEBUG_HOTPLUG_CPU0=y CONFIG_NO_HZ_FULL=y As Thomas pointed out: | If a down prepare callback fails, then DOWN_FAILED is invoked for all | callbacks which have successfully executed DOWN_PREPARE. | | But, workqueue has actually two notifiers. One which handles | UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE. | | Now look at the priorities of those callbacks: | | CPU_PRI_WORKQUEUE_UP = 5 | CPU_PRI_WORKQUEUE_DOWN = -5 | | So the call order on DOWN_PREPARE is: | | CB 1 | CB ... | CB workqueue_up() -> Ignores DOWN_PREPARE | CB ... | CB X ---> Fails | | So we call up to CB X with DOWN_FAILED | | CB 1 | CB ... | CB workqueue_up() -> Handles DOWN_FAILED | CB ... | CB X-1 | | So the problem is that the workqueue stuff handles DOWN_FAILED in the up | callback, while it should do it in the down callback. Which is not a good idea | either because it wants to be called early on rollback... | | Brilliant stuff, isn't it? The hotplug rework will solve this problem because | the callbacks become symetric, but for the existing mess, we need some | workaround in the workqueue code. The boot CPU handles housekeeping duty(unbound timers, workqueues, timekeeping, ...) on behalf of full dynticks CPUs. It must remain online when nohz full is enabled. There is a priority set to every notifier_blocks: workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down So tick_nohz_cpu_down callback failed when down prepare cpu 0, and notifier_blocks behind tick_nohz_cpu_down will not be called any more, which leads to workers are actually not unbound. Then hotplug state machine will fallback to undo and online cpu 0 again. Workers will be rebound unconditionally even if they are not unbound and trigger the warning in this progress. This patch fix it by catching !DISASSOCIATED to avoid rebind bound workers. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
2016-05-12Merge tag 'mac80211-next-for-davem-2016-05-12' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Some more work for 4.7, notably: * completion and fixups of nla_put_64_64bit() work * remove a/b/g/n from wext nickname to avoid confusion with 11ac (which wouldn't even fit fully there due to string length restrictions) along with some other minor changes/cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12Merge tag 'at91-fixes2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes Merge "Second AT91 fix PR for 4.6" from Nicolas Ferre: - fix a regression on the clock subsystem while switching to syscon/regmap due to a stricter check of the register map. * tag 'at91-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91: ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
2016-05-12cgroup: fix compile warningFelipe Balbi
commit 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces") added the following compile warning: kernel/cgroup.c: In function ‘cgroup_show_path’: kernel/cgroup.c:1634:15: warning: unused variable ‘ret’ [-Wunused-variable] int len = 0, ret = 0; ^ fix it. Fixes: 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces") Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-12kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry callSerge E. Hallyn
Our caller expects 0 on success, not >0. This fixes a bug in the patch cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces where /sys does not show up in mountinfo, breaking criu. Thanks for catching this, Andrei. Reported-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-12tools lib traceevent: Do not reassign parg after collapse_tree()Steven Rostedt
At the end of process_filter(), collapse_tree() was changed to update the parg parameter, but the reassignment after the call wasn't removed. What happens is that the "current_op" gets modified and freed and parg is assigned to the new allocated argument. But after the call to collapse_tree(), parg is assigned again to the just freed "current_op", and this causes the tool to crash. The current_op variable must also be assigned to NULL in case of error, otherwise it will cause it to be free()ed twice. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org # 3.14+ Fixes: 42d6194d133c ("tools lib traceevent: Refactor process_filter()") Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.home Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf probe: Check if dwarf_getlocations() is availableArnaldo Carvalho de Melo
If not, tell the user that: config/Makefile:273: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157 And return -ENOTSUPP in die_get_var_range(), failing features that need it, like the one pointed out above. This fixes the build on older systems, such as Ubuntu 12.04.5. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vinson Lee <vlee@freedesktop.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-9l7luqkq4gfnx7vrklkq4obs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf dwarf: Guard !x86_64 definitions under #ifdef else clauseArnaldo Carvalho de Melo
To fix the build on Fedora Rawhide (gcc 6.0.0 20160311 (Red Hat 6.0.0-0.17): CC /tmp/build/perf/arch/x86/util/dwarf-regs.o arch/x86/util/dwarf-regs.c:66:36: error: 'x86_32_regoffset_table' defined but not used [-Werror=unused-const-variable=] static const struct pt_regs_offset x86_32_regoffset_table[] = { ^~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-fghuksc1u8ln82bof4lwcj0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf tools: Use readdir() instead of deprecated readdir_r()Arnaldo Carvalho de Melo
The readdir() function is thread safe as long as just one thread uses a DIR, which is the case when parsing tracepoint event definitions, to avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r(). See: http://man7.org/linux/man-pages/man3/readdir.3.html "However, in modern implementations (including the glibc implementation), concurrent calls to readdir() that specify different directory streams are thread-safe. In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function." Noticed while building on a Fedora Rawhide docker container. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf thread_map: Use readdir() instead of deprecated readdir_r()Arnaldo Carvalho de Melo
The readdir() function is thread safe as long as just one thread uses a DIR, which is the case in thread_map, so, to avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r(). See: http://man7.org/linux/man-pages/man3/readdir.3.html "However, in modern implementations (including the glibc implementation), concurrent calls to readdir() that specify different directory streams are thread-safe. In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function." Noticed while building on a Fedora Rawhide docker container. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf script: Use readdir() instead of deprecated readdir_r()Arnaldo Carvalho de Melo
The readdir() function is thread safe as long as just one thread uses a DIR, which is the case in 'perf script', so, to avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r(). See: http://man7.org/linux/man-pages/man3/readdir.3.html "However, in modern implementations (including the glibc implementation), concurrent calls to readdir() that specify different directory streams are thread-safe. In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function." Noticed while building on a Fedora Rawhide docker container. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf tools: Use readdir() instead of deprecated readdir_r()Arnaldo Carvalho de Melo
The readdir() function is thread safe as long as just one thread uses a DIR, which is the case when synthesizing events for pre-existing threads by traversing /proc, so, to avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r(). See: http://man7.org/linux/man-pages/man3/readdir.3.html "However, in modern implementations (including the glibc implementation), concurrent calls to readdir() that specify different directory streams are thread-safe. In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function." Noticed while building on a Fedora Rawhide docker container. CC /tmp/build/perf/util/event.o util/event.c: In function '__event__synthesize_thread': util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations] while (!readdir_r(tasks, &dirent, &next) && next) { ^~~~~ In file included from /usr/include/features.h:368:0, from /usr/include/stdint.h:25, from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9, from /git/linux/tools/include/linux/types.h:6, from util/event.c:1: /usr/include/dirent.h:189:12: note: declared here Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12arm64: do not enforce strict 16 byte alignment to stack pointerColin Ian King
copy_thread should not be enforcing 16 byte aligment and returning -EINVAL. Other architectures trap misaligned stack access with SIGBUS so arm64 should follow this convention, so remove the strict enforcement check. For example, currently clone(2) fails with -EINVAL when passing a misaligned stack and this gives little clue to what is wrong. Instead, it is arguable that a SIGBUS on the fist access to a misaligned stack allows one to figure out that it is a misaligned stack issue rather than trying to figure out why an unconventional (and undocumented) -EINVAL is being returned. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-12perf/core: Disable the event on a truncated AUX recordAlexander Shishkin
When the PMU driver reports a truncated AUX record, it effectively means that there is no more usable room in the event's AUX buffer (even though there may still be some room, so that perf_aux_output_begin() doesn't take action). At this point the consumer still has to be woken up and the event has to be disabled, otherwise the event will just keep spinning between perf_aux_output_begin() and perf_aux_output_end() until its context gets unscheduled. Again, for cpu-wide events this means never, so once in this condition, they will be forever losing data. Fix this by disabling the event and waking up the consumer in case of a truncated AUX record. Reported-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1462886313-13660-3-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12perf/x86/intel/pt: Generate PMI in the STOP region as wellAlexander Shishkin
Currently, the PT driver always sets the PMI bit one region (page) before the STOP region so that we can wake up the consumer before we run out of room in the buffer and have to disable the event. However, we also need an interrupt in the last output region, so that we actually get to disable the event (if no more room from new data is available at that point), otherwise hardware just quietly refuses to start, but the event is scheduled in and we end up losing trace data till the event gets removed. For a cpu-wide event it is even worse since there may not be any re-scheduling at all and no chance for the ring buffer code to notice that its buffer is filled up and the event needs to be disabled (so that the consumer can re-enable it when it finishes reading the data out). In other words, all the trace data will be lost after the buffer gets filled up. This patch makes PT also generate a PMI when the last output region is full. Reported-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12x86: Use compat version for preadv2 and pwritev2Dmitry V. Levin
Similar to preadv and pwritev, preadv2 and pwritev2 need compat entries in the 32-bit syscall table. This bug was found by strace test suite. Fixes: 4babf2c5efb7 ("x86: wire up preadv2 and pwritev2") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20160511084817.GA29823@altlinux.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-12KEYS: Fix ASN.1 indefinite length object parsingDavid Howells
This fixes CVE-2016-0758. In the ASN.1 decoder, when the length field of an ASN.1 value is extracted, it isn't validated against the remaining amount of data before being added to the cursor. With a sufficiently large size indicated, the check: datalen - dp < 2 may then fail due to integer overflow. Fix this by checking the length indicated against the amount of remaining data in both places a definite length is determined. Whilst we're at it, make the following changes: (1) Check the maximum size of extended length does not exceed the capacity of the variable it's being stored in (len) rather than the type that variable is assumed to be (size_t). (2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the integer 0. (3) To reduce confusion, move the initialisation of len outside of: for (len = 0; n > 0; n--) { since it doesn't have anything to do with the loop counter n. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Peter Jones <pjones@redhat.com>
2016-05-12mac80211: allow software PS-Poll/U-APSD with AP_LINK_PSJohannes Berg
When using RSS, frames might not be processed in the correct order, and thus AP_LINK_PS must be used; most likely with firmware keeping track of the powersave state, this is the case in iwlwifi now. In this case, the driver can use ieee80211_sta_ps_transition() to still have mac80211 manage powersave buffering. However, for U-APSD and PS-Poll this isn't sufficient. If the device can't manage that entirely on its own, mac80211's code should be used. To allow this, export two functions: ieee80211_sta_uapsd_trigger() and ieee80211_sta_pspoll(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-12cfg80211: make wdev_list accessible to driversJohannes Berg
There's no harm in having drivers read the list, since they can use RCU protection or RTNL locking; allow this to not require each and every driver to also implement its own bookkeeping. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-12cfg80211: remove erroneous commentJohannes Berg
The devlist_mtx mutex was removed about two years ago, in favour of just using RTNL/RCU protection. Remove the comment still referencing it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-12cfg80211: allow finding vendor with OUI without specifying the OUI typeEmmanuel Grumbach
This allows finding vendor IE from a specific vendor. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-12mac80211: allow same PN for AMSDU sub-framesSara Sharon
Some hardware (iwlwifi an example) de-aggregate AMSDUs and copy the IV as is to the generated MPDUs, so the same PN appears in multiple packets without being a replay attack. Allow driver to explicitly indicate that a frame is allowed to have the same PN as the previous frame. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-12mac80211: remove disconnected APs from BSS tableDavid Spinadel
In some cases, after a sudden AP disappearing and reconnection to another AP in the same ESS, user space gets the old AP in scan results (cached). User space may decide to roam to that old AP which will cause a disconnection and longer recovery. Remove APs that are probably out of range from BSS table. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>